Project Ideas
This page provides an overview of a few projects that seem doable over the course of approximately two months of thinking and programming, for someone who isn't yet acquainted with our code base. The list might seem long and daunting, but the hope is that any single one of the projects would prove to be an interesting, yet solvable, challenge. Moreover, we expect the projects to lead to tangible results quickly (after a few days or at most a couple of weeks of hacking), and that the goals can be extended in challenging and exciting directions as more milestones are reached.
Table of Contents
- 1 Improvements to the library
- 1.1 Unicode Integration
- 1.2 Exploiting GMP for bignum and rational arithmetic
- 1.3 Flexible (pseudo) random number generation
- 1.4 Vectorised sequence operations
- 1.5 Stronger hash functions and specialised hash tables
- 1.6 Efficient interpretation
- 1.7 Improving the thread-safety of the object system
- 1.8 Threading/locking debugging facilities
- 2 High-level optimisation
- 3 Middle end infrastructure
- 4 Back end work
- 5 Diving in the runtime system
- 6 Projects that may be further described if there is interest
- 6.1 Free-er form displaced arrays
- 6.2 Autodxification of higher-order functions
- 6.3 Better failure modes for heap overflow
- 6.4 Inline caches for PCL
- 6.5 clang(++)-based FFI
- 6.6 MPFR for long floats
- 6.7 Structured codewalking
- 6.8 Hygienic macros
- 6.9 Precompilation support for CLOS
- 6.10 CLOS sealing
- 6.11 GCed special variables
- 6.12 Precise stack scanning
- 6.13 A binary serialisation library
- 6.14 Contracts
- 6.15 Core relocation
- 6.16 Reducing the size of delivered applications
- 6.17 Unboxed calling convention
- 6.18 Allocation pools
- 6.19 Shared memory multi-process heap
- 6.20 Faster FASL loading
- 6.21 Scheduling pass
- 6.22 utf-8b
- 6.23 Parametric recursive types
1 Improvements to the library
This section contains projects for which the work lies mainly in improving the library of functions that are offered to users of SBCL.
1.1 Unicode Integration
SBCL has a simple-minded implementation of Unicode support: each Unicode codepoint is a single Lisp character, and strings are sequences of codepoints. This implementation meets the principle of least surprise, but has left some rough edges; the project here involves exploring some of those rough edges and smoothing them down. The project involves: implementation and testing (from test vectors and differentially) of a number of Unicode algorithms, and (the interesting bit) integrating those algorithms in a tasteful way in the language runtime: for example, by using the normalization algorithm in symbol equality, allowing easy use of collation keys for sorting, or syllable breaking in the pretty-printer or formatter.
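As a deliberately hypothetical illustration of the integration work, consider interning symbols by the normalised form of their names, so that canonically equivalent spellings denote the same symbol. NORMALIZE-STRING below is a placeholder for the normalization routine this project would deliver; it is not an existing SBCL function, and the stub simply returns its argument.

  ;; Placeholder for the normalization routine the project would implement;
  ;; real code would apply canonical decomposition and recomposition (NFC).
  (defun normalize-string (string form)
    (declare (ignore form))
    string)

  (defun intern-normalized (name &optional (package *package*))
    "Intern NAME by its normalized form, so canonically equivalent
  spellings of an identifier denote the same symbol."
    (intern (normalize-string name :nfc) package))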
1.1.1 Deliverables
- CL implementations of algorithms specified in the Unicode standards;
- Integration of these algorithms into the SBCL language runtime;
- Integration of test cases into the SBCL build sequence;
- Exploratory work in using these algorithms in suitable places in the Lisp language runtime.
1.1.2 Prerequisites
There are no hard prerequisites for this project; an interest in internationalization issues would be helpful, as would prior experience with working with Unicode.
1.1.3 Student learning
A student successfully executing this project would learn about: data structures for efficient information storage; test methodologies; and working within the strictures of standardized algorithms and languages.
1.2 Exploiting GMP for bignum and rational arithmetic
SBCL has a reasonably simple but decent implementation of arbitrary precision integers and rationals; its design is also informed by the Common Lisp specification, in particular support for bitwise operations in two's complement. GNU MP (GMP) offers sophisticated routines that would be extremely useful when working with medium and large bignums or rationals, but calls to GMP entail conversion to/from sign-magnitude representation, and foreign function interface overhead. The project requires the integration of GMP into SBCL, ideally as an optional module ("contrib"), the development of efficient conversion algorithms between GMP's and SBCL's concrete representations, and the determination of cross-over points between SBCL's and GMP's routines. This work would probably be appreciated by symbolic computing applications (e.g. maxima).
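As a getting-started sketch (not part of SBCL), the alien definitions below show how a contrib could bind a few mpz entry points with SB-ALIEN. The exported C symbols really are the double-underscore names (mpz_mul is a macro over __gmpz_mul); the opaque (* t) operand type and the unsigned-long stand-in for size_t are simplifications of this sketch -- a serious binding would describe the __mpz_struct layout so that limbs can be copied directly to and from SBCL bignums.

  ;; Sketch only: opaque pointers instead of a proper mpz_t description.
  (sb-alien:load-shared-object "libgmp.so")

  (sb-alien:define-alien-routine ("__gmpz_init" gmpz-init) sb-alien:void
    (x (* t)))

  (sb-alien:define-alien-routine ("__gmpz_mul" gmpz-mul) sb-alien:void
    (product (* t)) (a (* t)) (b (* t)))

  ;; Returns size_t in C; approximated here as unsigned-long.
  (sb-alien:define-alien-routine ("__gmpz_sizeinbase" gmpz-sizeinbase)
      sb-alien:unsigned-long
    (x (* t)) (base sb-alien:int))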
1.2.1 Deliverables
- (partial) Foreign function interface definitions for GMP;
- Development of routines to convert between two's complement and sign-magnitude representation;
- Integration in SBCL's existing bignum and rational arithmetic code;
- Determination of criteria for the use of GMP's routines to improve performance on large integers or rationals, as used in applications, without overly penalising the common case of tiny values;
- Development of unit tests for the new routines;
- Improvement of SBCL's algorithms for small or medium bignums and rationals (optional).
1.2.2 Prerequisites
Familiarity with Common Lisp or C. An interest in number theory would be useful, and may provide additional motivation.
1.2.3 Student learning
The student would learn to characterise the use of a small part of a system (bignum and rational arithmetic) by various applications, and develop microbenchmarks accordingly, in order to make non-obvious technical decisions. They would also exercise test methodologies, and potentially implement algorithms described by specialised scientific literature.
1.2.4 Preliminary work, notes, etc.
Code by Waldek Hebisch http://comments.gmane.org/gmane.lisp.steel-bank.general/3273 http://www.math.uni.wroc.pl/~hebisch/prog/
It seems like using GMP can help FriCAS a lot https://groups.google.com/forum/#!msg/fricas-devel/yby-B4QS4b4/3VFvYRomWQAJ
Pi digits on the benchmarks game is currently a bunch of calls to GMP. It would be nice to have comparable performance with portable CL. http://benchmarksgame.alioth.debian.org/u64q/program.php?test=pidigits&lang=sbcl&id=1
1.3 Flexible (pseudo) random number generation
SBCL's support for random number generation is limited to an implementation of the Mersenne Twister (MT19937) PRNG. This is a good general-purpose PRNG, but some applications require different characteristics of the random numbers, whether that is specific guarantees about correlations, suitability for cryptographic algorithms, or just plain raw speed. This project involves implementing several PRNG algorithms, and integrating them into SBCL in such a way as to allow library authors and end-users to programmatically negotiate the choice of PRNG algorithm among multiple implemented choices.
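A minimal sketch of what "pluggable" could look like follows; all names are invented for the example. Generators are objects answering a NEXT-CHUNK generic function, and the concrete method implements Vigna's xorshift64* update as one possible fast generator.

  (defgeneric next-chunk (generator)
    (:documentation "Return the next 64 pseudo-random bits from GENERATOR."))

  (defclass xorshift64* ()
    ;; STATE must be seeded with a non-zero 64-bit value.
    ((state :initarg :state :accessor state :type (unsigned-byte 64))))

  (defmethod next-chunk ((g xorshift64*))
    (let ((x (state g)))
      (setf x (ldb (byte 64 0) (logxor x (ash x -12)))
            x (ldb (byte 64 0) (logxor x (ash x 25)))
            x (ldb (byte 64 0) (logxor x (ash x -27)))
            (state g) x)
      ;; Final multiplication by the xorshift64* constant.
      (ldb (byte 64 0) (* x #x2545F4914F6CDD1D))))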
1.3.1 Deliverables
- Implementations of several classes of random number generators, including one statistically-robust PRNG, geared toward demanding mathematical applications, and one believed suitable for cryptographic applications;
- Use of statistical tests to examine the properties of the implemented PRNGs;
- Pluggable integration into SBCL's existing random number generation code (using random-state objects);
- (optional) support for hardware random number generators;
- (optional) development of a protocol to allow random number generators to be selected dynamically given algorithmic requirements.
1.3.2 Prerequisites
No strict prerequisites, although some understanding of the possible space of pseudorandom number generation, including tradeoffs regarding speed, predictability (forwards and backwards) and dimensional distribution. Familiarity with CL and its approach to random number generation can be acquired while doing the project.
1.3.3 Student learning
The student will learn the breadth of possible implementations of random number generators, and their limitations, all while working in the context of an established language runtime. A successful project is also likely to cover statistical tests of randomness and efficient object-oriented design.
1.4 Vectorised sequence operations
SBCL is finally gaining support for x86-64 short vector SIMD instructions (SSE), at the source level. Many array and sequence processing functions would benefit from manual vectorisation. This project requires the identification of manual vectorisation targets, the development of efficient routines, and the addition of code and of compiler transformations to exploit these routines transparently.
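As an example of the SWAR flavour mentioned in the deliverables below, here is a portable sketch (not SBCL code) of the classic "does this word contain a zero octet?" test; such a primitive could, for instance, back a vectorised POSITION of 0 over (unsigned-byte 8) vectors.

  (declaim (inline any-zero-octet-p))
  (defun any-zero-octet-p (word)
    "True if any of the eight octets packed in the 64-bit WORD is zero,
  using the (x - #x01...01) & ~x & #x80...80 trick."
    (declare (type (unsigned-byte 64) word))
    (logtest (logand (ldb (byte 64 0) (- word #x0101010101010101))
                     (lognot word))
             #x8080808080808080))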
1.4.1 Deliverables
- A list of promising vectorisable operations;
- The implementation of a few special cases for such vectorisable operations;
- A generic method to dispatch to vectorised routines depending on the CPU's capabilities;
- (optional) Convenience macros and functions to implement complete vectorised operations, including misaligned data;
- (optional) SIMD-within-a-register (SWAR) implementations for some vectorisable routines.
1.4.2 Prerequisites
Understanding of the bit-level representation of data in computers. Familiarity with SSE instructions is expected to develop over the course of the project.
1.4.3 Student learning
The student will learn to detect opportunities for vectorisation in pre-existing code. They will also hone their ability to adapt algorithms to vector processing, and develop methods to determine when and how operations should be vectorised.
1.5 Stronger hash functions and specialised hash tables
As mandated by the standard, SBCL offers hash tables and exposes pre-defined hash functions. Interactions with other parts of the system make it challenging to improve general-purpose hash tables. However, type-specialised hash tables seem approachable, as does improving the quality of the standard hash functions. If completed, this project would lead to the implementation of better hash functions, standard or as extensions, and to the development of specialised associative dictionaries.
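For instance, a well-known member of one such family for string data is FNV-1a; a portable sketch follows. The final 62-bit mask, matching fixnum-sized hash values on 64-bit platforms, is an assumption of the example rather than an SBCL requirement.

  (defconstant +fnv64-offset+ #xcbf29ce484222325)
  (defconstant +fnv64-prime+  #x100000001b3)

  (defun fnv-1a (string)
    "64-bit FNV-1a over the char-codes of STRING, truncated to 62 bits."
    (declare (type simple-string string))
    (let ((hash +fnv64-offset+))
      (declare (type (unsigned-byte 64) hash))
      (loop for char across string
            do (setf hash (ldb (byte 64 0)
                               (* (logxor hash (char-code char))
                                  +fnv64-prime+))))
      (ldb (byte 62 0) hash)))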
1.5.1 Deliverables
- Adapt and implement known families of hash functions for integer and string data;
- Improve the distribution of standard hash functions;
- Implement routines to test the quality and correctness of hash functions;
- (optional) Provide parameterised hash functions that are resistant to collision attacks, as extensions to the standard;
- (optional) Implement specialised associative dictionaries.
1.5.2 Prerequisite
Familiarity with probabilities and with basic data structures. An interest in computer microarchitecture will be useful.
1.5.3 Student learning
The student would exploit or implement state of the art hash functions, compare their strengths and weaknesses, and integrate them in a complete system. They would also apply their understanding of the low-level representation of data to develop specialised data structures that remain compatible with a specification.
1.6 Efficient interpretation
SBCL inherited from CMUCL a complicated compilation process geared toward the generation of efficient machine code. A simple interpreter enables SBCL to avoid this overhead in contexts where execution speed is not an issue at all. Many programs would benefit from more efficient interpretation, an intermediate choice between the current extremes of a full type-propagating compiler and an s-expression interpreter. Completion of this project entails the development of a standard-compliant minimal compiler.
1.6.1 Deliverables
- Disentangle or replicate the front-end to perform minimal compilation;
- Define an intermediate representation appropriate for the efficient interpretation of Common Lisp code;
- Develop a compiler from minimally-compiled source to the intermediate representation.
1.6.2 Prerequisite
Familiarity with Lisp macros and with first-class functions.
1.6.3 Student learning
The student would improve their ability to read, understand, and implement an ANSI standard. They would then explore the performance of a few standard or novel interpretation schemes on contemporary computers. Finally, they would develop a small compiler for a practical programming language.
1.6.4 Preliminary work, notes, etc.
"Using closures for code generation," Feeley M and Lapalme G (http://www.iro.umontreal.ca/~feeley/papers/FeeleyLapalmeCL87.pdf) nicely presents a technique that's very useful to compile languages without writing a full native code compiler. That's how CL-PPCRE compiles regular expressions.
1.7 Improving the thread-safety of the object system
Developers continually attempt to reduce SBCL's reliance on a single "world" lock. This effort has led to the elimination of some re-entrancy issues, and to the introduction of a few concurrency bugs, many of them related to the Common Lisp Object System (CLOS). Some bugs are easily triggered, others depend on rare race conditions. Some could be fixed with additional locking, others through the use of mostly non-blocking synchronisation techniques, and others still by outright modifying the behaviour of the object system. This project requires the student to organise known or potential concurrency bugs in PCL, SBCL's implementation of CLOS, and attempt to fix them.
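The sketch below illustrates the kind of stress test the deliverables below call for: several threads redefine a method while others call the generic function, in the hope of shaking out cache-invalidation races. The generic function and parameters are invented for the example.

  (defgeneric poke (x))
  (defmethod poke ((x integer)) x)

  (defun stress-poke (&key (threads 8) (iterations 10000))
    "Race method redefinition against dispatch; any error or hang is a lead."
    (mapc #'sb-thread:join-thread
          (loop repeat threads
                collect (sb-thread:make-thread
                         (lambda ()
                           (dotimes (i iterations)
                             (if (zerop (mod i 100))
                                 (eval '(defmethod poke ((x symbol)) x))
                                 (poke i))))))))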
1.7.1 Deliverables
- Collect, organise, and diagnose concurrency bugs in PCL;
- Fix some of these bugs;
- Develop a methodology to trigger hard-to-detect concurrency bugs in PCL;
- Suggest generally-applicable techniques to eliminate such bugs from PCL and the runtime system (optional).
1.7.2 Prerequisites
Basic understanding of shared-memory concurrency. A conservative understanding of specific memory models will be developed in parallel with the work.
1.7.3 Student learning
The student will encounter concurrency bugs in a complex stateful system that exploits both lock-based concurrency control and lock-free operations. They will develop an understanding of the issues encountered in such systems, and learn how to detect and then solve them.
1.8 Threading/locking debugging facilities
SBCL exposes low-level shared memory concurrency and parallelism constructs: threads, locks, atomic operations, etc. Such constructs are easily misused, and other projects and language implementations have developed tools to detect and understand concurrency bugs (both in terms of correctness and of performance). For example, Linux's locking correctness validator (lockdep) seems well suited to Common Lisp. Other proven tools would no doubt benefit SBCL and its users. A successful project would identify existing tools (or create new ones) that show promise for assisting in the development of threaded applications with SBCL, and implement and document some of these tools.
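To give a flavour of what a lockdep-style validator records, here is a deliberately simplified sketch: every (held, newly-acquired) lock pair is remembered, and acquiring two locks in both orders over the program's lifetime triggers a warning. All names are invented; a real tool would hook SB-THREAD itself, keep per-thread state, and protect its own tables.

  (defvar *held-locks* '())                        ; rebound per thread below
  (defvar *lock-orders* (make-hash-table :test #'equal)) ; not yet thread-safe

  (defmacro with-checked-mutex ((mutex) &body body)
    (let ((m (gensym "MUTEX")))
      `(let ((,m ,mutex))
         (dolist (held *held-locks*)
           (setf (gethash (list held ,m) *lock-orders*) t)
           (when (gethash (list ,m held) *lock-orders*)
             (warn "~S and ~S have been acquired in both orders." ,m held)))
         (let ((*held-locks* (cons ,m *held-locks*)))
           (sb-thread:with-mutex (,m)
             ,@body)))))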
1.8.1 Deliverables
- Survey of proven threading/locking debugging tools;
- Implementation and integration of at least one such tool in SBCL;
- Documentation for these extensions;
- (optional) Suggestion of alternative less error-prone constructs for sb-concurrency or sb-thread.
1.8.2 Prerequisites
Familiarity with the POSIX threads programming model. An awareness of the execution process for multi-socket systems with multiple levels of cache would be useful, but not necessary.
1.8.3 Student learning
The student would become intimately familiar with the sort of bugs commonly encountered in threaded systems. They would also study state of the art tools to detect such bugs, and replicate some of them for integration in a pre-existing environment. They would also exercise their technical writing skills to document the tools and describe their correct use. Finally, they would demonstrate the ability to propose elegant architectural solutions to complex issues.
2 High-level optimisation
These projects are mostly concerned with extending the initial (mostly target-independent) optimisation phase.
2.1 Efficient integer truncate/floor/ceiling by constants
Integer division is notorious for being slow. However, it is also known that the divisor is constant in the vast majority of cases, and serious compilers exploit that fact to simplify divisions into sequences of simpler multiplications, shifts, and additions. SBCL implements such a simplification only for truncated division of unsigned machine words. Floor and ceiling are less commonly supported natively in programming languages, and there is a dearth of literature describing how to simplify them. However, it is possible to do so, for both signed and unsigned machine integers. It is also possible to specialise the routines for tagged arithmetic. A complete execution of this project would include the development of simplification routines for signed and unsigned truncate, floor and ceiling divisions by integer constants. Some of the simplifications, particularly those concerning tagged integers, will be widely applicable and likely novel.
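The core reduction is easy to state, and to test exhaustively for small widths. By Granlund and Montgomery's theorem, for an N-bit unsigned dividend x and a constant divisor d, choosing l = integer-length(d) and m = ceiling(2^(N+l)/d) gives truncate(x/d) = (x*m) >> (N+l) for all x below 2^N (powers of two are handled by plain shifts). The sketch below checks the identity with bignum arithmetic; the back-end version would of course use a multiply-high instruction instead of a full multiplication.

  (defun reciprocal-constant (divisor &optional (width 64))
    "Return the magic multiplier and shift for truncated division by DIVISOR."
    (let ((shift (+ width (integer-length divisor))))
      (values (ceiling (ash 1 shift) divisor) shift)))

  (defun truncate-by-constant (x divisor &optional (width 64))
    (multiple-value-bind (m shift) (reciprocal-constant divisor width)
      (ash (* x m) (- shift))))

  ;; (truncate-by-constant 1234567 10) => 123456 = (truncate 1234567 10)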
2.1.1 Deliverables
- Implement strength reduction of signed truncated division;
- Determine how to correctly simplify floor and ceiling division;
- Implement strength reduction of floor and ceiling division;
- Adapt the algorithms to take tagging into account;
- Extend the test suite for integer division by constants;
- (optional) Extend the work to constant division by rationals.
2.1.2 Prerequisites
Basic number theory. Some work will likely be at the assembly level, but what little is necessary can be acquired on the fly.
2.1.3 Student learning
The student would apply pure mathematics concepts from number theory to understand how to correctly simplify operations in computer programs. They would likely become acquainted with the performance characteristics of contemporary computers to decide how to let number theory guide the simplification of divisions by integers. They would also show the correctness of simple but novel variations, and exploit their understanding of the problem domain to develop tests that are likely to detect incorrect transformations.
2.1.4 Preliminary work, notes, etc.
"Integer division using reciprocals," R Alverson (http://www.acsel-lab.com/arithmetic/papers/ARITH10/ARITH10_Alverson.pdf)
"Division by invariant integers using multiplication," T Granlund and PL Montgomery (http://gmplib.org/~tege/divcnst-pldi94.pdf)
"N-Bit unsigned division via N-Bit multiply-add," AD Robison (http://www.computer.org/csdl/proceedings/arith/2005/2366/00/23660131-abs.html)
"Labor of Division (Episode III): Faster Unsigned Division by Constants," ridiculous_fish (http://www.ridiculousfish.com/blog/posts/labor-of-division-episode-iii.html)
Specifically on extension to floor/ceiling http://discontinuity.info/~pkhuong/misc/div-by-mul.pdf
Some of the references above were first found via http://www.mail-archive.com/open64-devel@lists.sourceforge.net/msg01139.html. It seems like work in this area tends to be duplicated… A review might not hurt.
Hacker's Delight might be more confusing than anything else, YMMV (also, see http://www.hackersdelight.org/divcMore.pdf).
2.2 Exploiting switch/case in standard control structures
While Common Lisp exposes case constructs, there is no standard support for constant-time computed goto or C-style switch/case flow control. A partial patch to extend SBCL with such an indirect-jump-based control construct exists. First, it must be completed and tested, and, second, the new construct should be exploited in a standard-compliant manner in the implementation of standard flow control macros. This project would greatly improve the performance of some state machine implementations, and benefit a wide range of programs that use standard case constructs directly.
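The sketch below shows the shape of the transformation for a dense CASE whose keys are 0..n-1 in order: dispatch becomes a single indexed fetch. The closure vector merely stands in for the indirect jump the patch implements (and it is rebuilt on every call, which the real construct would avoid); the macro name is invented.

  (defmacro dense-case (key &body clauses)
    "CASE restricted to clause keys 0 .. n-1, in order: constant-time dispatch."
    (let ((k (gensym "KEY")))
      `(let ((,k ,key))
         (funcall (svref (vector ,@(loop for (nil . body) in clauses
                                         collect `(lambda () ,@body)))
                         ,k)))))

  ;; (dense-case state (0 (start)) (1 (run)) (2 (stop)))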
2.2.1 Deliverables
- Forward-port the indirect-jump patch;
- Expose the new operator in a standard-compliant manner;
- Create benchmarks to understand how to best use this new operator;
- Exploit the operator in the implementation of standard flow control macros;
- (optional) Port the patch to additional computer architectures.
2.2.2 Prerequisites
Knowledge of x86[-64] assembly language. Familiarity with advanced compilation techniques is an advantage.
2.2.3 Student learning
The student would gain an overview of the complete pipeline in a production compiler, from the front-end, to dataflow analyses, to the generation of machine code. They would also have to work within an ANSI standard to expose a new feature to users. Finally, they would improve their understanding of the low-level performance of modern architectures, particularly at the level of branch prediction, in order to transparently improve the runtime efficiency of flow control constructs.
2.2.4 Preliminary work, notes, etc.
This dates from 2008, but it gives a decent idea of how little code is needed to get started http://discontinuity.info/~pkhuong/sbcl-switch-case.lisp.
3 Middle end infrastructure
These projects are concerned with the target-independent optimisation phase as well, but they modify its infrastructure, rather than using it to add more smartness to the compiler.
3.1 Accurate and correct numeric type derivation
The static derivation of intervals for the values taken by mathematical operations is essential for Common Lisp compilers to convert idiomatic programs to machine code comparable with that of less safe languages. SBCL's implementation is fairly complicated, and seems subtly incorrect for floating-point types, particularly when the rounding mode differs between the compilation and execution environments. Upon completion, this project would result in simpler and more robust routines to propagate numeric intervals for SBCL.
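The "naïve but clearly correct" starting point can be tiny; the sketch below (not SBCL's internal representation) models closed intervals with possibly unbounded endpoints and derives the interval of a sum. Real float code must additionally round the resulting endpoints outward; this sketch deliberately ignores rounding.

  (defstruct (num-interval (:constructor interval (low high)))
    low high)                           ; a NIL endpoint means unbounded

  (defun interval+ (a b)
    "Interval of (+ x y) for x in A and y in B, without outward rounding."
    (flet ((add (x y) (and x y (+ x y))))
      (interval (add (num-interval-low a)  (num-interval-low b))
                (add (num-interval-high a) (num-interval-high b)))))

  ;; (interval+ (interval 0 10) (interval -1 nil)) => [-1, unbounded above]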
3.1.1 Deliverables
- A naïve but clearly correct interval derivation module;
- Development of test cases to trigger issues in numeric (floating-point) type derivation;
- More sophisticated interval derivations;
- (optional) Express the interval derivation logic in portable Common Lisp for the cross-compiler.
3.1.2 Prerequisites
Some real analysis. Minimal familiarity with numerical analysis and with the implementation of floating-point arithmetic in computers.
3.1.3 Student learning
The student will bridge their understanding of mathematical operations with their concrete approximation in contemporary computers to safely characterise the behaviour of arbitrary Common Lisp expressions. They will thus acquire experience with simple numerical analysis, and become acquainted with the difference between theoretical mathematics and floating-point operations. They will also learn to exploit the concrete representation of floating-point values to implement simple and efficient, but correct, numerical algorithms.
3.2 Expression optimisation
Although SBCL performs fairly sophisticated analyses, subsequent transformations are performed (except for a few exceptions) bottom-up, one function call at a time. A large body of classic techniques is available to optimise complete (side-effect-free) expression trees top-down. Upon completion, this project would extend SBCL with a system to define tree rewriting rules, and execute them on code. Arithmetic and bitwise operations would likely benefit, as would modular arithmetic and embedded domain-specific languages.
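As a taste of the rewrites such an infrastructure would host, here is a hand-written top-down pass applying a single rule -- folding nested LOGANDs with constant masks -- over a pure expression tree; a pattern-definition language (see the deliverables below) would replace the explicit destructuring.

  (defun fold-logand (form)
    "Rewrite (logand (logand x c1) c2) to (logand x (logand c1 c2)), top-down."
    (if (and (consp form) (eq (first form) 'logand)
             (integerp (third form))
             (consp (second form)) (eq (first (second form)) 'logand)
             (integerp (third (second form))))
        (fold-logand (list 'logand
                           (second (second form))
                           (logand (third form) (third (second form)))))
        (if (consp form)
            (cons (first form) (mapcar #'fold-logand (rest form)))
            form)))

  ;; (fold-logand '(logand (logand (f x) #xff) #x0f)) => (LOGAND (F X) 15)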
3.2.1 Deliverables
- Detection and visualisation of expression trees in the first intermediate representation (IR1);
- Insertion of a top-down rewriting pass in the IR1 optimisation loop;
- Development of a pattern and transform definition language for expression trees;
- Implementation of a few rewrites with this new infrastructure.
3.2.2 Prerequisite
Basic discrete mathematics. Familiarity with formal grammars and automata theory is optional.
3.2.3 Student learning
The student would improve their understanding of the compilation process for pure expressions, and review, then implement, classic techniques for their improvement. They would also develop a new internal library feature, and exploit it to show concrete improvements in the compiler's output.
3.3 Coarser type derivation system
Common Lisp's type system is extremely expressive, and it is expected that complicated types will lead to slow type tests and comparisons. SBCL exploits this expressiveness to implement fine-grained flow-sensitive analyses. In many cases, particularly when execution speed is a secondary concern, tracking types less precisely would improve compilation speed significantly, without overly affecting the object code. This project consists of first identifying points in SBCL's analysis passes where types could be profitably widened, of designing a coarse type lattice that accelerates compilation, and of implementing that coarse lattice in SBCL.
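A hypothetical widening operator could be as blunt as mapping every type onto a small fixed chain and doing all further set operations there; the lattice below is invented for the example.

  (defparameter *coarse-lattice*
    '(fixnum integer rational float number character symbol list vector t))

  (defun widen-type (type-specifier)
    "Return the first (most precise) coarse type containing TYPE-SPECIFIER."
    (dolist (coarse *coarse-lattice*)
      (when (subtypep type-specifier coarse)
        (return coarse))))

  ;; (widen-type '(integer 3 17))          => FIXNUM
  ;; (widen-type '(or (eql :a) (eql :b)))  => SYMBOL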
3.3.1 Deliverables
- Gather example code that exercises SBCL's type-based analyses;
- Determine what operations exhibit complexity blowups in these examples;
- Define a simple type widening operator, and insert it in key places;
- Experiment with various widening strategies and alternative type lattices to improve compilation speed while preserving correctness and reasonable execution efficiency.
3.3.2 Prerequisites
Proficiency in discrete mathematics, particularly set theory. The student is expected to become familiar with data flow analyses in the course of the project.
3.3.3 Student learning
The student will acquire experience at profiling a complex symbolic manipulation system. They will then apply their results to improve the practical performance of the system, while preserving its mathematical correctness. They will also strengthen their mastery of the static (data flow) analysis of impure languages.
3.4 Quick compilation
SBCL inherited from CMUCL a complicated compilation process geared toward the generation of efficient machine code. A simple interpreter enables SBCL to avoid this overhead in contexts where execution speed is not an issue at all. Many programs would benefit from more straightforward compilation, an intermediate choice between the current extremes of a full type-propagating compiler and an s-expression interpreter. If completed, this project would enable time-consuming phases of the compiler to be disabled or replaced with coarser, but more quickly-executed, ones. This would result in reduced compilation times for code that isn't performance-critical, and, potentially, the ability to compile very large computer-generated functions.
3.4.1 Deliverables
- Identify the most time-consuming phases of the compiler;
- Extend performance and correctness tests for the compiler;
- Determine how to disable or simplify time-consuming phases, while preserving correctness;
- (optional) Develop alternatives for a few recursive or super-linear computations in the compilation process.
3.4.2 Prerequisites
No strict prerequisite. An interest in compilation would be helpful, as would familiarity with the analysis and design of algorithms and data structures. The student will become comfortable with set and lattice theory.
3.4.3 Student learning
The student will gather representative code samples from actual users, and exploit them to build an understanding of the empiric performance of SBCL's compiler. They will also use this data to develop benchmarks that reflect real-world needs, and design tests to convince themselves and others that changes to a large system preserve its correctness. They will finally exploit these tools to determine which parts of the compiler should be disabled or simplified, and how to do so.
3.4.4 Preliminary work, notes, etc.
Short IRC transcript on the topic
06:53 < pkhuong> I believe you had some stuff on quicker compilation? Is there any preliminary branch, patch set or notes I should point people to?
07:01 <@nikodemus> pkhuong: not as such
07:02 <@nikodemus> i can write a quick braindump
07:02 <@nikodemus> IIRC. the biggest timesink is constraint propagation.
07:03 <@nikodemus> IIRC. the trickiest thing about eliminating it is that we have a bunch of places that assume that certain things are always optimized.
07:03 <@nikodemus> so if you eliminate it, you get "correct" code that the rest of the system cannot deal with
07:05 <@nikodemus> being a silly young thing, i was working on essentially adding IR0 which would have been a fast way to get those minimal things sorted out. I still think that's possible, but it probably isn't the best way to go about it...
07:07 <@nikodemus> i think it would be a reasonable approach to just diable it, and see where things break, and start fixing the places that make those assumptions
07:07 <@nikodemus> EOBD
07:07 <@nikodemus> s/diable/disable/
07:09 < pkhuong> sounds right to me as well.
07:12 < pkhuong> So, I see where constraint propagation can be simplified... we need quick but conservative type union/intersection... csubtypep as well, I guess.
07:13 < pkhuong> Then, see what happens to the transforms, and play whack-a-mole.
07:13 <@nikodemus> IIRC a typical breaked is from eg (coerce x 'list) => (the list (if (listp x) x (convert-to-list x))) sort of transforms
07:14 < pkhuong> how does that break?
07:15 <@nikodemus> don't remember the details. maybe when X is known not to be a list due to a declaration?
07:16 <@nikodemus> so you end up with something like (the list (the (not list) x))
07:16 <@nikodemus> which the compiler won't like
07:16 <@nikodemus> or maybe it was IR1-OPTIMIZE that was needed to kill the IF
07:17 < pkhuong> oh yeah.
07:17 < pkhuong> constraints need ir1 optimise to kill (if [x] [block] [same block]) into (progn [x] [block])
07:19 <@nikodemus> and (if [always false] [will cause a runtime error] [ok])?
07:19 <@nikodemus> because some of the breakaged were WARNINGs due to compiler thinking it sees something that will break at runtime for sure
07:19 <@nikodemus> breakages, even
07:19 < pkhuong> I think that's all right, wrt codegen. I changed that to style warnings because of ppcre
07:20 <@nikodemus> oh
07:20 < pkhuong> If there's a branch on the path (well, it's a hack, but catches most cases), the warning is downgraded to a style warning.
07:21 <@nikodemus> right
07:21 <@nikodemus> then there are our always-translatable things, i guess
07:22 < pkhuong> shouldn't we get nearly all of those right just with the defknowned types?
07:22 <@nikodemus> and many many transforms which should not be done unless we're actually going to compile things as well as we can
07:22 <@nikodemus> most, i think
07:22 < pkhuong> there's that. workingness would be a good first step though (:
07:23 <@nikodemus> and the rest should probably not be always-translatable, but something like mostly-translatable :)
07:23 < pkhuong> right.
07:24 <@nikodemus> yeah, and eliminating transforms is the completely wrong first step, because /almost/ any problematic code a transform can inject can also come from a user
07:25 <@nikodemus> though i'm willing to give a pass to eliminating transforms that produce (sb-foo:%something-magic ...) if they turn out to be a problem
07:25 < pkhuong> yes and no... We could only run CP once. User code would benefit from minimal constraints, but deftransformed code not.
07:26 <@nikodemus> sure, but in deep CMUCL time CP used to be completely optional
07:26 <@nikodemus> i'd sort of like to restore that
07:26 < pkhuong> right.
07:26 <@nikodemus> oops. make that "would like to see that restored" :)
07:27 < pkhuong> Also, our transforms go out of their way to depend on CP: expand into a lambda without type declaration on the arguments?
07:28 <@nikodemus> the deftransform could magically add most of the type information there, i think
07:28 <@nikodemus> (a case for truly-type declaration?)
3.5 Precise type derivation
SBCL and CMUCL are recognized for the strength of their type-directed optimisations, which depends on the quality of the type propagation pass. That pass exhibits severe weaknesses in the presence of recursion or of assignments: such non-trivial bindings are initialised with a static type of T (barring any user-provided declaration), and that type is iteratively tightened. This approach has the advantage of always assigning correct types at any time, even if a fixpoint is not yet reached. However, when compilation times are not an issue and the analysis is executed until termination, initialising bindings with a static type of NIL (the bottom type) and widening the type assignments iteratively would result in the automatic derivation of more precise types. If completed, this project would implement such a precise flow-sensitive data-flow analysis, and let it be enabled on demand within the normal compilation pipeline.
3.5.1 Deliverables
- Code samples exhibiting disastrously weak type derivation;
- A correct and probably very slow (or even non-terminating) implementation of a precise type derivation pass;
- Integration of that pass in the compilation pipeline;
- (optional) Experiments with strategies to ensure the pass terminates after reasonable computation times.
3.5.2 Prerequisites
Proficiency in discrete mathematics, particularly set theory. The student is expected to become familiar with data flow analyses in the course of the project.
3.5.3 Student learning
The student will learn to apply abstract concepts in the design of a data flow analysis pass for an impure language. They will do so within the limits of a standards-defined language that is not explicitly designed to enable sophisticated static analyses. Finally, they will research and discover strategies to render practically usable a theoretically correct symbolic process.
4 Back end work
The following projects improve or add functionality in the back end of the compiler, the phase that exploits architecture-specific information to generate machine code.
4.1 Modernising a graph-colouring register allocator
Variables are mapped to registers or stack locations in SBCL with a straightforward graph-colouring allocator. Graph colouring is a well-known combinatorial optimisation problem, and, although it is NP-Hard, some heuristic methods are known to perform very well in practice. Exploiting such methods in the register allocator would enable code to avoid spills when register pressure is high; this is a particularly pressing issue on x86. Moreover, while SBCL's register allocator allocates registers (or stack slots) to variables, contemporary allocators obtain much better results by mapping registers to values, thus allowing the values associated with a single variable to be stored in different locations at distinct points in the code. This can be achieved to a lesser extent by more closely tracking values within a single basic block and by splitting live ranges into multiple sections, if compensation code is inserted as needed. Through any of these means, the project would improve the performance of code that exhibits high register pressure.
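As a reminder of how simple the "well-known methods" can be, the sketch below colours an interference graph greedily in Welsh-Powell order (decreasing degree); the data representation is invented, and a real allocator also weighs spill costs and loop depth, as SBCL's already does.

  (defun greedy-colour (nodes neighbours)
    "Colour NODES (interference given by the NEIGHBOURS hash table) greedily,
  visiting higher-degree nodes first; colours are just 0, 1, 2, ..."
    (let ((colours (make-hash-table)))
      (dolist (node (sort (copy-list nodes) #'>
                          :key (lambda (n) (length (gethash n neighbours))))
                    colours)
        (let ((taken (loop for m in (gethash node neighbours)
                           for c = (gethash m colours)
                           when c collect c)))
          (setf (gethash node colours)
                (loop for c from 0 unless (member c taken) return c))))))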
4.1.1 Deliverables
- Accumulate a few functions that exhibit register allocation issues;
- Improve the graph-colouring heuristic by adapting well-known methods;
- Allocate registers more finely within a single basic block;
- (optional) Implement a live-range splitting pass;
- (optional) Exploit register-register exchange instructions to enable SSA-style register allocation.
4.1.2 Prerequisites
Basic familiarity with assembly language. Some knowledge of discrete mathematics and of graph theory.
4.1.3 Student learning
The student will apply sophisticated heuristics to solve an NP-Hard problem. They will also learn to map abstract discrete optimisation concepts to concrete low-level concepts such as architectural registers and assembly instructions, and develop specialised methods that straddle the two views to exploit domain-specific knowledge.
4.1.4 Preliminary work, notes, etc.
Andrew Appel has some interesting stuff http://www.cs.princeton.edu/~appel/coalesce/.
The allocator is in src/compiler/pack.lisp, and is basically a greedy allocator (now that CMUCL's strange rotating heuristic has been disabled). However, rather than colouring with respect to any well-known ordering heuristic (e.g. the Welsh-Powell decreasing degree order, or Brélaz's dynamic min-degree order), TNs (variables) are ordered for allocation by considering only the loop nesting depth and a cost value.
4.2 Selecting concrete representations cleverly
An essential trick to SBCL's performance is its ability to represent the same data in multiple manners, as boxed values, or as machine-native values, e.g. signed or unsigned words, or floating point values in registers. However, not all operations have the same requirements: some, like arbitrary calls, are better served with boxed values, while others, like floating-point arithmetic, are more efficient on natively-represented values. SBCL currently assigns a single concrete representation for each variable. This selection algorithm can be revisited and improved, but the constraints under which it operates could be relaxed as well: nothing prevents a variable's values from being represented in different manners at distinct program points, as long as conversion code is inserted at key locations. In particular, this additional leeway imposes a very low overhead if it is only exploited within each basic block. This project requires creativity to understand an exotic combinatorial optimisation problem and determine how best to tackle it and adapt existing methods; if successful, it would likely result in the development of a novel algorithm for a problem essential to the performance of dynamically-typed languages, and its application in a pre-existing compiler, thus showing clear practical improvement.
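The tension is easy to exhibit. In the contrived function below, X participates in float arithmetic on every iteration, which favours an unboxed double-float in a register, but it also escapes to a full call, which needs a boxed value; whichever single representation is selected for X, conversion code must surround the uses with the other requirement.

  (defun sum-and-report (v)
    (declare (type (simple-array double-float (*)) v))
    (let ((x 0d0))
      (dotimes (i (length v) x)
        (incf x (aref v i))          ; float arithmetic: wants an unboxed X
        (when (> x 1d6)
          (print x)))))              ; full call: wants a boxed X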
4.2.1 Deliverables
- Describe the objective and constraints in the representation selection problem;
- Reduce the representation selection problem to better understood optimisation problems with efficient heuristics;
- Improve the currently-existing representation selection heuristic;
- (optional) Implement value-based representation selection within basic blocks;
- (optional) Implement SSA-style representation selection, with conversion code at control flow fork/join points.
4.2.2 Prerequisites
Basic familiarity with assembly language. Some knowledge of discrete mathematics and of graph theory.
4.2.3 Student learning
The student will learn to describe a complicated, industrial, combinatorial optimisation problem formally, and to reduce it to classic optimisation problems. They will then have an opportunity to exploit this theoretical basis to improve on an existing heuristic, in practice. Finally, they will be led to iteratively improve that formal model to better represent the actual problem, while ensuring the existence of efficient solving methods.
4.3 Peephole optimisation
Most of the cleverness in SBCL is in the front-end. Once Common Lisp has been lowered to an explicitly-typed C-level intermediate representation, very few analyses and optimisations are performed, when compared to other compilers. As a result, clearly suboptimal code sequences are generated, particularly at the boundary between two operations. A peephole optimisation pass would detect such sequences and eliminate them or replace them with more efficient code. This could be achieved by considering sequences of instructions or of virtual operations (instruction templates), or even by reconstructing a tree from virtual operations. The project would implement such a peephole pass, offer a modular way to define new patterns, and add a few such patterns for commonly-used architectures.
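Here is a sketch of the kind of rewrite a peephole pass would perform, over an invented symbolic instruction list (the real pass would work on VOPs or on the assembler's statement buffer): two consecutive immediate additions to the same register are fused.

  (defun peephole (instructions)
    (loop with out = '()
          for inst in instructions
          do (let ((prev (first out)))
               (if (and prev
                        (eq (first inst) :add-imm)
                        (eq (first prev) :add-imm)
                        (eq (second inst) (second prev)))
                   ;; Fold the new immediate into the previous instruction.
                   (incf (third prev) (third inst))
                   (push (copy-list inst) out)))
          finally (return (nreverse out))))

  ;; (peephole '((:add-imm :rax 8) (:add-imm :rax 16) (:mov :rbx :rax)))
  ;; => ((:ADD-IMM :RAX 24) (:MOV :RBX :RAX))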
4.3.1 Deliverables
- Determine how to inject ad hoc rewrites during the emission of machine code;
- Implement a simple, specialised, optimisation using that mechanism;
- Develop a pattern-definition language appropriate for the chosen rewriting mechanism;
- Implement some rewrites using that pattern-definition language, and show improvements in some degenerate cases.
4.3.2 Prerequisites
Familiarity with assembly language on at least one platform supported by SBCL.
4.3.3 Student learning
The student will evaluate how best to extend a system in a direction that was not considered during its initial design. They will also review various approaches to improve code generation at a low level, and create a new domain-specific language that will be used by others. They will finally develop code to detect and improve code at the assembly level.
5 Diving in the runtime system
Projects in this section really let the coder feel bits and syscalls between their toes. Work involving the runtime system tends to be challenging because the programmer must take into account many facets of SBCL, from library code, to type information in the middle end, to invariants in the back end. These are high risk/high reward projects; most of the risk comes from a high potential for failure, but there is also no guarantee that a successful implementation will result in any improvement.
5.1 Simpler structure layout
Structure objects in Common Lisp support single inheritance and typed slots so as to offer both extensibility and performance. SBCL implements slots of unboxed data by allocating them from the end of (variable-length, because of inheritance) structure objects. Access to an unboxed slot thus first determines the size of the current object in all but a few cases. However, nothing prevents boxed and unboxed slots from being interleaved, as long as alignment requirements are respected and the garbage collector's scanning routine for structures is suitably adapted. A branch partially implements this improvement, and the work is assuredly feasible. If completed, this project would lead to more efficient access to structure slots, and potentially let structures be laid out compatibly with some ABIs, thus simplifying foreign calls.
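For example, in a structure such as the one below, the raw double-float slots are currently allocated from the end of the (inheritance-dependent, variable-length) object, after all boxed slots; the interleaved scheme would let them sit at fixed offsets chosen freely, subject to alignment and to the garbage collector knowing which words to scan.

  (defstruct particle
    (name nil :type symbol)         ; boxed word, scanned by the GC
    (x 0d0 :type double-float)      ; raw word(s), currently placed at the end
    (y 0d0 :type double-float))     ; raw word(s), currently placed at the end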
5.1.1 Deliverables
- Forward port the interleaved unboxed slot branch;
- Develop tests for the garbage collector, the interpreter, and the introspection facilities;
- Determine how to use this new freedom in object layouts;
- Measure the impact of the changes on the memory usage and on the runtime performance of representative programs.
5.1.2 Prerequisites
Some experience with Common Lisp. Familiarity with x86 or x86-64 assembly language is helpful but optional.
5.1.3 Student learning
The student would become comfortable with data representation issues at a low level, develop and execute tests for a fundamental part of the system, and guide technical decisions with empiric measures from benchmarks that reflect reality. Moreover, this would be done within the constraints specified by a standard.
5.1.4 Preliminary work, notes, etc.
https://github.com/pkhuong/sbcl/tree/fast-unboxed-struct seemed to be mostly working a couple years ago. There's still a lot of work to adapt introspection code, and then to actually exploit the new freedom in layout.
5.2 Software write barriers
Generational garbage collectors attempt to process the heap incrementally by determining when writes to old objects may have created new references. SBCL implements such write barriers with hardware memory protection. This means that writes can only be tracked at a coarse granularity (a page, at best), and that handling writes to previously clean pages is a complex affair involving several round trips between userspace and the operating system. Another approach is to modify the code generated by the compiler to emit explicit write barrier instruction sequences; each approach offers different performance characteristics, and it is not yet clear which is preferable for SBCL. Previous attempts at implementing card marking write barriers showed interesting results, but still exhibit subtle bugs. If successful, this project would forward port one such attempt, develop tests for the software write barriers, and implement an architectural change that involves large portions of the code base.
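Conceptually, the software barrier is a couple of instructions added to every potentially old-object boxed store: compute the card index from the object's address and dirty that card. The sketch below spells the logic out in Lisp with invented names and an assumed card size; the compiler would emit the equivalent inline, and the collector would then scan only dirty cards.

  (defconstant +card-bytes+ 1024)     ; assumed card size

  (declaim (inline note-boxed-write))
  (defun note-boxed-write (card-table object-address)
    "Mark the card containing OBJECT-ADDRESS as possibly holding new pointers."
    (setf (aref card-table (floor object-address +card-bytes+)) 1))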
5.2.1 Deliverables
- Forward port of a software write barrier branch;
- Development of unit tests for the garbage collector, particularly for the write barriers;
- Characterisation of the strengths and weaknesses of software and memory protection write barriers;
- (optional) Determine how best to use software write barriers in SBCL, e.g. by allowing boxed and unboxed objects to be allocated contiguously, or by enabling huge pages.
5.2.2 Prerequisites
Familiarity with x86-64 assembly language.
5.2.3 Student learning
The student would acquire experience working with compilers and runtime systems at the assembly level, and develop techniques to automatically detect subtle code generation and concurrency bugs. They would also improve their ability to characterise the performance of (a part of) a complex system, and act upon that information to improve such a system.
5.2.4 Preliminary work, notes, etc.
There were several trees over the past couple of years. The latest one is https://github.com/pkhuong/sbcl/tree/card-2012-3. I believe there are some strange failures for tiny card sizes, and I haven't yet ruled out an old bug elsewhere.
5.3 Improving SBCL's memory allocator
SBCL's memory management system includes a page-level block allocator. It is mainly used to manage large objects and thread-local allocation regions. The scheme exhibits at least two clear deficiencies. First, large objects are always allocated on page boundaries, exacerbating cache aliasing issues, in addition to, in effect, misaligning the contents of vectors. Second, normal (pointer-ful) and unboxed (pointer-free) blocks are interleaved, but regular pages tend to be write-protected and unboxed pages never are; these short ranges of addresses with different properties increase the address space management overhead for the operating system, which is reflected in long times spent in system code. There are likely other problems that can be improved with little effort; for example, the proportion of zeros in image files hints at a fragmentation issue.
Upon completion, this project would have led to the identification of systems-level issues in SBCL's memory management strategies, and to the application of changes to eliminate or alleviate these issues.
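The first deliverable can start from something as small as the sketch below: summing the same index across several large, page-aligned vectors makes the accesses compete for the same cache sets, so comparing it against deliberately staggered allocations exposes the aliasing cost.

  (defun strided-sum (vectors &aux (total 0d0))
    "Sum element I of every vector before moving to I+1; page-aligned vectors
  of the same size will all map to the same cache sets."
    (dotimes (i (length (first vectors)) total)
      (dolist (v vectors)
        (incf total (aref v i)))))

  ;; (time (strided-sum (loop repeat 16 collect
  ;;                          (make-array (expt 2 20) :element-type 'double-float
  ;;                                      :initial-element 1d0))))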
5.3.1 Deliverables
- Microbenchmarks to understand the effect of aliasing and data misalignment when processing large (vector) data;
- Development of alternative large object allocation and deallocation routines that improve the cache utilisation of SBCL;
- Development of a block allocation strategy that is better suited to the algorithms and data structures operating systems use to manage ranges of virtual memory;
- (optional) Identification of further global issues in SBCL's memory management.
5.3.2 Prerequisites
Proficiency in C. Basic understanding of computer architecture and of contemporary operating systems.
5.3.3 Student learning
The student will gain concrete experience at debugging and tuning the performance of a complete system, with interactions between a managed language runtime, the operating system, and the CPU's microarchitecture. In addition to applying concepts from computer architecture and operating systems theory to improve the performance of a runtime system, they will also apply statistical techniques to convincingly exhibit these improvements.
5.3.4 Preliminary work, notes, etc.
https://github.com/pkhuong/sbcl/tree/discontiguous-unboxed-allocation seemed to mostly work to avoid interleaving pages.
5.4 Garbage collection debugging and heap assertions
Pointer-based data structures can be particularly difficult to debug; automatic garbage collection eliminates some of these bugs, but also introduces subtle memory leaks. Some ad hoc tools are already available to explore SBCL's heap semi-automatically and thus track down spurious roots. They are not user-friendly, and the information could be exposed more efficiently. Similarly, SBCL exposes tools to fully traverse the heap; however, the functions are brittle and primitive, and it is difficult to verify that some heap-allocated data structure satisfies its invariants, particularly if the structure is large. This project consists of improving upon existing tools to allow programmers to efficiently detect and eliminate memory leaks and check problem-specific data structure invariants.
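One of the "fixed assertions" could be built directly on the existing full-heap walker, along the lines of the sketch below. The walker's name and calling convention (as used by ROOM) are assumptions and may differ between SBCL versions, and a production version would also look inside non-cons objects.

  (defun conses-referencing (target)
    "Count heap conses whose CAR or CDR is exactly TARGET."
    (let ((count 0))
      (sb-vm::map-allocated-objects
       (lambda (object widetag size)
         (declare (ignore widetag size))
         (when (and (consp object)
                    (or (eq (car object) target) (eq (cdr object) target)))
           (incf count)))
       :dynamic)
      count))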
5.4.1 Deliverables
- Development of a few fixed assertions (e.g. that an object is dead, or is referenced by exactly one heap object) that can be checked asynchronously, during or after major garbage collection cycles;
- A visualisation tool to help track references in object graphs, and to better understand the size of data in SBCL's heap;
- Integration of the assertions in minor GC;
- (optional) Development of a domain specific language to allow the description of specialised assertions.
5.4.2 Prerequisites
Familiarity with manual memory management, and with the manipulation of pointers. Knowledge of graph theory and predicate logic may be useful. Interest in the low end of complexity classes would be an asset.
5.4.3 Student learning
The student would identify how to respond to the needs expressed by programmers, most likely by taking inspiration from tools described in scientific literature. They would also become proficient at processing large data sets (heaps) with streaming approaches that use little auxiliary storage. Finally, they would gain experience in the design and implementation of domain specific languages: the language will balance the conflicting goals of expressiveness and of efficient execution within very little writable storage.
5.5 A hybrid copying/mark-and-sweep garbage collector
SBCL's garbage collector is a mostly-copying generational garbage collector (gencgc). Heap sizes have grown by multiple orders of magnitude since the era when gencgc was designed. It now seems interesting to only use a copying garbage collector for newly-allocated data, and to reduce writes to older data by performing major collections with a mark-and-sweep pass. Depending on the student's affinity, this may prove to be less work or more interesting than integrating a third-party design. Completion of this project entails the development of a memory management subsystem, including difficult software development choices. The result would be improvements in the responsiveness and throughput of a production language implementation.
5.5.1 Deliverables
- Description of the interface between SBCL's runtime and its memory management subsystem;
- Implementation of mark-and-sweep in a distinct heap from the current copying one, with eviction from the copying heap to the mark-and-sweep heap;
- Integration of the mark-and-sweep heap with the copying heap and its data structures;
- (optional) Adaptation of mark-and-sweep GC to efficiently load images and to save less fragmented images;
- (optional) Improved performance on read-mostly data.
5.5.2 Prerequisites
Knowledge of C and understanding of operating systems internals.
5.5.3 Student learning
The student will appreciate the challenges faced by the implementations of languages with managed memory. They will understand the theory of classic garbage collection and memory management algorithms, and implement them. They will also develop code that interacts directly with the operating system in order to manage resources efficiently.
5.5.4 Preliminary work, notes, etc.
A minimal working (?) mark/sweep, plugged into the usual collector, at https://github.com/pkhuong/sbcl/tree/foreign-mark-sweep. There's actually very little that must be changed in pre-existing code to get something going.
5.6 Replacing the garbage collector with MPS
SBCL's garbage collector is well-modularised, but shares code only with CMUCL, and suffers from a rigid design. Ravenbrook's Memory Pool System (MPS) is a flexible, high-performance, open source memory management subsystem, and SBCL would likely gain in performance and in robustness by replacing its specialised garbage collector with MPS, augmented with SBCL-specific extensions. If successful, this project would deeply integrate a complex library in a large system by characterising their respective requirements and judiciously inserting logic to make the two designs compatible.
5.6.1 Deliverables
- Description of the interface between SBCL's runtime and its memory management subsystem;
- Stand-alone program that interfaces with MPS and presents challenges similar to that of SBCL (tagged pointers, several types of weaknesses, some data layout described by normal heap objects, ability to load a heap from disk, etc.);
- Port the stand-alone program's design to SBCL, for a single platform;
- (optional) Determine which pools are better suited to Common Lisp, and how to expose that choice to users.
- (optional) Port SBCL/MPS to several platforms.
5.6.2 Prerequisites
Knowledge of C and understanding of operating systems internals.
5.6.3 Student learning
The student will become familiar with the memory management subsystem used in a managed language that supports OS threads, and with the internals of a state of the art garbage collector. They will then gain experience with the integration of independently-developed systems. They will also interface directly with the operating systems as required for the development of a performance-oriented runtime system.
6 Projects that may be further described if there is interest
The path for these projects is significantly murkier, as is their usefulness.
6.1 Free-er form displaced arrays
6.2 Autodxification of higher-order functions
6.3 Better failure modes for heap overflow
Flexible heap, and GC before allocating.
6.4 Inline caches for PCL
6.5 clang(++)-based FFI
6.6 MPFR for long floats
6.7 Structured codewalking
6.8 Hygienic macros
6.9 Precompilation support for CLOS
6.10 CLOS sealing
6.11 GCed special variables
6.12 Precise stack scanning
The garbage collector used by SBCL on x86 and x86-64 is conservative: any aligned word on the control stack or in a register that might be a tagged pointer is treated as a potential root. Consequently, some definite roots are treated as potential ones, preventing some heap compaction, and spurious potential roots cause spurious memory leaks. Being able to partition some stack values as definite references, non-references, and potential references would greatly alleviate this issue.
6.13 A binary serialisation library
Many applications depend on the efficient (both in terms of time and of space) serialisation of data. A few Common Lisp libraries address this concern portably, and "fasl" files offer very powerful serialisation to files. Exploiting implementation-specific knowledge and placing restrictions on serialised data would enable an efficient wire protocol. Such a wire protocol would enable applications to efficiently exchange or store data (and even code) between programs, SBCL versions, or, potentially, platforms.
6.14 Contracts
6.15 Core relocation
6.16 Reducing the size of delivered applications
Applications built with SBCL tend to be distributed as source or as monolithic "images", snapshots of the managed heap. Though they may be compressed, image files are rather large: they include a complete development system, including the compiler. The size of applications could be reduced by either distributing changes from a base image, or by allowing users to enable more aggressive garbage collection when saving images. In order to complete this project, a student would identify a promising approach, implement it, and exhibit practical benefits in terms of image size.
6.17 Unboxed calling convention
6.18 Allocation pools
6.19 Shared memory multi-process heap
6.20 Faster FASL loading
6.21 Scheduling pass
6.22 utf-8b
6.23 Parametric recursive types
Date: 2013-04-13 20:16:19 CEST