9.1 Memory Layout

Characters are immediate objects (that is, they require no heap allocation) under every combination of build-time options. Even on a 32-bit platform with :SB-UNICODE, there are three bits to spare after allocating 8 bits for the character widetag and 21 for the character code. There is only one such layout, and consequently only one widetag is needed: the difference between base-char and character lies purely in the magnitude of the char-code.
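The bit budget described above can be checked with a short sketch. The field widths (8-bit widetag, 21-bit character code, 32-bit word) come from the text; the widetag value and the placement of the code directly above the widetag are assumptions made for illustration only, not a statement of the actual encoding.

```python
WORD_BITS = 32        # 32-bit platform, as in the text
WIDETAG_BITS = 8      # bits allocated to the character widetag
CHAR_CODE_BITS = 21   # bits allocated to the character code

# Hypothetical placeholder; the real widetag value is not given in this section.
HYPOTHETICAL_CHARACTER_WIDETAG = 0x49

SPARE_BITS = WORD_BITS - WIDETAG_BITS - CHAR_CODE_BITS


def pack_character(code):
    """Pack a character code into one immediate word (assumed field order)."""
    if not 0 <= code < (1 << CHAR_CODE_BITS):
        raise ValueError("character code does not fit in 21 bits")
    return (code << WIDETAG_BITS) | HYPOTHETICAL_CHARACTER_WIDETAG


def unpack_character(word):
    """Return (code, widetag) from a word built by pack_character."""
    return word >> WIDETAG_BITS, word & ((1 << WIDETAG_BITS) - 1)


if __name__ == "__main__":
    word = pack_character(0x10FFFF)  # largest Unicode code point
    print(SPARE_BITS, hex(word), word < (1 << WORD_BITS))
```

Because the largest Unicode code point, #x10FFFF, is below 2^21, every character fits in a single word with room left over, which is why no heap allocation is needed.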

Objects of type (simple-array nil (*)) are represented in memory as two words: the first is the object header, with the appropriate widetag, and the second is the length field. No memory is needed for elements of these objects, as they can have none.

Objects of type simple-base-string have the header word with widetag, then a word for the length, and after that a sequence of 8-bit char-codes, one byte per character. The system arranges for there to be a null byte after the sequence of Lisp character codes.

Objects of type (simple-array character (*)), where this is a distinct type from simple-base-string, have the header word with widetag, then a word for the length, and after that a sequence of 32-bit char-codes. Again, the system arranges for there to be a null word after the sequence of character codes.
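The three vector layouts above can be sketched as byte sequences. This is only an illustration under stated assumptions: an 8-byte machine word, little-endian byte order, the length stored as a raw integer, a terminator the same width as one element, no alignment padding, and placeholder widetag values. None of these details are specified in this section.

```python
# Hypothetical placeholder widetags; the real values are not given here.
NIL_VECTOR_WIDETAG = 0x01
BASE_STRING_WIDETAG = 0x02
CHARACTER_STRING_WIDETAG = 0x03


def _word(value, word_bytes):
    """Encode one machine word (little-endian is an assumption)."""
    return value.to_bytes(word_bytes, "little")


def layout_nil_vector(length, word_bytes=8):
    """(simple-array nil (*)): header word plus length word, no elements."""
    return _word(NIL_VECTOR_WIDETAG, word_bytes) + _word(length, word_bytes)


def layout_string(text, widetag, char_bytes, word_bytes=8):
    """Header word, length word, char-codes, then one zero element."""
    header = _word(widetag, word_bytes)
    length = _word(len(text), word_bytes)
    # to_bytes raises OverflowError if a code does not fit the element width.
    body = b"".join(ord(c).to_bytes(char_bytes, "little") for c in text)
    terminator = (0).to_bytes(char_bytes, "little")
    return header + length + body + terminator


if __name__ == "__main__":
    base = layout_string("abc", BASE_STRING_WIDETAG, char_bytes=1)
    wide = layout_string("abc", CHARACTER_STRING_WIDETAG, char_bytes=4)
    print(len(layout_nil_vector(5)), len(base), len(wide))
```

Under these assumptions the same three-character string occupies 3 + 1 bytes of character data as a simple-base-string and 12 + 4 bytes as a (simple-array character (*)), in each case preceded by the two-word prefix.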

Non-simple character arrays, and simple character arrays whose rank is not one, have an array header with a reference to an underlying data array in one of the representations described above.
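The header-plus-data-vector arrangement can be modelled roughly as follows. The class and field names are hypothetical, and the header is reduced to just a dimension list and a reference to the data vector; the actual header contains whatever the implementation needs and is not described in this section. Row-major ordering of elements is the standard Common Lisp convention.

```python
from dataclasses import dataclass


@dataclass
class DataVector:
    """Stand-in for a rank-one character vector in one of the flat layouts."""
    elements: list


@dataclass
class ArrayHeader:
    """Hypothetical model: dimensions plus a reference to the data vector."""
    dimensions: tuple
    data: DataVector

    def row_major_index(self, *subscripts):
        """Map subscripts to a flat index in row-major order."""
        if len(subscripts) != len(self.dimensions):
            raise IndexError("wrong number of subscripts")
        index = 0
        for dim, sub in zip(self.dimensions, subscripts):
            if not 0 <= sub < dim:
                raise IndexError("subscript out of range")
            index = index * dim + sub
        return index

    def aref(self, *subscripts):
        """Fetch an element through the header, never by copying the data."""
        return self.data.elements[self.row_major_index(*subscripts)]


if __name__ == "__main__":
    storage = DataVector(list("abcdef"))
    grid = ArrayHeader((2, 3), storage)
    print(grid.aref(1, 0), grid.aref(1, 2))
```

In this model two headers can refer to the same data vector, so a change made through one is visible through the other; the character data itself always lives in the flat vector.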