The :SB-UNICODE feature implies support for all 1114112 potential
characters in the character space defined by the Unicode Consortium,
with the identity mapping between Lisp char-code and Unicode code
point. SBCL releases before version 0.8.17, and those built without the
:SB-UNICODE feature, support only 256 characters, with the identity
mapping between char-code and Latin-1 code point (equivalently, the
first 256 Unicode code points).
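As a minimal sketch of how the paragraph above can be checked in a running image, the following assumes SBCL and uses only standard Common Lisp operators. The helper names unicode-build-p, expected-char-code-limit and code-identity-p are hypothetical and exist only for illustration.

```lisp
;; Hypothetical helpers; only CHAR-CODE-LIMIT, CODE-CHAR, CHAR-CODE
;; and *FEATURES* are standard.
(defun unicode-build-p ()
  "True when :SB-UNICODE is present on *FEATURES*."
  (and (member :sb-unicode *features*) t))

(defun expected-char-code-limit ()
  "Character count stated in the text for this kind of build."
  (if (unicode-build-p) 1114112 256))

(defun code-identity-p (code)
  "True when CODE maps to a character whose CHAR-CODE is CODE again."
  (let ((ch (code-char code)))
    (and ch (= (char-code ch) code))))
```

On an SBCL image, (= char-code-limit (expected-char-code-limit)) should hold for either kind of build, and (code-identity-p 233) should hold because code 233 lies in the Latin-1 range that both builds support.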
In the absence of the :SB-UNICODE feature, the types base-char and
character are identical, and encompass the set of all 256 characters
supported by the implementation. With :SB-UNICODE on *features* (the
default), however, base-char and character are distinct: character
encompasses the set of all 1114112 characters, while base-char
represents the set of the first 128 characters.
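The boundary between base-char and character described above can be probed directly. This is a sketch that assumes SBCL, where code-char returns a character for every code below char-code-limit; base-char-codes-end is a hypothetical helper name.

```lisp
;; Hypothetical helper: find where BASE-CHAR stops covering the code space.
(defun base-char-codes-end ()
  "Return the smallest code whose character is not a BASE-CHAR,
or CHAR-CODE-LIMIT when every character is a BASE-CHAR."
  (or (loop for code from 0 below char-code-limit
            for ch = (code-char code)
            when (and ch (not (typep ch 'base-char)))
              return code)
      char-code-limit))
```

Going by the text, the result would be 128 in a build with :SB-UNICODE and 256 in a build without it, since in the latter case base-char and character coincide.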
The effect of this on string types is that an SBCL configured with
:SB-UNICODE has three disjoint string types: (vector nil), base-string
and (vector character). In a build without :SB-UNICODE, there are two
such disjoint types: (vector nil) and (vector character); base-string
is identical to (vector character).
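To illustrate the classification above, here is a small sketch that sorts a vector into one of the three types using typep-style dispatch. The function string-kind and its keyword results are hypothetical. The vectors in the usage note are built with explicit element types, so the result does not depend on how the reader chooses to represent string literals.

```lisp
;; Hypothetical classifier for the three vector types named in the text.
;; In a build without :SB-UNICODE the last clause is never reached,
;; because BASE-STRING and (VECTOR CHARACTER) are then the same type.
(defun string-kind (s)
  (etypecase s
    ((vector nil) :nil-vector)
    (base-string :base-string)
    ((vector character) :character-string)))
```

For instance, (string-kind (make-string 2 :element-type 'base-char)) yields :base-string, while in a build with :SB-UNICODE (string-kind (make-string 2 :element-type 'character)) yields :character-string.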
The SB-KERNEL:CHARACTER-SET-TYPE represents possibly noncontiguous
sets of characters as lists of range pairs: for example, the type
standard-char is represented as the type
(sb-kernel:character-set '((10 . 10) (32 . 126)))
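The range-pair idea can be sketched in portable Common Lisp without touching SBCL internals. The code below is not the SB-KERNEL implementation; char-in-ranges-p and *standard-char-ranges* are hypothetical names, and the pairs are the inclusive (low . high) code ranges quoted above for standard-char.

```lisp
;; Illustration only: membership test over a list of inclusive code ranges.
(defparameter *standard-char-ranges* '((10 . 10) (32 . 126))
  "Range pairs quoted in the text for STANDARD-CHAR.")

(defun char-in-ranges-p (char ranges)
  "True when the code of CHAR falls inside one of the (LOW . HIGH) pairs."
  (let ((code (char-code char)))
    (loop for (low . high) in ranges
          thereis (<= low code high))))
```

With these ranges, char-in-ranges-p should agree with the standard predicate standard-char-p: Newline (code 10) and the printing characters from Space (32) through tilde (126) are accepted, and everything else is rejected.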