For the initial version of the language we just need Bool, Number, and String to have a defined layout. It is probably worth thinking about the future structure of arrays, tuples, and structs though.
The plan is that Bool, Number, and String will all have value semantics. That is, a modification of a String value in one place will not affect its appearance in another. This is similar to Copy types in Rust and struct types in C#. I feel that tuple types should also have value semantics, similar to ValueTuple in C#.
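As a rough illustration of value semantics, here is a small Rust sketch. Rust's own String and i64 stand in for the language's String and Number, and append_exclaim is a hypothetical helper, not part of the design. The point is only that the callee works on its own copy, so the caller's value is unchanged.

```rust
// Hypothetical helper: takes its argument by value and modifies that copy.
fn append_exclaim(mut s: String) -> String {
    s.push('!');
    s
}

fn main() {
    // Stand-in for the language's String: the clone is an independent value.
    let a = String::from("hello");
    let b = append_exclaim(a.clone());
    assert_eq!(a, "hello"); // the original is untouched
    assert_eq!(b, "hello!");

    // Stand-in for Number: i64 is a Copy type, so assignment copies the value.
    let n: i64 = 1;
    let mut m = n;
    m += 1;
    assert_eq!((n, m), (1, 2));
}
```

In Rust the copy of a String has to be requested with clone; the language described here would presumably make that copy implicit, which is where the cost concerns around string layout come from.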
Array and structure types will instead have reference semantics. This means that passing a structure to a fn will allow the function to modify the structure value. This is similar to reference types in C# and &mut references in Rust.
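For contrast, the reference-semantics behaviour can be sketched in Rust with a &mut parameter. Point and move_right are hypothetical names used only for this example. In the language described here the reference would be implicit rather than spelled out at the call site.

```rust
// Hypothetical structure type used only for illustration.
struct Point {
    x: i64,
    y: i64,
}

// The function receives a mutable reference, so its writes are visible to the caller.
fn move_right(p: &mut Point, by: i64) {
    p.x += by;
}

fn main() {
    let mut p = Point { x: 1, y: 2 };
    move_right(&mut p, 10);
    assert_eq!(p.x, 11); // the caller sees the modification
    assert_eq!(p.y, 2);
}
```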
For the primitive types we have the following type layouts from language type to LLVM type:
Bool -> i1
Number -> i64
String types are represented as a pair of length, data:
String -> <{u32,[0 x u8]}>*
The value of the string is encoded directly as part of the pair. Allocation of a string uses a variable-length array to contain a sequence of UTF-8 bytes. There are a few problems with this:
Given these concerns we could lay a string out as:
String -> <{u32, u32, [0 x u8]}>*
In this representation each string has a pointer to a reference-counted backing buffer. This should reduce the copy size of each string and means that a string reference would again have a single, easily known size. However, we still need to know when the reference should be deallocated.
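To make the backing-buffer idea concrete, here is a minimal Rust sketch that builds and reads such a buffer as plain bytes. It rests on several assumptions that the layout above does not state: the first u32 is taken to be the reference count and the second the byte length, both are encoded little-endian, and the names new_string_buffer, ref_count, and contents are hypothetical.

```rust
const HEADER_BYTES: usize = 8; // two u32 header fields

/// Builds a backing buffer laid out as [ref_count: u32][len: u32][UTF-8 bytes].
/// Field order and little-endian encoding are assumptions made for this sketch.
fn new_string_buffer(s: &str) -> Vec<u8> {
    assert!(s.len() <= u32::MAX as usize, "string too long for a u32 length");
    let mut buf = Vec::with_capacity(HEADER_BYTES + s.len());
    buf.extend_from_slice(&1u32.to_le_bytes()); // initial reference count
    buf.extend_from_slice(&(s.len() as u32).to_le_bytes()); // length in bytes
    buf.extend_from_slice(s.as_bytes());
    buf
}

/// Reads a little-endian u32 at `offset`, or None if the buffer is too short.
fn read_u32(buf: &[u8], offset: usize) -> Option<u32> {
    let b = buf.get(offset..offset + 4)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

fn ref_count(buf: &[u8]) -> Option<u32> {
    read_u32(buf, 0)
}

/// Returns the string contents, using the stored length to bound the data.
fn contents(buf: &[u8]) -> Option<&str> {
    let len = read_u32(buf, 4)? as usize;
    let data = buf.get(HEADER_BYTES..)?.get(..len)?;
    std::str::from_utf8(data).ok()
}

fn main() {
    let buf = new_string_buffer("héllo"); // 'é' takes two bytes in UTF-8
    assert_eq!(buf.len(), HEADER_BYTES + 6);
    assert_eq!(ref_count(&buf), Some(1));
    assert_eq!(contents(&buf), Some("héllo"));
}
```

Because the length counts bytes rather than characters, non-ASCII strings occupy more bytes than their character count suggests. Copying a string reference under this scheme copies only the pointer, not the buffer.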
Rather than aiming to control access to data as Rust does, the language should provide a garbage collection mechanism to clean up data once nothing references it any longer. There are a few alternatives for this:
For a full mark-and-sweep or other collector, the generated code needs to insert GC statepoints. For this reason I'm tempted to head towards the second option. In place of statepoints, we will need to decide in the lower pass where to insert RC retain and release code to maintain the count. Could this work the same way as Rust's Arc model? In that case the code just needs to know a fixed point in the scope where each value is deallocated.
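As a point of comparison, the sketch below shows how scope-based retain and release behave with Rust's Rc, the single-threaded counterpart of Arc. Cloning the handle acts as the retain, and the drop that Rust inserts at the end of the enclosing scope acts as the release. The function name count_after_inner_scope is hypothetical, and this only illustrates Rust's behaviour, not a decision about how the lower pass here should work.

```rust
use std::rc::Rc;

/// Returns the strong count observed inside an inner scope and after it ends.
fn count_after_inner_scope() -> (usize, usize) {
    let s: Rc<str> = Rc::from("hello"); // count starts at 1
    let inside;
    {
        let _alias = Rc::clone(&s); // "retain": count becomes 2
        inside = Rc::strong_count(&s);
    } // "release": _alias is dropped at this fixed point, count returns to 1
    (inside, Rc::strong_count(&s))
}

fn main() {
    let (inside, after) = count_after_inner_scope();
    assert_eq!(inside, 2);
    assert_eq!(after, 1);
}
```

The release point is known statically from the scope structure, which is the property the retain and release insertion pass would need to reproduce.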