Representing JS Values in C

When implementing JS a primary decision is how to represent JS values - the objects, numbers and strings that make up JS programs. Since I'm targeting C, there's a lot of flexibility in how to do this.

There are 8 primitive types in JS that we'll need to store:

The decision on representation will balance efficiency with ease of implementation. Highly-optimised implementations, like V8, will have multiple representations for a single type, for instance dense and sparse arrays. The immutable types can also be implemented differently to the mutable types: for instance, assignment of number values can be implemented by copying rather than referencing, as the number value can never change.

Note: this article will not explain the C language.

Functions, strings, objects and arrays will get a post to themselves; this time let's discuss the root JsValue type, numbers, and the boolean and null/undefined types.

JsValue struct

All JS values in js-to-c are represented by the JsValue struct, which looks like:

union JsValueValue {
   double number;
   void* pointer;
};

typedef struct JsValue {
    int type; 
    union JsValueValue value;
} JsValue;

type and value are the key: by looking up the type field we know the type of the value in the value union.

The value union differentiates between numbers where the value is embedded in the struct itself, and other values where the value is referenced via a pointer - strings, functions etc. undefined, null, NaN and true and false don't need a value stored; they're all immutable and implemented as singletons. A pointer equality will determine if a boolean value is true or false. Embedding number values makes them more efficient - x * y becomes x->value * y->value rather than *x->value->pointer * *x->value->pointer, saving two pointer dereferences.

The struct is kept private as otherwise refactoring it would likely touch the whole system. To create and manipulate values the language.h exposes a number of factory and getter/setter functions. Let's have a look at creating a number:

JsValue *jsValueCreateNumber(double number) {
    JsValue* val = gcAllocate(sizeof(JsValue), NUMBER_TYPE);
    val->value.number = number;
    return val;
}

You can see we allocate a value (gcAllocate allocates to the managed heap that is used to implemented garbage-collection) and then store the passed double in our value union. Since JS has IEEE-754 floating-point arithmetic, double gives us the 64 bit floating-point semantics we need. Code that needs to get at the value will call the jsValueNumber(JsValue) getter:

double jsValueNumber(JsValue* val) {
    return val->value.number;
}

For complex types and strings the pointer union member will point at another struct that contains the value, accessed via the jsValuePointer(*JsValue) *void method.

Null, undefined, booleans

I've implemented each of these by defining constants - that way the cost of a null, undefined, or bool value is a single pointer. I've put them behind inlineable getTrue() methods. I might eventually get around to checking if the efficiency is sufficient: next time!

var num1 = 42.7
var num2 = 17 + num1
var num3 = true * num2
var val = (num3 > 5) === true

Ends up as:

envDeclare(env, interned_1);
JsValue* init_2 = (internedNumber_3 /* 42.7 */);
envSet(env, interned_1, init_2);
envDeclare(env, interned_4);
JsValue* left_6 = (internedNumber_8 /* 17 */);
JsValue* right_7 = (envGet(env, interned_1 /* num1 */));
JsValue* init_5 = (addOperator(left_6, right_7));
envSet(env, interned_4, init_5);
envDeclare(env, interned_9);
JsValue* left_11 = (getTrue());
JsValue* right_12 = (envGet(env, interned_4 /* num2 */));
JsValue* init_10 = (multiplyOperator(left_11, right_12));
envSet(env, interned_9, init_10);
envDeclare(env, interned_13);
JsValue* left_17 = (envGet(env, interned_9 /* num3 */));
JsValue* right_18 = (internedNumber_19 /* 5 */);
JsValue* left_15 = (GTOperator(left_17, right_18));
JsValue* right_16 = (getTrue());
JsValue* init_14 = (strictEqualOperator(left_15, right_16));
envSet(env, interned_13, init_14);;

You can see the numbers are 'interned' - which is literal numbers in the source-code are pre-allocated at program startup, and are not part of garbage-checking.

Next time I'll dig into the more complex types: objects, arrays, functions and strings.

Enjoy this? Subscribe to my RSS feed or follow me on Twitter.