Hi,

mm-ADT is moving along nicely. I recently overcame a pretty nasty hurdle — references.

mm-ADT supports a reference type. In mm-ADT-bc (the mm-ADT bytecode language), a reference is specified as follows:

[db][define,person,[name:@string,age:(@int|@string)]]
    [insert,people,
      [person[name:marko,   age:29]
       person[name:kuppitz, age:littlegirl]
       ...
       person[...]]]
    [values,people,~person*![[db][values,people]]]
      
The last instruction is your basic “get(key)” with one small addition: all data access instructions take an optional @reference. The reference I passed in was:

~person*![[db][values,people]]

This thing says:

“I’m a reference (~) to zero or more (*) person objects. To dereference me, please use the attached [[db][values,people]] bytecode.”
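To make this concrete, here is a minimal Python sketch of such a reference. It assumes a toy encoding that is not mm-ADT’s: each instruction is a tuple such as ("values", "people"), bytecode is a tuple of instructions, and the names Reference, run, and dereference are hypothetical. The interpreter only understands [db], [values], and [has,field,eq,value].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    """Toy stand-in for ~person*![[db][values,people]] (hypothetical encoding)."""
    type_name: str   # "person" in ~person
    quantifier: str  # "*" means zero or more referents
    deref: tuple     # the attached dereference bytecode


def run(bytecode, db: dict) -> list:
    """Evaluate a tiny subset of instructions against an in-memory db (a dict of tables)."""
    objs = []
    for op, *args in bytecode:
        if op == "db":
            objs = [db]
        elif op == "values":
            objs = [o for store in objs for o in store[args[0]]]
        elif op == "has":
            field, _eq, value = args  # only the eq comparator is modeled
            objs = [o for o in objs if o.get(field) == value]
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return objs


def dereference(ref: Reference, db: dict) -> list:
    """Dereferencing is just running the bytecode the reference carries."""
    return run(ref.deref, db)


db = {"people": [{"name": "marko", "age": 29},
                 {"name": "kuppitz", "age": "littlegirl"}]}
people_ref = Reference("person", "*", (("db",), ("values", "people")))
print([p["name"] for p in dereference(people_ref, db)])  # ['marko', 'kuppitz']
```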

That reference is all that I (the compiler) know about this “people-table” of person objects. However, because [values] is a storage system access, the storage system can take the reference and soup it up. For instance, the output of [values] may contain:

~person[age:!lt(85)]{456}![[db][values,people]]
  -> [has,name,eq,$x.@string]     => ~person[name:x]?
  -> [dedup,name]                 => ![[noop]]
  -> [order,name,asc,(age,desc)?] => ![[noop]]

Cool. The storage system really souped up my reference. What new information did I get?

1. The storage system knows that no one is older than 85.
2. The storage system knows there are 456 person records in the “people-table.”
3. The storage system is saying that it has an index on name.
4. The storage system is saying that names are unique.
5. The storage system is saying that the person records are sorted by name (w/ ties broken by age).
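Here is a rough Python sketch of what the souped-up reference might carry, assuming a hypothetical EnrichedReference record: the storage-supplied facts (count, age bound) sit next to a table of rewrite rules keyed by instruction prefix. The field names and the "noop"/"index" action labels are illustrative, not mm-ADT syntax.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EnrichedReference:
    """Hypothetical model of the souped-up ~person reference returned by [values]."""
    type_name: str
    count: int       # {456}: how many referents the storage system reports
    age_below: int   # fact 1: every referent's age is below this bound
    deref: tuple     # ![[db][values,people]]
    rewrites: tuple  # (instruction prefix, action) pairs supplied by the storage system


def rewrite_for(ref: EnrichedReference, inst: tuple) -> Optional[str]:
    """Return the storage-supplied action for an instruction, or None if no rule matches."""
    for prefix, action in ref.rewrites:
        if inst[:len(prefix)] == prefix:
            return action
    return None


people = EnrichedReference(
    type_name="person",
    count=456,
    age_below=85,
    deref=(("db",), ("values", "people")),
    rewrites=(
        (("has", "name", "eq"), "index"),    # fact 3: an index on name
        (("dedup", "name"), "noop"),         # fact 4: names are unique
        (("order", "name", "asc"), "noop"),  # fact 5: already sorted by name
    ),
)

print(people.count)                                                     # 456, no dereference needed
print(rewrite_for(people, ("order", "name", "asc", ("age", "desc"))))  # noop
print(rewrite_for(people, ("insert", "lang", "java")))                 # None
```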

Let’s go back to our original bytecode.

...
[values,people,~person*![[db][values,people]]]
[dedup,name]
[has,name,eq,marko]

If [dedup] and [has] are the next instructions, then, guess what: [dedup] matches the [dedup,name] instruction on the souped-up ~person, so the virtual machine does nothing (no-op). Does the next [has] instruction require me to dereference the ~person with the [[db][values,people]] dereference bytecode? Nope. Instead, my ~person maps to a new reference, ~person?. This is an index lookup, and the question mark says there is 0 or 1 referent for this reference. Again, the storage system knows that the person objects in the “people-table” have unique names (via schema inference, let’s say).
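The same walk-through as a Python sketch, under a toy encoding (instructions as tuples; Ref, step, and the "noop"/"index" labels are hypothetical). step tries to apply an instruction to the reference itself and only returns None when the caller would have to dereference. How the narrowed ~person? builds its own dereference bytecode is not specified above; appending the [has] instruction to the original path is an assumption.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Ref:
    type_name: str
    quantifier: str       # "*" = zero or more, "?" = zero or one
    deref: tuple          # dereference bytecode
    rewrites: tuple = ()  # (instruction prefix, action) pairs from the storage system
    known: tuple = ()     # field values known without dereferencing


def step(ref: Ref, inst: tuple) -> Optional[Ref]:
    """Apply one instruction to a reference without touching the data, if a rule allows it."""
    for prefix, action in ref.rewrites:
        if inst[:len(prefix)] != prefix:
            continue
        if action == "noop":
            return ref  # e.g. [dedup,name] when names are already unique
        if action == "index":
            _op, field, _eq, value = inst
            # Assumption: the narrowed reference extends the original dereference path.
            return replace(ref, quantifier="?", deref=ref.deref + (inst,),
                           known=ref.known + ((field, value),))
    return None  # no rule applies: the caller must dereference


people = Ref("person", "*", (("db",), ("values", "people")),
             rewrites=((("dedup", "name"), "noop"), (("has", "name", "eq"), "index")))

after_dedup = step(people, ("dedup", "name"))
after_has = step(after_dedup, ("has", "name", "eq", "marko"))
print(after_dedup is people)                  # True: [dedup] was a no-op
print(after_has.quantifier, after_has.known)  # ? (('name', 'marko'),)
```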

————

Anywho, the big problem I solved was “how do you mechanically dereference a reference?”

The answer: bytecode.

Check this simple example out.

[db][define,project,[title:@string]]
    [define,person,[name:@string,project:@project]]
    [define,tp,project[title:tinkerpop]]
    [insert,projects,tp]
    [insert,people,
      [person[name:marko,   project:tp]
       person[name:kuppitz, project:tp]]]

Okay, so our [db] is looking something like this:


Looks good so far. Now what happens when we do the following:

[db][values,people]
    [has,name,eq,marko]
    [value,project]
    [insert,lang,java]


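Doh! Pass by value. The project we got back from marko is a copy of tp, so [insert,lang,java] never reaches the tinkerpop object stored in projects (or the one kuppitz holds). So how do we solve this generally? Well, with references (pointers), of course. The problem I was facing was how a vendor should manage mm-ADT references, and that is asking a lot of them. Then it came to me: the reference should manage its own path to dereferencing. Bytecode!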
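A plain-Python analogy of the pass-by-value problem, assuming each person stores its own copy of tp (modeled with copy.deepcopy; the dictionaries are illustrative, not mm-ADT’s storage layout). The [insert,lang,java] lands on marko’s copy only.

```python
import copy

tp = {"title": "tinkerpop"}
db = {
    "projects": [tp],
    # Pass by value: each person gets its own copy of the project.
    "people": [
        {"name": "marko", "project": copy.deepcopy(tp)},
        {"name": "kuppitz", "project": copy.deepcopy(tp)},
    ],
}

# [values,people][has,name,eq,marko][value,project][insert,lang,java]
marko = next(p for p in db["people"] if p["name"] == "marko")
marko["project"]["lang"] = "java"

print(marko["project"])            # {'title': 'tinkerpop', 'lang': 'java'}
print(db["projects"][0])           # {'title': 'tinkerpop'}: the stored project is unchanged
print(db["people"][1]["project"])  # {'title': 'tinkerpop'}: kuppitz never sees the update
```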

[db][define,project,[title:@string]]
    [define,person,[name:@string,project:@project]]
    [define,tp,     project[title:tinkerpop]]
    [define,tp-ref, ~project[title:tinkerpop]![[db][values,projects][has,title,eq,tinkerpop]]]
    [insert,projects,tp,tp-ref]
    [insert,people,
      [person[name:marko,   project:tp-ref]
       person[name:kuppitz, project:tp-ref]]]


Sweet so far. How about when we mutate?

[db][values,people]
    [has,name,eq,marko]
    [value,project]
    [insert,lang,java]


Tada! So two things.

1. The tp-ref contains enough information to always dereference to the same logical object:
~project[title:tinkerpop]![[db][values,projects][has,title,eq,tinkerpop]]
2. However, if the vendor says “don’t use that inefficient, index-based data access path! I support object pointers (à la graph databases),” the vendor can rewrite the dereference bytecode to leverage their internal pointers.
- Either way, it works: RDBMSs will use index lookups and joins, while graph databases will use direct pointers.
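A minimal Python sketch of point 2, assuming a hypothetical vendor that keeps objects in a heap keyed by internal ids. Two dereference paths, a logical lookup and a vendor-rewritten pointer, are encoded as tuples (the "lookup" and "ptr" instruction names are made up). Both resolve to the same stored object, so a mutation made through one is visible through the other.

```python
# Hypothetical vendor internals: objects live in a heap keyed by internal id,
# and the projects table is a list of those ids.
heap = {101: {"title": "tinkerpop"}, 102: {"title": "gremlin"}}
tables = {"projects": [101, 102]}


def deref(path: tuple) -> dict:
    """Resolve a dereference path to the stored object (not a copy)."""
    op, *args = path
    if op == "lookup":  # logical path: [db][values,projects][has,title,eq,tinkerpop]
        table, field, value = args
        matches = [heap[i] for i in tables[table] if heap[i][field] == value]
        if len(matches) != 1:
            raise LookupError("expected exactly one referent")
        return matches[0]
    if op == "ptr":     # vendor-rewritten path: jump straight to the object
        return heap[args[0]]
    raise ValueError(f"unknown path: {op}")


rdbms_style = ("lookup", "projects", "title", "tinkerpop")
graph_style = ("ptr", 101)

print(deref(rdbms_style) is deref(graph_style))  # True: same logical object either way
deref(graph_style)["lang"] = "java"
print(deref(rdbms_style))  # {'title': 'tinkerpop', 'lang': 'java'}
```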

Finally, you might wonder about the mechanics of what’s happening. Here is each instruction in the mutation block and the object it outputs.

1. [has,name,eq,marko] ==>
         person[name:marko,project:~project[title:tinkerpop]![[db][values,projects][has,title,eq,tinkerpop]]]
2. [value,project] ==>
         ~project[title:tinkerpop]![[db][values,projects][has,title,eq,tinkerpop]]
3. [insert,lang,java] ==>
          // NO WAY TO SOLVE THIS INSTRUCTION WITH THE REFERENCE DATA. WE NEED TO DEREFERENCE.
4. [map,![[db][values,projects][has,title,eq,tinkerpop]]] ==>
         project[title:tinkerpop]
5. [insert,lang,java] ==>
         project[title:tinkerpop,lang:java]
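Here is the five-step trace as a small Python sketch. It assumes a toy tuple encoding (Ref, run, and insert are hypothetical names) and folds steps 3 to 5 into insert: when [insert] is handed a reference, it first runs the reference’s dereference bytecode (the [map,!...] step) and then inserts into the object it gets back.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    type_name: str
    deref: tuple  # dereference bytecode, e.g. [[db][values,projects][has,title,eq,tinkerpop]]


def run(bytecode, db: dict) -> list:
    """Evaluate [db], [values,<table>], and [has,<field>,eq,<value>] against a dict-backed db."""
    objs = []
    for op, *args in bytecode:
        if op == "db":
            objs = [db]
        elif op == "values":
            objs = [o for store in objs for o in store[args[0]]]
        elif op == "has":
            field, _eq, value = args
            objs = [o for o in objs if o.get(field) == value]
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return objs


def insert(obj, key: str, value, db: dict) -> dict:
    """[insert,key,value] needs a real object, so a Ref is dereferenced first."""
    if isinstance(obj, Ref):
        (obj,) = run(obj.deref, db)  # step 4; expects exactly one referent
    obj[key] = value                 # step 5
    return obj


tp_ref = Ref("project", (("db",), ("values", "projects"), ("has", "title", "eq", "tinkerpop")))
db = {"projects": [{"title": "tinkerpop"}],
      "people": [{"name": "marko", "project": tp_ref},
                 {"name": "kuppitz", "project": tp_ref}]}

(marko,) = run((("db",), ("values", "people"), ("has", "name", "eq", "marko")), db)  # step 1
project = marko["project"]                      # step 2: still a Ref
updated = insert(project, "lang", "java", db)   # steps 3-5
print(updated)                                  # {'title': 'tinkerpop', 'lang': 'java'}
print(run(db["people"][1]["project"].deref, db))  # kuppitz sees the change too
```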

Pretty neat, eh? And in terms of persistence, mm-ADT bytecode will have both a text and a binary representation, so mm-ADT references can be stored in any database (for databases that don’t support their own object references).

References are solved. The solution:

 The reference carries around the chunk of bytecode that encodes the data access path to its referent.

Oh, and two quickies…

1. References can be shipped over the wire without having to be re-synchronized (the bytecode’s address space is the logical model, not the physical model).
2. References can point to objects that live on other machines (again, the bytecode’s address space is not machine space).
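A quick Python sketch of both quickies, using JSON as a stand-in wire format (mm-ADT’s own text and binary encodings are not shown here) and a hypothetical run interpreter. Because the reference only names tables and fields, a machine holding its own copy of the data can dereference it after decoding; nothing in it is a memory address.

```python
import json

# The tp-ref as plain data: type plus dereference bytecode (illustrative encoding).
tp_ref = {"type": "project",
          "deref": [["db"], ["values", "projects"], ["has", "title", "eq", "tinkerpop"]]}


def run(bytecode, db: dict) -> list:
    """Evaluate [db], [values,<table>], and [has,<field>,eq,<value>] against a dict-backed db."""
    objs = []
    for op, *args in bytecode:
        if op == "db":
            objs = [db]
        elif op == "values":
            objs = [o for store in objs for o in store[args[0]]]
        elif op == "has":
            field, _eq, value = args
            objs = [o for o in objs if o.get(field) == value]
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return objs


wire = json.dumps(tp_ref)    # ship it
received = json.loads(wire)  # decode it on another machine

# The remote machine stores the data in a different order and in different memory.
remote_db = {"projects": [{"title": "gremlin"}, {"title": "tinkerpop"}]}
print(received == tp_ref)                 # True: nothing to re-synchronize
print(run(received["deref"], remote_db))  # [{'title': 'tinkerpop'}]
```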
*** IF THE GRAPHICS DIDN’T COME THROUGH, HERE IS A LINK TO THEM:
https://gist.github.com/okram/159a3652672cb15a4ea7184e1258ba6d

Marko.