tinkerpop-dev mailing list archives

From Marko Rodriguez <okramma...@gmail.com>
Subject Re: The Machine Interface of TP4.
Date Tue, 02 Apr 2019 17:17:34 GMT
Hi,

> I'm still not sure I follow how caching will work effectively. Like, I
> follow that you can have bytecode local and remote and if the same bytecode
> is seen in a cache the UUID can be sent in its stead but at least in TP3
> semantics the bytecode for:

There are two levels to bytecode:

	source instructions (withXXX)
	instructions (out, in, count)
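To make the two levels concrete, here is a minimal Java sketch that splits a flat instruction list into its source level and its step level. The Instruction and Bytecode records, and the rule that every `with`-prefixed instruction is a source instruction, are illustrative assumptions. They are not the TP4 classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the two bytecode levels; not the TP4 classes.
public class BytecodeLevels {

    record Instruction(String op, List<Object> args) {
        static Instruction of(String op, Object... args) {
            return new Instruction(op, List.of(args));
        }

        // Assumption: source instructions are exactly the withXXX instructions.
        boolean isSource() {
            return op.startsWith("with");
        }
    }

    record Bytecode(List<Instruction> source, List<Instruction> steps) {
        // Partition a flat instruction list, preserving order within each level.
        static Bytecode split(List<Instruction> all) {
            List<Instruction> source = new ArrayList<>();
            List<Instruction> steps = new ArrayList<>();
            for (Instruction instruction : all) {
                (instruction.isSource() ? source : steps).add(instruction);
            }
            return new Bytecode(List.copyOf(source), List.copyOf(steps));
        }
    }

    public static void main(String[] args) {
        Bytecode bytecode = Bytecode.split(List.of(
                Instruction.of("withProcessor", "pipes"),
                Instruction.of("withStructure", "tinkergraph"),
                Instruction.of("V"),
                Instruction.of("out"),
                Instruction.of("count")));
        System.out.println(bytecode.source().size() + " source, " + bytecode.steps().size() + " steps");
    }
}
```

Only the source level is what a Machine would pre-compile at registration. The step level changes with every traversal spawned from the same source.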

LocalMachine is just caching a compilation of the source instructions. This is necessary because
TraversalSource no longer has any state, so if you want the strategies pre-sorted and the
database connections open, you have to do it in a Machine implementation, by way of a SourceCompilation.
This is what the Machine.register() method does.

Explained another way: TraversalSource is a Gremlin language-specific class. The TP4 virtual
machine doesn’t know what a TraversalSource is. It only cares about Bytecode.

> and therefore the compiled script for both were equal and caching was easy.
> Is there a way to simulate that type of parameterization for bytecode so
> that local/remote caching works nicely and we can see some significant
> performance gains?

I have not thought through instruction bytecode parameterization and caching. This is different
from Machine.register(). It would occur when you call Machine.submit(): if the bytecode already
exists in a cache as a Compilation, then you just fetch it and you don’t have to re-apply
strategies and re-generate the intermediate function representation. As such, this type of
caching doesn’t affect the Machine interface definition and would just be a Machine-specific
implementation detail. Not that it’s unimportant, as we need to get bytecode parameterization
and caching down; it’s just a “different topic.” I forget how we did “bindings”
in TP3, but I remember you saying it’s a ThreadLocal model and it’s janky. What do you recommend
for bindings in TP4? Perhaps we should start another email thread.
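A rough sketch of that submit-time cache follows. It assumes Bytecode behaves as a value, so equal instructions give an equal cache key. The CompilationCache class and its simulated compile step are hypothetical stand-ins for strategy application and function generation.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a Machine-specific submit() cache: Map<Bytecode,Compilation>.
public class CompilationCache {

    // Assumption: Bytecode is a value type, so equal instructions mean an equal cache key.
    record Bytecode(List<String> instructions) {}

    // Stand-in for a Compilation (strategies applied, functions generated).
    record Compilation(List<String> functions) {}

    private final Map<Bytecode, Compilation> cache = new ConcurrentHashMap<>();
    final AtomicInteger compilations = new AtomicInteger();

    // The expensive path, simulated: apply strategies and generate the function representation.
    private Compilation compile(Bytecode bytecode) {
        compilations.incrementAndGet();
        return new Compilation(bytecode.instructions().stream().map(i -> "fn:" + i).toList());
    }

    // submit() compiles only bytecode it has not seen before; repeats are a map lookup.
    Compilation submit(Bytecode bytecode) {
        return cache.computeIfAbsent(bytecode, this::compile);
    }

    public static void main(String[] args) {
        CompilationCache machine = new CompilationCache();
        machine.submit(new Bytecode(List.of("V", "out", "count")));
        machine.submit(new Bytecode(List.of("V", "out", "count"))); // cache hit
        machine.submit(new Bytecode(List.of("V", "count")));        // new bytecode, compiled
        System.out.println("compilations: " + machine.compilations.get());
    }
}
```

Under this sketch, bytecode that differs only in its arguments, such as V(1) versus V(2), gets separate cache entries. That is exactly the parameterization question left open above.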

HTH,
Marko.

http://rredux.com



> sorry if you already answered this somewhere as i have this sense we had
> this conversation somewhere, but maybe i'm making that up.
> 
> On Wed, Mar 27, 2019 at 9:09 AM Marko Rodriguez <okrammarko@gmail.com>
> wrote:
> 
>> Hi,
>> 
>>> LocalMachine, it will look up the registered UUID and, if it exists, use the
>>> pre-compiled source code.
>> 
>> What Machine.register() does, in general, is up to the implementation.
>> 
>> LocalMachine.register() does what TP3 does in TraversalSource. It
>> “pre-compiles”.
>> 
>>        - sort strategies
>>                TP3:
>>                https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/TraversalSource.java#L138
>>                https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/util/DefaultTraversalStrategies.java#L47
>>        - sets up processor
>>                TP3:
>>                https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/TraversalSource.java#L141
>>        - sets up structure
>>                TP3: this came for free because we created the traversal
>>                source from Graph.traversal().
>> 
>> This way, when you keep spawning traversals off the same “g”, we don’t
>> have to re-compile the source instructions.
>> 
>>> maybe i didn't follow properly but is this for the purpose of caching
>>> traversals to avoid the costs of traversal to bytecode compilation?
>> 
>> 
>> Note that this is a SourceCompilation (only the source instructions are
>> compiled), not a Compilation, which covers the full instructions.
>> 
>>> in other words, is this describing a general way to cache compiled bytecode so
>>> that it doesn't have to go through strategy application more than once?
>> 
>> 
>> As for caching traversals, that is easy to do with the Machine interface.
>> On Machine.submit(), the machine can consult a Map<Bytecode,Compilation>,
>> same as TP3. However, check this: we can do it another way. Why even send
>> the full Bytecode? If the RemoteMachine (which is local to the client) knows
>> it has already sent the same Bytecode, it can instead send a
>> single-instruction Bytecode containing an encoded UUID-like instruction.
>> The server then needs only a Map<UUID,Compilation>, and less data is
>> transferred.
>> 
>> RemoteMachine (client side) can keep a Map<Bytecode,UUID> and do the
>> proper UUID-encoding. MachineServer (server side) can then keep a
>> Map<Bytecode,Compilation>: if the received Bytecode is a single UUID-like
>> instruction, the lookup is fast; if not, the full Bytecode can still be
>> looked up!
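The client/server split described above might look roughly like the following. The thread does not say how the server learns which UUID stands for which Bytecode, so this sketch assumes both sides derive the same name-based UUID (`UUID.nameUUIDFromBytes`) from the instructions. The `uuid:` instruction format and all class names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch of UUID-encoded resubmission; names and formats are assumptions.
public class UuidEncoding {

    record Bytecode(List<String> instructions) {
        // Assumed wire form of the single UUID-like instruction.
        static Bytecode reference(UUID id) {
            return new Bytecode(List.of("uuid:" + id));
        }

        boolean isReference() {
            return instructions.size() == 1 && instructions.get(0).startsWith("uuid:");
        }

        // Assumption: both sides derive the same name-based UUID from the instructions.
        UUID id() {
            return UUID.nameUUIDFromBytes(String.join("\n", instructions).getBytes(StandardCharsets.UTF_8));
        }
    }

    record Compilation(Bytecode bytecode) {}

    // Client side (RemoteMachine): Map<Bytecode,UUID> of bytecode already sent.
    static final class Client {
        private final Map<Bytecode, UUID> sent = new HashMap<>();

        Bytecode encode(Bytecode bytecode) {
            UUID id = sent.get(bytecode);
            if (id != null) {
                return Bytecode.reference(id); // seen before: send one instruction
            }
            sent.put(bytecode, bytecode.id());
            return bytecode; // first time: send the full Bytecode
        }
    }

    // Server side (MachineServer): Map<Bytecode,Compilation>, keyed by full or reference bytecode.
    static final class Server {
        private final Map<Bytecode, Compilation> compilations = new HashMap<>();
        int compiled = 0;

        Compilation receive(Bytecode bytecode) {
            Compilation hit = compilations.get(bytecode); // fast path for references and repeats
            if (hit != null) {
                return hit;
            }
            if (bytecode.isReference()) {
                throw new IllegalStateException("unknown reference; the client must resend full bytecode");
            }
            compiled++; // stands in for strategy application
            Compilation compilation = new Compilation(bytecode);
            compilations.put(bytecode, compilation);
            compilations.put(Bytecode.reference(bytecode.id()), compilation);
            return compilation;
        }
    }

    public static void main(String[] args) {
        Client client = new Client();
        Server server = new Server();
        Bytecode query = new Bytecode(List.of("V", "out", "count"));
        Bytecode first = client.encode(query);   // full bytecode
        Bytecode second = client.encode(query);  // single uuid instruction
        boolean same = server.receive(first) == server.receive(second);
        System.out.println(first.instructions().size() + " -> " + second.instructions().size()
                + " instructions, same compilation: " + same);
    }
}
```

A content-derived UUID keeps the protocol Bytecode-only. The trade-off is that client and server must agree on a stable serialization of the instructions.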
>> 
>> Thus, it is easy for us to do both types of caching with the Machine
>> interface:
>> 
>>        SourceCompilation: source bytecode caching.
>>        Compilation: full bytecode caching.
>> 
>> Keep the questions coming.
>> 
>> Marko.
>> 
>> http://markorodriguez.com
>> 
>> 
>>> 
>>> 
>>> 
>>> On Mon, Mar 25, 2019 at 8:48 AM Marko Rodriguez <okrammarko@gmail.com>
>>> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> Here is how the TP4 bytecode submission infrastructure is looking.
>>>> 
>>>> In TP3, TraversalSource maintained the “pre-compilation” of strategies,
>>>> database connectivity, etc. This was not smart for the following reasons:
>>>> 
>>>>       1. It assumed the traversal would execute on the same machine that
>>>>          it was created on.
>>>>       2. We had to make an explicit distinction between local and remote
>>>>          execution via RemoteStrategy.
>>>>       3. RemoteStrategy passes an excessive amount of data over the wire
>>>>          on each traversal submission (the source instructions!).
>>>>       4. RemoteStrategy is bug-prone with traversal inspection,
>>>>          RemoteStep, etc.
>>>> 
>>>> In TP4, we are now going to assume that Bytecode (a traversal) is always
>>>> submitted somewhere, and this “somewhere” could be local or remote. This
>>>> “somewhere” must implement the Machine interface.
>>>> 
>>>> 
>>>> 
>>>> https://github.com/apache/tinkerpop/blob/596caf3ab82f3b15c2c343af87be6d03f26d6d6e/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/Machine.java
>>>> 
>>>> Machine makes the TP4 communication protocol explicit. The only objects
>>>> transmitted are Bytecode and Traversers. Simple.
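Here is a simplified, hypothetical rendering of that protocol: Bytecode goes in through register() and submit(), and Traversers come out. The method set (register, submit, and an unregister step implied by g.close()) follows this thread. The exact signatures, the toy interpreter for `inject` and `count`, and the pass-through registration (a machine whose registration information is nothing) are assumptions, not the linked Machine.java.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified shape of the Machine protocol: Bytecode in, Traversers out.
public class MachineProtocol {

    record Instruction(String op, List<Object> args) {}
    record Bytecode(List<Instruction> instructions) {}
    record Traverser(Object object) {}

    interface Machine {
        // Pre-compile source instructions; returns Bytecode carrying registration info.
        Bytecode register(Bytecode sourceCode);
        // Execute full bytecode and stream back traversers.
        Iterator<Traverser> submit(Bytecode bytecode);
        // Release whatever register() set up.
        void unregister(Bytecode registration);
    }

    // A machine whose registration information is "nothing": register() returns its input.
    static final class PassThroughMachine implements Machine {
        public Bytecode register(Bytecode sourceCode) { return sourceCode; }

        public void unregister(Bytecode registration) { }

        // Toy interpreter that understands only "inject" and "count" (illustrative).
        public Iterator<Traverser> submit(Bytecode bytecode) {
            List<Traverser> current = new ArrayList<>();
            for (Instruction i : bytecode.instructions()) {
                switch (i.op()) {
                    case "inject" -> i.args().forEach(a -> current.add(new Traverser(a)));
                    case "count" -> {
                        long n = current.size();
                        current.clear();
                        current.add(new Traverser(n));
                    }
                    default -> { } // source instructions (withXXX) are ignored here
                }
            }
            return current.iterator();
        }
    }

    public static void main(String[] args) {
        Machine machine = new PassThroughMachine();
        Bytecode source = new Bytecode(List.of(new Instruction("withProcessor", List.of("pipes"))));
        Bytecode registration = machine.register(source);
        List<Instruction> all = new ArrayList<>(registration.instructions());
        all.add(new Instruction("inject", List.of(1, 2, 3)));
        all.add(new Instruction("count", List.of()));
        Iterator<Traverser> result = machine.submit(new Bytecode(all));
        System.out.println(result.next().object());
        machine.unregister(registration);
    }
}
```

Because the interface speaks only Bytecode and Traversers, a local machine and a remote one are interchangeable behind it.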
>>>> 
>>>> Here is an example using LocalMachine:
>>>> 
>>>> Machine machine = LocalMachine.open();
>>>> TraversalSource g =
>>>>     Gremlin.traversal(machine).withProcessor(…).withStructure(…).withStrategy(…);
>>>> 
>>>> The first time a traversal is generated from g, the Bytecode source
>>>> instructions are registered with the machine.
>>>> 
>>>> 
>>>> 
>>>> https://github.com/apache/tinkerpop/blob/596caf3ab82f3b15c2c343af87be6d03f26d6d6e/java/language/gremlin/src/main/java/org/apache/tinkerpop/language/gremlin/TraversalSource.java#L99-L104
>>>> 
>>>> The intention is that, on registration, the Machine will pre-compile the
>>>> source instructions (sort strategies, ensure processor and structure
>>>> setup/connectivity). Machine.register() returns a new Bytecode which
>>>> contains the registration information for future lookup. This registration
>>>> information is Machine-specific and can even be nothing!
>>>> 
>>>> 
>>>> 
>>>> https://github.com/apache/tinkerpop/blob/596caf3ab82f3b15c2c343af87be6d03f26d6d6e/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/BasicMachine.java#L37-L39
>>>> 
>>>> LocalMachine, however, is more intelligent: it maintains a
>>>> Map<UUID,SourceCompilation> of pre-compiled source instructions.
>>>> 
>>>> 
>>>> 
>>>> https://github.com/apache/tinkerpop/blob/596caf3ab82f3b15c2c343af87be6d03f26d6d6e/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/LocalMachine.java#L47-L62
>>>> 
>>>> Now, when bytecode (containing instructions for execution) is submitted to
>>>> LocalMachine, it will look up the registered UUID and, if it exists, use the
>>>> pre-compiled source code. As you can see, a pre-compilation has everything
>>>> staged and ready for use.
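Here is a minimal sketch of that LocalMachine behavior under a simplified instruction format. Several parts are illustrative assumptions: the numeric strategy priority, the `registration` op name, and the choice to have submit() return the staged SourceCompilation instead of executing the steps. TP3, for instance, orders strategies by category and declared prior/posterior dependencies rather than by a number.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch of LocalMachine-style registration with a Map<UUID,SourceCompilation>.
public class LocalRegistration {

    record Instruction(String op, Object arg) {}

    // Assumption: a numeric priority stands in for real strategy ordering rules.
    record Strategy(String name, int priority) {}

    // Stand-in for SourceCompilation: strategies sorted, processor and structure staged.
    record SourceCompilation(List<Strategy> sortedStrategies, String processor, String structure) {}

    static final String REGISTRATION = "registration"; // assumed op name

    private final Map<UUID, SourceCompilation> sources = new HashMap<>();
    int sourceCompilations = 0;

    private SourceCompilation compileSource(List<Instruction> source) {
        sourceCompilations++;
        List<Strategy> strategies = new ArrayList<>();
        String processor = null;
        String structure = null;
        for (Instruction i : source) {
            switch (i.op()) {
                case "withStrategy" -> strategies.add((Strategy) i.arg());
                case "withProcessor" -> processor = (String) i.arg();
                case "withStructure" -> structure = (String) i.arg(); // a real machine would connect here
                default -> throw new IllegalArgumentException("not a source instruction: " + i.op());
            }
        }
        strategies.sort(Comparator.comparingInt(Strategy::priority)); // sorted once, at registration
        return new SourceCompilation(List.copyOf(strategies), processor, structure);
    }

    // Pre-compile the source instructions and return a one-instruction registration bytecode.
    List<Instruction> register(List<Instruction> source) {
        UUID id = UUID.randomUUID();
        sources.put(id, compileSource(source));
        return List.of(new Instruction(REGISTRATION, id));
    }

    // Returns the staged SourceCompilation the step instructions would run against.
    SourceCompilation submit(List<Instruction> bytecode) {
        Instruction first = bytecode.get(0);
        if (first.op().equals(REGISTRATION)) {
            SourceCompilation staged = sources.get((UUID) first.arg());
            if (staged == null) {
                throw new IllegalStateException("unknown registration " + first.arg());
            }
            return staged; // pre-compiled: no re-sorting, no reconnecting
        }
        // Unregistered bytecode: its source instructions are compiled on every submission.
        return compileSource(bytecode.stream().filter(i -> i.op().startsWith("with")).toList());
    }

    void unregister(List<Instruction> registration) {
        sources.remove((UUID) registration.get(0).arg());
    }

    public static void main(String[] args) {
        LocalRegistration machine = new LocalRegistration();
        List<Instruction> registration = machine.register(List.of(
                new Instruction("withProcessor", "pipes"),
                new Instruction("withStructure", "tinkergraph"),
                new Instruction("withStrategy", new Strategy("B", 2)),
                new Instruction("withStrategy", new Strategy("A", 1))));
        for (int run = 0; run < 3; run++) {
            List<Instruction> bytecode = new ArrayList<>(registration);
            bytecode.add(new Instruction("V", null));
            bytecode.add(new Instruction("count", null));
            machine.submit(bytecode);
        }
        System.out.println("source compilations: " + machine.sourceCompilations);
    }
}
```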
>>>> 
>>>> 
>>>> 
>>>> https://github.com/apache/tinkerpop/blob/596caf3ab82f3b15c2c343af87be6d03f26d6d6e/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/bytecode/compiler/SourceCompilation.java
>>>> 
>>>> For remote execution, we simply need a RemoteMachine, which would serialize
>>>> and deserialize the Bytecode and Traversers it exchanges with some
>>>> RemoteMachineServer (or some provider-specific server able to use the basic
>>>> protocol we will develop). For instance:
>>>> 
>>>> Machine machine =
>>>>     RemoteMachine.open(Map.of("ip", "127.0.0.1", "port", "32"));
>>>> TraversalSource g =
>>>>     Gremlin.traversal(machine).withProcessor(…).withStructure(…).withStrategy(…);
>>>>
>>>> // prior to V(), the source bytecode is registered and a new "registration"
>>>> // bytecode is returned, which is then appended with the V and count instructions.
>>>> g.V().count();
>>>>
>>>> // no registration occurs as the TraversalSource hasn't changed; the
>>>> // bytecode is simply submitted.
>>>> g.V().out().count();
>>>>
>>>> // the remote registration is removed
>>>> g.close();
>>>>
>>>> // a new registration occurs
>>>> g = g.withStrategy(…);
>>>> g.V().drop();
>>>>
>>>> // the remote registration is removed
>>>> g.close();
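The client-side lifecycle in this example can be sketched as a source that registers lazily on its first traversal and unregisters on close(). FakeMachine, the string-encoded instructions, and the strategy name used below are hypothetical. The sketch is not thread-safe.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;

// Hypothetical sketch of the client-side registration lifecycle; not thread-safe.
public class SourceLifecycle {

    // Stand-in for a (remote) machine that tracks live registrations.
    static final class FakeMachine {
        final Set<UUID> registrations = new HashSet<>();
        int registerCalls = 0;

        UUID register(List<String> sourceInstructions) {
            registerCalls++; // a real machine would pre-compile sourceInstructions here
            UUID id = UUID.randomUUID();
            registrations.add(id);
            return id;
        }

        void unregister(UUID id) {
            registrations.remove(id);
        }
    }

    // A source registers lazily on its first traversal; withStrategy() yields a new, unregistered source.
    static final class Source {
        private final FakeMachine machine;
        private final List<String> sourceInstructions;
        private UUID registration; // null until the first traversal is spawned

        Source(FakeMachine machine, List<String> sourceInstructions) {
            this.machine = machine;
            this.sourceInstructions = List.copyOf(sourceInstructions);
        }

        Source withStrategy(String strategy) {
            List<String> next = new ArrayList<>(sourceInstructions);
            next.add("withStrategy:" + strategy);
            return new Source(machine, next);
        }

        // Returns the bytecode to submit: the registration instruction followed by V.
        List<String> V() {
            if (registration == null) {
                registration = machine.register(sourceInstructions);
            }
            return List.of("registration:" + registration, "V");
        }

        void close() {
            if (registration != null) {
                machine.unregister(registration);
                registration = null;
            }
        }
    }

    public static void main(String[] args) {
        FakeMachine machine = new FakeMachine();
        Source g = new Source(machine, List.of("withProcessor:pipes", "withStructure:tinkergraph"));
        g.V();     // first traversal: registers
        g.V();     // same source: reuses the registration
        g.close(); // registration removed
        g = g.withStrategy("example");
        g.V();     // new source: new registration
        g.close();
        System.out.println(machine.registerCalls + " registrations, " + machine.registrations.size() + " live");
    }
}
```

Keeping the registration on the source object, rather than in a global table, is what lets `g.withStrategy(…)` produce an independent source that registers on its own.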
>>>> 
>>>> Tada!
>>>> 
>>>> WDYT?
>>>> Marko.
>>>> 
>>>> http://rredux.com

