tinkerpop-dev mailing list archives

From Stephen Mallette <spmalle...@gmail.com>
Subject Re: The Machine Interface of TP4.
Date Tue, 02 Apr 2019 15:14:49 GMT
I'm still not sure I follow how caching will work effectively. I follow
that bytecode can be cached both locally and remotely, and that if the same
bytecode is already in a cache its UUID can be sent in its stead. But, at
least under TP3 semantics, the bytecode for:

g.V().has('person','name','marko')

is different from:

g.V().has('person','name','stephen')

These won't be equal: while they are basically the same from an
instruction perspective, they differ in their arguments. For
scripts in TP3 we had that caching because ScriptEngines have parameters:

g.V().has('person','name',x) -> {x: "marko"}
g.V().has('person','name',x) -> {x: "stephen"}

The compiled script was therefore the same for both, and caching was easy.
Is there a way to simulate that type of parameterization for bytecode so
that local/remote caching works nicely and we can see some significant
performance gains?
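
To make the question concrete, here is a minimal sketch of one way such
parameterization could look. It is not TP3 or TP4 API: Instruction, Param,
Parameterized and parameterize() are hypothetical names invented for
illustration, and the sketch assumes every argument is a literal that is safe
to lift out (a real implementation would have to decide which arguments
strategies are allowed to inspect). Each argument is replaced by a positional
placeholder, so the remaining template can serve as the cache key while the
values travel separately as bindings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ParameterizedBytecodeSketch {

    // Hypothetical stand-in for one bytecode instruction: an operator plus its arguments.
    record Instruction(String op, List<Object> args) {}

    // Hypothetical placeholder that takes the place of an argument in the template.
    record Param(int index) {}

    // Bytecode split into a cacheable template and the per-request bindings.
    record Parameterized(List<Instruction> template, List<Object> bindings) {}

    // Replaces every argument with a positional Param and collects the values in order.
    // Assumption: all arguments are plain literals that may be lifted out.
    static Parameterized parameterize(List<Instruction> bytecode) {
        List<Instruction> template = new ArrayList<>();
        List<Object> bindings = new ArrayList<>();
        for (Instruction instruction : bytecode) {
            List<Object> placeholders = new ArrayList<>();
            for (Object arg : instruction.args()) {
                placeholders.add(new Param(bindings.size()));
                bindings.add(arg);
            }
            template.add(new Instruction(instruction.op(), placeholders));
        }
        return new Parameterized(template, bindings);
    }

    // Builds the instruction list for g.V().has('person','name',<name>).
    static List<Instruction> hasPersonName(String name) {
        return List.of(
                new Instruction("V", List.of()),
                new Instruction("has", List.of("person", "name", name)));
    }

    public static void main(String[] args) {
        // Hypothetical cache keyed by template; the String stands in for a Compilation.
        Map<List<Instruction>, String> cache = new HashMap<>();
        for (String name : List.of("marko", "stephen")) {
            Parameterized p = parameterize(hasPersonName(name));
            boolean hit = cache.containsKey(p.template());
            cache.putIfAbsent(p.template(), "compiled");
            System.out.println(name + " -> " + (hit ? "cache hit" : "cache miss")
                    + ", bindings " + p.bindings());
        }
    }
}
```

With this shape the 'marko' traversal misses the cache and the 'stephen'
traversal hits it, because both reduce to the same template and differ only
in their bindings. Whether lifting every argument is safe is exactly the open
question: a strategy that rewrites a traversal based on a literal value would
need that value to stay in the template.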

Sorry if you already answered this somewhere; I have a sense we had
this conversation before, but maybe I'm making that up.

On Wed, Mar 27, 2019 at 9:09 AM Marko Rodriguez <okrammarko@gmail.com>
wrote:

> Hi,
>
> > LocalMachine, it will lookup the registered UUID and if it exists, use the
> > pre-compiled source code.
>
> So what Machine.register() does generally, is up to the implementation.
>
> LocalMachine.register() does what TP3 does in TraversalSource. It
> “pre-compiles”.
>
>         - sort strategies
>                 TP3:
> https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/TraversalSource.java#L138
>
> https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/util/DefaultTraversalStrategies.java#L47
>         - sets up processor
>                 TP3:
> https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/TraversalSource.java#L141
>         - sets up structure
>                 TP3: this came for free because we created the traversal
> source from Graph.traversal().
>
> This way when you keep spawning traversals off of the same “g” we don’t
> have to re-compile the source instructions.
>
> > maybe i didn't follow properly but is this for the purpose of caching
> > traversals to avoid the costs of traversal to bytecode compilation?
>
>
> Note this is a SourceCompilation (just the source instructions are
> compiled), not the full instructions which is a Compilation.
>
> >  in other words, is this describing a general way to cache compiled bytecode so
> > that it doesn't have to go through strategy application more than once?
>
>
> To the concept of caching traversals, that is easy to do with the Machine
> interface. On Machine.submit(), a Map<Bytecode,Compilation> can exist. Same
> as TP3. However, check this, we can do it another way. Why even send the
> full Bytecode? If the RemoteMachine (which is local to the client) knows it
> already sent the same Bytecode before, it can send a single instruction
> Bytecode with an encoded UUID-like instruction. Thus,
> Map<UUID,Compilation>. Less data to transfer.
>
> RemoteMachine (client side) can keep a Map<Bytecode,UUID> and do the
> proper UUID-encoding.
> MachineServer (server side) can then Map<Bytecode,Compilation>, where if
> the received Bytecode is a single UUID-like instruction, fast lookup. If
> not, can still look it up!
>
> Thus, it is easy for us to do both types of caching with the Machine
> interface:
>
>         SourceCompilation: source bytecode caching.
>         Compilation: full bytecode caching.
>
> Keep the questions coming.
>
> Marko.
>
> http://markorodriguez.com
>
>
> >
> >
> >
> > On Mon, Mar 25, 2019 at 8:48 AM Marko Rodriguez <okrammarko@gmail.com>
> > wrote:
> >
> >> Hi,
> >>
> >> Here is how the TP4 bytecode submission infrastructure is looking.
> >>
> >> In TP3, TraversalSource maintained the “pre-compilation” of strategies,
> >> database connectivity, etc. This was not smart for the following reasons:
> >>
> >>        1. It assumed the traversal would execute on the same machine that
> >> it was created on.
> >>        2. We had to make an explicit distinction between local and remote
> >> execution via RemoteStrategy.
> >>        3. RemoteStrategy passes an excessive amount of data over the wire
> >> on each traversal submission (the source instructions!).
> >>        4. RemoteStrategy is bug prone with traversal inspection and
> >> RemoteStep, etc.
> >>
> >> In TP4, we are now going to assume that Bytecode (a traversal) is always
> >> submitted somewhere and this “somewhere" could be local or remote. This
> >> “somewhere” must implement the Machine interface.
> >>
> >>
> >>
> https://github.com/apache/tinkerpop/blob/596caf3ab82f3b15c2c343af87be6d03f26d6d6e/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/Machine.java
> >>
> >> Machine makes explicit the TP4 communication protocol. The only objects
> >> being transmitted are either Bytecode or Traversers. Simple.
> >>
> >> Here is an example using LocalMachine:
> >>
> >> Machine machine = LocalMachine.open();
> >> TraversalSource g = Gremlin.traversal(machine).withProcessor(…).withStructure(…).withStrategy(…)
> >>
> >> The first time a traversal is generated from g, the Bytecode source
> >> instructions are registered with the machine.
> >>
> >>
> >>
> https://github.com/apache/tinkerpop/blob/596caf3ab82f3b15c2c343af87be6d03f26d6d6e/java/language/gremlin/src/main/java/org/apache/tinkerpop/language/gremlin/TraversalSource.java#L99-L104
> >>
> >> The intention is that, on registration, the Machine will pre-compile the
> >> source instructions (sort strategies, ensure processor and structure
> >> setup/connectivity). Machine.register() returns a new Bytecode which
> >> contains the registration information for future lookup. This registration
> >> information is Machine-specific and can even be nothing!
> >>
> >>
> >>
> https://github.com/apache/tinkerpop/blob/596caf3ab82f3b15c2c343af87be6d03f26d6d6e/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/BasicMachine.java#L37-L39
> >>
> >> However, more intelligently, LocalMachine maintains a
> >> Map<UUID,SourceCompilation> which maintains pre-compiled source
> >> instructions.
> >>
> >>
> >>
> https://github.com/apache/tinkerpop/blob/596caf3ab82f3b15c2c343af87be6d03f26d6d6e/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/LocalMachine.java#L47-L62
> >>
> >> Now when bytecode (containing instructions for execution) is submitted to
> >> LocalMachine, it will lookup the registered UUID and if it exists, use the
> >> pre-compiled source code. As you can see, a pre-compilation has everything
> >> staged and ready for use.
> >>
> >>
> >>
> https://github.com/apache/tinkerpop/blob/596caf3ab82f3b15c2c343af87be6d03f26d6d6e/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/bytecode/compiler/SourceCompilation.java
> >>
> >> For remote execution, we simply need RemoteMachine which would serialize
> >> and deserialize Bytecode and Traversers to some RemoteMachineServer (or
> >> some provider-specific server able to use the basic protocol we will
> >> develop). For instance:
> >>
> >> Machine machine = RemoteMachine.open(Map.of("ip","127.0.0.1","port","32"))
> >> TraversalSource g = Gremlin.traversal(machine).withProcessor(…).withStructure(…).withStrategy(…)
> >>
> >> // prior to V(), the bytecode is registered and a new “registration”
> >> // bytecode is returned and appended with V and count instructions.
> >> g.V().count()
> >>
> >> // no registration occurs as the TraversalSource hasn’t changed, the
> >> bytecode is simply submitted.
> >> g.V().out().count()
> >>
> >> // the remote registration is removed
> >> g.close()
> >>
> >> // a new registration occurs
> >> g = g.withStrategy(…)
> >> g.V().drop()
> >>
> >> // the remote registration is removed
> >> g.close()
> >>
> >> Tada!
> >>
> >> WDYT?,
> >> Marko.
> >>
> >> http://rredux.com
>
>
