tinkerpop-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Marko Rodriguez <okramma...@gmail.com>
Subject Re: A Question Regarding TP4 Processor Classifications
Date Thu, 04 Apr 2019 17:07:18 GMT

Thank you for the response.

> Excellent progress on the the RxJava processor. I was wondering if
> categories 1 and 2 can be combined where Pipes becomes the Flowable version
> of the RxJava processor?

I don’t quite understand your questions. Are you saying:


or are you saying:

	“Get rid of Pipes all together and just use single-threaded RxJava instead."

For the first, I don’t see the benefit of that. For the second, Pipes4 is really fast! —
much faster than Pipes3. (more on this next)

> In this case, though single threaded, we'd still
> get the benefit of asynchronous execution of traversal steps versus
> blocking execution on thread pools like the current TP3 model.

Again, I’m confused. Apologies. I believe that perhaps you think that the Step-model of
Pipes is what Bytecode gets compiled to in the TP4 VM. If so, note that this is not the case.
The concept of Steps (chained iterators) is completely within the pipes/ package. The machine-core/
package compiles Bytecode to a nested List of stateless, unconnected functions (called a Compilation).
It is this intermediate representation that ultimately is used by Pipes, RxJava, and Beam
to create their respective execution plan (where Pipes does the whole chained iterator step

Compilation: https://github.com/apache/tinkerpop/blob/tp4/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/bytecode/compiler/Compilation.java#L43

	Pipes: https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/pipes/src/main/java/org/apache/tinkerpop/machine/processor/pipes/Pipes.java#L47
	Beam: https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/beam/src/main/java/org/apache/tinkerpop/machine/processor/beam/Beam.java#L132
	RxJava: https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/rxjava/src/main/java/org/apache/tinkerpop/machine/processor/rxjava/RxJava.java#L103

> I would
> imagine Pipes would beat the Flowable performance on a single traversal
> side-by-side basis (thought perhaps not by much), but the Flowable version
> would likely scale up to higher throughput and better CPU utilization when
> under concurrent load.

Pipes is definitely faster than RxJava (single-threaded). While I only learned RxJava 36 hours
ago, I don’t believe it will ever beat Pipes because Pipes4 is brain dead simple — much
simpler than in TP3 where a bunch of extra data structures were needed to account for GraphComputer
semantics (e.g. ExpandableIterator).

I believe, given the CPU utilization/etc. points you make, that RxJava will come into its
own in multi-threaded mode (called ParallelFlowable) when trying to get real-time performance
from a query that touches/generates lots of data (traversers). This is the reason for Category
2 — real-time, multi-threaded, single machine. I only gave a quick pass last night at making
ParallelFlowable work, but gave up when various test cases were failing (— I now believe
I know the reason why). I hope to have ParallelFlowable working by mid-week next week and
then we can benchmark its performance.

I hope I answered your questions or at least explained my confusion.


http://rredux.com <http://rredux.com/>

> On Apr 4, 2019, at 10:33 AM, Ted Wilmes <twilmes@gmail.com> wrote:
> Hello,
> --Ted
> On Tue, Apr 2, 2019 at 7:31 AM Marko Rodriguez <okrammarko@gmail.com <mailto:okrammarko@gmail.com>>
>> Hello,
>> TP4 will not make a distinction between STANDARD (OLTP) and COMPUTER
>> (OLAP) execution models. In TP4, if a processing engine can convert a
>> bytecode Compilation into a working execution plan then that is all that
>> matters. TinkerPop does not need to concern itself with whether that
>> execution plan is “OLTP" or “OLAP" or with the semantics of its execution
>> (function oriented, iterator oriented, RDD-based, etc.). With that, here
>> are 4 categories of processors that I believe define the full spectrum of
>> what we will be dealing with:
>>        1. Real-time single-threaded single-machine.
>>                * This is STANDARD (OLTP) in TP3.
>>                * This is the Pipes processor in TP4.
>>        2. Real-time multi-threaded single-machine.
>>                * This does not exist in TP3.
>>                * We should provide an RxJava processor in TP4.
>>        3. Near-time distributed multi-machine.
>>                * This does not exist in TP3.
>>                * We should provide an Akka processor in TP4.
>>        4. Batch-time distributed multi-machine.
>>                * This is COMPUTER (OLAP) in TP3 (Spark or Giraph).
>>                * We should provide a Spark processor in TP4.
>> I’m not familiar with the specifics of the Flink, Apex, DataFlow, Samza,
>> etc. stream-based processors. However, I believe they can be made to work
>> in near-time or batch-time depending on the amount of data pulled from the
>> database. However, once we understand these technologies better, I believe
>> we should be able to fit them into the categories above.
>> In conclusion: Do these categories make sense to people? Terminology-wise
>> -- Near-time? Batch-time? Are these distinctions valid?
>> Thank you,
>> Marko.
>> http://rredux.com <http://rredux.com/> <http://rredux.com/ <http://rredux.com/>>

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message