tinkerpop-dev mailing list archives

From Marko Rodriguez <okramma...@gmail.com>
Subject Re: A Question Regarding TP4 Processor Classifications
Date Thu, 04 Apr 2019 21:25:13 GMT
Hi,

> The Pipes implementation will no doubt be
> faster for execution of a single traversal but I'm wondering if the
> Flowable RxJava would beat the Pipes processor by providing higher
> throughput in a scenario when many, many users are executing queries
> concurrently.

Why would you think that? When a query comes in, you can just do new Thread(pipes).start() and
run each query on its own thread.
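To make the "just thread it" point concrete, here is a toy sketch (the String "queries" and the Runnable stand in for compiled Pipes traversals; none of these names are the actual TP4 API). Note that it must be start(), not run(): run() would execute the traversal synchronously on the calling thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class PerQueryThreading {
    // Run each "query" on its own thread and wait for all of them to finish.
    static List<String> runAll(List<String> queries) throws InterruptedException {
        List<Thread> threads = new ArrayList<>();
        List<String> results = new CopyOnWriteArrayList<>(); // thread-safe collector
        for (String query : queries) {
            Runnable pipes = () -> results.add(query + " -> done"); // stands in for a compiled Pipes traversal
            Thread t = new Thread(pipes);
            t.start(); // start(), not run(): run() would block the caller
            threads.add(t);
        }
        for (Thread t : threads) t.join();
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runAll(List.of("g.V()", "g.E()")));
    }
}
```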

*** My knowledge of server architectures is pretty weak, but….

I think the model for dealing with many concurrent users will be up to the server implementation.
If you are using LocalMachine, then it's one query at a time. But if you are using RemoteMachine
to talk to, let's say, the TP4 MachineServer, traversals are executed in parallel. And most providers
who have their own server implementation (one that RemoteMachine can communicate with)
will handle it as they see fit (e.g. doing complex things like routing queries to different
machines in the cluster for load balancing).

One thing I’m trying to stress in TP4 is “no more complex server infrastructure.” You
can see our MachineServer implementation. It’s ~100 lines of code and does parallel execution
of queries. It’s brain-dead simple, but with some modern thread/server techniques you
all might have, we can make it a solid little server that meets most providers’ needs;
otherwise, they can just roll those requirements into their own server systems.

	https://github.com/apache/tinkerpop/blob/tp4/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/species/remote/MachineServer.java
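For flavor, "parallel execution of queries" can amount to little more than handing each incoming query to a thread pool. A toy sketch (the Function-based query type is a placeholder, not the actual Bytecode/Compilation API):

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;
import java.util.stream.Collectors;

public class SimpleQueryServer {
    private final ExecutorService pool = Executors.newCachedThreadPool();

    // Submit a "query" (here just a function applied over the input data)
    // for parallel execution; callers get a Future for the result.
    public Future<List<Long>> submit(Function<Long, Long> query, List<Long> input) {
        return pool.submit(() -> input.stream().map(query).collect(Collectors.toList()));
    }

    public void shutdown() {
        pool.shutdown();
    }

    public static void main(String[] args) throws Exception {
        SimpleQueryServer server = new SimpleQueryServer();
        Future<List<Long>> doubled = server.submit(x -> x * 2, List.of(1L, 2L, 3L));
        System.out.println(doubled.get()); // each query runs on a pooled thread
        server.shutdown();
    }
}
```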

> Regardless, providers having multiple processor options is a
> good thing and I don't mean to suggest any premature optimization. At this
> point, I think I'll put together some simple benchmarks just out of
> curiosity but will report back.

Yea, I’m trying to cover all the “semantic bases”:

	Actor model: Akka
	Map/Reduce model: Spark
	Push-based model: RxJava
	Pull-based model: Pipes

If the Compilation/Processor/Bytecode/Traverser/etc. classes are sufficiently abstract to
naturally enable all these different execution models, then we are happy. So far so good…
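To make that abstraction concrete, here is a toy sketch (not TP4 code; the List of stateless functions loosely stands in for the real Compilation) of one intermediate representation feeding both a pull-based executor and a push-based one:

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

public class TwoExecutionModels {
    // Pull-based (Pipes-style): fold the functions into chained iterators;
    // the consumer drives execution by calling next().
    static <T> Iterator<T> pull(Iterator<T> source, List<Function<T, T>> functions) {
        Iterator<T> head = source;
        for (Function<T, T> f : functions) {
            Iterator<T> prev = head;
            head = new Iterator<T>() {
                public boolean hasNext() { return prev.hasNext(); }
                public T next() { return f.apply(prev.next()); }
            };
        }
        return head;
    }

    // Push-based (RxJava-style): the producer drives execution, pushing each
    // item through the functions into a downstream callback.
    static <T> void push(List<T> source, List<Function<T, T>> functions, Consumer<T> downstream) {
        for (T item : source) {
            for (Function<T, T> f : functions) item = f.apply(item);
            downstream.accept(item);
        }
    }

    public static void main(String[] args) {
        List<Function<Integer, Integer>> fns = List.of(x -> x + 1, x -> x * 2);
        pull(List.of(1, 2).iterator(), fns).forEachRemaining(System.out::println);
        push(List.of(1, 2), fns, System.out::println);
    }
}
```

The same "compilation" runs unchanged under both models; only the executor differs.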

Marko.

http://rredux.com

> 
> --Ted
> 
> On Thu, Apr 4, 2019 at 12:36 PM Marko Rodriguez <okrammarko@gmail.com>
> wrote:
> 
>> Hi,
>> 
>> This is a pretty neat explanation of why Pipes will be faster than RxJava
>> single-threaded.
>> 
>> The map-operator for Pipes:
>> 
>> https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/pipes/src/main/java/org/apache/tinkerpop/machine/processor/pipes/MapStep.java
>> 
>> The map-operator for RxJava:
>> 
>> https://github.com/ReactiveX/RxJava/blob/2.x/src/main/java/io/reactivex/internal/operators/flowable/FlowableMap.java
>> 
>> RxJava has a lot of overhead. Pipes is as bare bones as you can get.
>> 
>> Marko.
>> 
>> http://rredux.com
>> 
>> 
>> 
>> 
>>> On Apr 4, 2019, at 11:07 AM, Marko Rodriguez <okrammarko@gmail.com>
>> wrote:
>>> 
>>> Hello,
>>> 
>>> Thank you for the response.
>>> 
>>>> Excellent progress on the RxJava processor. I was wondering if
>>>> categories 1 and 2 can be combined where Pipes becomes the Flowable
>> version
>>>> of the RxJava processor?
>>> 
>>> I don’t quite understand your questions. Are you saying:
>>> 
>>>      Flowable.of().flatMap(pipesProcessor)
>>> 
>>> or are you saying:
>>> 
>>>      “Get rid of Pipes altogether and just use single-threaded RxJava
>> instead."
>>> 
>>> For the first, I don’t see the benefit of that. For the second, Pipes4
>> is really fast! — much faster than Pipes3. (more on this next)
>>> 
>>> 
>>>> In this case, though single threaded, we'd still
>>>> get the benefit of asynchronous execution of traversal steps versus
>>>> blocking execution on thread pools like the current TP3 model.
>>> 
>>> Again, I’m confused. Apologies. I believe that perhaps you think that
>> the Step-model of Pipes is what Bytecode gets compiled to in the TP4 VM. If
>> so, note that this is not the case. The concept of Steps (chained
>> iterators) is completely within the pipes/ package. The machine-core/
>> package compiles Bytecode to a nested List of stateless, unconnected
>> functions (called a Compilation). It is this intermediate representation
>> that ultimately is used by Pipes, RxJava, and Beam to create their
>> respective execution plan (where Pipes does the whole chained iterator step
>> thing).
>>> 
>>> Compilation:
>> https://github.com/apache/tinkerpop/blob/tp4/java/machine/machine-core/src/main/java/org/apache/tinkerpop/machine/bytecode/compiler/Compilation.java#L43
>>> 
>>>      Pipes:
>> https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/pipes/src/main/java/org/apache/tinkerpop/machine/processor/pipes/Pipes.java#L47
>>>      Beam:
>> https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/beam/src/main/java/org/apache/tinkerpop/machine/processor/beam/Beam.java#L132
>>>      RxJava:
>> https://github.com/apache/tinkerpop/blob/tp4/java/machine/processor/rxjava/src/main/java/org/apache/tinkerpop/machine/processor/rxjava/RxJava.java#L103
>>> 
>>>> I would
>>>> imagine Pipes would beat the Flowable performance on a single traversal
>>>> side-by-side basis (though perhaps not by much), but the Flowable
>> version
>>>> would likely scale up to higher throughput and better CPU utilization
>> when
>>>> under concurrent load.
>>> 
>>> 
>>> Pipes is definitely faster than RxJava (single-threaded). While I only
>> learned RxJava 36 hours ago, I don’t believe it will ever beat Pipes
>> because Pipes4 is brain dead simple — much simpler than in TP3 where a
>> bunch of extra data structures were needed to account for GraphComputer
>> semantics (e.g. ExpandableIterator).
>>> 
>>> I believe, given the CPU utilization/etc. points you make, that RxJava
>> will come into its own in multi-threaded mode (called ParallelFlowable)
>> when trying to get real-time performance from a query that
>> touches/generates lots of data (traversers). This is the reason for
>> Category 2 — real-time, multi-threaded, single machine. I only gave a quick
>> pass last night at making ParallelFlowable work, but gave up when various
>> test cases were failing (— I now believe I know the reason why). I hope to
>> have ParallelFlowable working by mid-week next week and then we can
>> benchmark its performance.
>>> 
>>> I hope I answered your questions or at least explained my confusion.
>>> 
>>> Thanks,
>>> Marko.
>>> 
>>> http://rredux.com
>>> 
>>> 
>>> 
>>> 
>>>> On Apr 4, 2019, at 10:33 AM, Ted Wilmes <twilmes@gmail.com> wrote:
>>>> 
>>>> Hello,
>>>> 
>>>> 
>>>> --Ted
>>>> 
>>>> On Tue, Apr 2, 2019 at 7:31 AM Marko Rodriguez <okrammarko@gmail.com> wrote:
>>>> 
>>>>> Hello,
>>>>> 
>>>>> TP4 will not make a distinction between STANDARD (OLTP) and COMPUTER
>>>>> (OLAP) execution models. In TP4, if a processing engine can convert a
>>>>> bytecode Compilation into a working execution plan then that is all
>> that
>>>>> matters. TinkerPop does not need to concern itself with whether that
>>>>> execution plan is “OLTP" or “OLAP" or with the semantics of its
>> execution
>>>>> (function oriented, iterator oriented, RDD-based, etc.). With that,
>> here
>>>>> are 4 categories of processors that I believe define the full spectrum
>> of
>>>>> what we will be dealing with:
>>>>> 
>>>>>       1. Real-time single-threaded single-machine.
>>>>>               * This is STANDARD (OLTP) in TP3.
>>>>>               * This is the Pipes processor in TP4.
>>>>> 
>>>>>       2. Real-time multi-threaded single-machine.
>>>>>               * This does not exist in TP3.
>>>>>               * We should provide an RxJava processor in TP4.
>>>>> 
>>>>>       3. Near-time distributed multi-machine.
>>>>>               * This does not exist in TP3.
>>>>>               * We should provide an Akka processor in TP4.
>>>>> 
>>>>>       4. Batch-time distributed multi-machine.
>>>>>               * This is COMPUTER (OLAP) in TP3 (Spark or Giraph).
>>>>>               * We should provide a Spark processor in TP4.
>>>>> 
>>>>> I’m not familiar with the specifics of the Flink, Apex, DataFlow,
>> Samza,
>>>>> etc. stream-based processors. However, I believe they can be made to
>> work
>>>>> in near-time or batch-time depending on the amount of data pulled from
>> the
>>>>> database. However, once we understand these technologies better, I
>> believe
>>>>> we should be able to fit them into the categories above.
>>>>> 
>>>>> In conclusion: Do these categories make sense to people?
>> Terminology-wise
>>>>> -- Near-time? Batch-time? Are these distinctions valid?
>>>>> 
>>>>> Thank you,
>>>>> Marko.
>>>>> 
>>>>> http://rredux.com
>>> 
>> 
>> 

