storm-user mailing list archives

From Jason Jackson <>
Subject Re: Trident batching under the hood?
Date Thu, 03 Apr 2014 19:06:58 GMT
There's no distributed barrier when the stream is repartitioned. A batch goes
through 'phases': batch phase (perform computation), commit phase (update
persistent state), finish phase. The only distributed barriers are between
batch phases and between batches. Marking the barriers between batch phases
with Xs: X1 batch X2 commit X3 finish. Within a particular Trident batch,
during the batch phase, the streaming computation and repartitioning are
done very much the way Storm does them, e.g. with a 1:1 ratio of Trident
tuples to Storm tuples.

(Distributed barriers are very expensive and add latency that could keep
your system from being real-time.)
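
To make the phase ordering concrete, here is a toy model in plain Java. It is
not Trident code and does not use any Storm API; the class name
BatchPhasesSketch, the runBatch method, and the use of a CyclicBarrier as a
stand-in for a distributed barrier are all assumptions made for illustration.
Each worker thread pushes its tuples through the batch phase one at a time
with no coordination, and the workers only wait for each other at X1, X2
and X3.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

// Toy model only: threads stand in for workers, a CyclicBarrier stands in
// for a distributed barrier. None of this is Storm or Trident API.
class BatchPhasesSketch {

    // Runs a single hypothetical batch and returns the ordered event log.
    static List<String> runBatch(int workers, int tuplesPerWorker) {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        CyclicBarrier barrier = new CyclicBarrier(workers);
        List<Thread> threads = new ArrayList<>();

        for (int w = 0; w < workers; w++) {
            final int id = w;
            Thread t = new Thread(() -> {
                try {
                    barrier.await(); // X1: everyone starts the batch together
                    // Batch phase: tuples flow one by one, no barrier per tuple.
                    for (int i = 0; i < tuplesPerWorker; i++) {
                        log.add("batch worker=" + id + " tuple=" + i);
                    }
                    barrier.await(); // X2: all computation done before commit
                    log.add("commit worker=" + id);
                    barrier.await(); // X3: all commits done before finish
                    log.add("finish worker=" + id);
                } catch (InterruptedException | BrokenBarrierException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads.add(t);
            t.start();
        }

        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining", e);
            }
        }
        return log;
    }

    public static void main(String[] args) {
        for (String event : runBatch(3, 2)) {
            System.out.println(event);
        }
    }
}
```

In the printed log the "batch" lines from different workers can interleave in
any order, but every "batch" line comes before the first "commit" line, and
every "commit" line comes before the first "finish" line.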

On Thu, Apr 3, 2014 at 11:54 AM, Dong Mo <> wrote:

> Dear list,
> I am trying to understand Trident's batching process better.
> I understand that Trident's spout takes input in batches.
> My question is whether the notion of batches is still maintained during
> execution of the topology.
> For example, I have this Trident topology:
> Spout(batched stream) ---- FunctionA(operate on the batch) ----
> PartitionByField(involves network transfer due to repartitioning) ----
> FunctionB(on a new batch or a stream of tuples?)
> FunctionA takes batched input from the spout to do some mapping, for example.
> So will PartitionByField execute only when FunctionA has finished processing
> the whole batch of input? Or does FunctionA map each tuple and emit it
> to the corresponding next stage by field, like a fluid?
> That is, does Trident's internal processing logic run discretely, the way
> it takes in batches, or does it fall back to a tuple-by-tuple fluid model?
> Is it possible to reason about "barriers" in Trident's internal processing?
> Thanks
> -Mo
