storm-user mailing list archives

From "Cody A. Ray" <>
Subject Re: Best practice to persist data in multiple TridentState
Date Thu, 15 May 2014 15:34:25 GMT
I don't know what the "best practice" is... but I actually like a 4th
option: creating a composite state.

Instead of sending all data to every state, I needed to randomly shard data
between an arbitrary number of states. I've thrown this on a gist here:

You could probably take a similar approach with a CompositeState that would
send the data to all TridentStates instead of randomly choosing a state.
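
To make the two variants concrete, here is a minimal, self-contained sketch of the composite idea. It is not the code from the gist and it does not use Trident's classes: Sink, MemorySink, BroadcastSink, ShardingSink and CompositeDemo are hypothetical names invented for illustration, and Sink is only a stand-in for "something that accepts a batch of tuples".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in for one state backend. This is NOT Trident's State
// interface; it only models "accept a batch of tuples".
interface Sink {
    void write(List<String> batch);
}

// In-memory sink, for demonstration only.
class MemorySink implements Sink {
    final List<String> stored = new ArrayList<>();

    @Override
    public void write(List<String> batch) {
        stored.addAll(batch);
    }
}

// Composite that forwards every batch to all delegates, in list order.
class BroadcastSink implements Sink {
    private final List<Sink> delegates;

    BroadcastSink(List<Sink> delegates) {
        this.delegates = new ArrayList<>(delegates);
    }

    @Override
    public void write(List<String> batch) {
        for (Sink delegate : delegates) {
            // If this call throws, the delegates after it are never reached.
            delegate.write(batch);
        }
    }
}

// Composite that sends each batch to one randomly chosen delegate.
class ShardingSink implements Sink {
    private final List<Sink> delegates;
    private final Random random;

    ShardingSink(List<Sink> delegates, Random random) {
        if (delegates.isEmpty()) {
            throw new IllegalArgumentException("need at least one delegate");
        }
        this.delegates = new ArrayList<>(delegates);
        this.random = random;
    }

    @Override
    public void write(List<String> batch) {
        delegates.get(random.nextInt(delegates.size())).write(batch);
    }
}

class CompositeDemo {
    public static void main(String[] args) {
        MemorySink first = new MemorySink();
        MemorySink second = new MemorySink();
        Sink all = new BroadcastSink(Arrays.<Sink>asList(first, second));
        all.write(Arrays.asList("tuple-1", "tuple-2"));
        // Both sinks now hold the same two tuples.
        System.out.println(first.stored + " " + second.stored);
    }
}
```

Because the broadcast variant walks its delegates in a fixed order, a state that cannot tolerate duplicates could be placed last, so it is only reached once the earlier writes have gone through. In a real topology the composite would implement Trident's State interface and forward its commit hooks to each delegate as well; that wiring is omitted here.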

Good luck!


On Fri, May 2, 2014 at 3:12 AM, Laurent Thoulon <> wrote:

> Hi,
> What would you say is the best way to persist data to multiple states?
> Currently I have 3 options in mind:
> 1- Process data and use the stream to send data to both states
> Stream stream = ...each...filter...bla....
> stream.partitionPersist(state1, ...)
> stream.partitionPersist(state2, ...)
> 2- Process data and chain the persists
> Stream stream = ...each...filter...bla....
> stream.partitionPersist(state1,
> ...).newValuesStream().partitionPersist(state2, ...)
> 3- Build a topology for each state; they would all mostly do the same thing
> except for the persist part.
> My main concerns here are handling failures and efficiency.
> In my use case I actually have 3 states. 2 of them can store in a
> non-transactional way, and the other should be opaque transactional but
> actually can't be, as it's just an API call that doesn't recognize duplicates.
> That's no big deal if we could just make sure it's not bound to the
> failures of the other states (meaning that if another state fails, we're
> sure this one hasn't yet processed the data).
> This makes option 1 a bit tricky, as I'm never sure of the order in which
> the states will be processed. Or is there a way to be sure?
> Option 2 would do, I guess, but I have to pass along through the first state
> all the data needed for the second. Potentially I would like to filter the
> tuples that go to state 1 or state 2. I would then have to write my own
> updater that uses a filter for the first persist, so that it doesn't send
> everything to the state but still emits everything in the end.
> Option 3 would also do, but it wouldn't be that efficient: reading my
> spout twice and processing data the same way in both topologies up until the
> persist part.
> Any ideas on the best way to handle this?
> Thanks
> Regards
> Laurent

Cody A. Ray, LEED AP
