storm-user mailing list archives

From Jonas Krauß <jkraus...@gmail.com>
Subject Switching from regular storm topology to trident
Date Fri, 11 Aug 2017 07:12:11 GMT
Hi guys,

I am following up on my earlier questions asked under the subject "DRPC and Trident Throughput".
If you do not need the background, just skip the next paragraph and go right to my
issue.

// background story
So far my experience with Trident has unfortunately been disappointing. I started out with
the idea of having a Trident stream that takes in tuples from other topologies via DRPC. DRPC
support was the reason to switch from a regular Storm topology to Trident (so far we were
doing fine with the regular topologies). However, there seems to be no real documentation
available on (1) how DRPC and Trident work together, (2) what the DRPC configuration options
mean, e.g. drpc.queue.size (to me it is even unclear whether this is a topology-scope setting
or one of the DRPC server), and (3) how Trident decides on coordinating its batches. I have
read the Trident docs, tutorials and FAQ at storm.apache.org, but these three points remain
unclear. In the best case I am not able to achieve a throughput of more than 10k tuples per
10 minutes via Trident; most of the time, and whenever there is congestion, throughput tanks
severely, to the point that tuples start failing. I have configured the Trident parallelism
and Xmx to exceed our previous regular topology's resources by a factor of 4, so I would
expect to see at least the same performance. I am now at the point where I am considering
switching back to a regular topology and burying the DRPC idea.
// end of background story

I am trying to figure out how to increase throughput in my Trident topology. The issue seems
to lie in the spout (an implementation of IBatchSpout, emitting at most 50 tuples per batch,
batch interval set to 100 ms, parallelism set to one). Can I make some fields of the spout
static, e.g. the queue holding tuples until the next batch commences and a Jedis connector
that is subscribed to a stream, and then increase the parallelism of the spout to increase
throughput? How are batches coordinated between parallel spouts; do they emit in parallel?
Is there an example of a Trident topology in the Storm git repository that receives new tuples
in an arbitrary manner?
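
To make the spout question concrete, here is a minimal plain-Java sketch of the buffering described above: tuples wait in a static queue and each batch drains at most a fixed number of them. It deliberately does not use any Storm classes; BatchBuffer, offer, emitBatch, ack and maxTuplesPerWindow are hypothetical names chosen for illustration, and a real spout would put this logic behind the IBatchSpout methods. The last helper is only arithmetic on the settings quoted above, assuming exactly one batch per interval.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical model of a batch spout's buffer; not part of the Storm API. */
class BatchBuffer {
    // A static field is shared only by instances inside the same JVM.
    // Spout tasks placed in different worker processes would each get their own queue.
    private static final BlockingQueue<String> PENDING = new LinkedBlockingQueue<>();

    private final int maxBatchSize;

    // Batches handed out but not yet acknowledged, keyed by batch id,
    // so that asking for the same batch id again returns the same tuples.
    private final Map<Long, List<String>> inFlight = new HashMap<>();

    BatchBuffer(int maxBatchSize) {
        this.maxBatchSize = maxBatchSize;
    }

    /** Called by whatever receives tuples (e.g. a subscriber thread). */
    static void offer(String tuple) {
        PENDING.add(tuple);
    }

    /** Returns the tuples for this batch id, draining at most maxBatchSize new ones. */
    List<String> emitBatch(long batchId) {
        List<String> batch = inFlight.get(batchId);
        if (batch == null) {
            batch = new ArrayList<>();
            PENDING.drainTo(batch, maxBatchSize); // non-blocking, may return fewer
            inFlight.put(batchId, batch);
        }
        return batch;
    }

    /** Forgets a batch once it has been fully processed. */
    void ack(long batchId) {
        inFlight.remove(batchId);
    }

    /** Nominal ceiling: one batch of maxBatchSize tuples per interval, nothing else limiting. */
    static long maxTuplesPerWindow(int maxBatchSize, long batchIntervalMs, long windowMs) {
        return (windowMs / batchIntervalMs) * maxBatchSize;
    }
}
```

With 50 tuples per batch and a 100 ms interval, maxTuplesPerWindow(50, 100, 600_000) gives 300,000 tuples per 10 minutes for a single spout instance, so under this one-batch-per-interval assumption the batch size alone would not explain a ceiling of 10k.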

Thanks

Jonas