storm-user mailing list archives

From Alexander T <mittspamko...@gmail.com>
Subject Re: How does one distribute database iteration across workers?
Date Tue, 19 Apr 2016 06:21:15 GMT
Correction - group on partition id
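
The correction above refers to point 5 of the quoted mail: if every batch carries a partition id, a fields grouping on that id sends all tuples for a given partition to the same downstream bolt task, so the final bolt sees each partition's output in one place. A minimal standalone sketch of the routing idea (assumption: the grouping behaves like a hash of the grouped field modulo the downstream task count; the `target_task` helper is hypothetical, not Storm API):

```python
# Hypothetical sketch of fields-grouping semantics: tuples that share the
# grouped field value always land on the same downstream task.
def target_task(partition_id: int, num_tasks: int) -> int:
    # Hash the grouped field, then pick a task by modulo.
    return hash(partition_id) % num_tasks

# Every tuple carrying partition_id 7 is routed to the same task:
task_a = target_task(7, 4)
task_b = target_task(7, 4)
assert task_a == task_b
```

In a topology this would be expressed declaratively, e.g. `builder.setBolt(...).fieldsGrouping("spout", new Fields("partitionId"))` in Java, rather than by computing the task yourself.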
On Apr 19, 2016 6:33 AM, "Navin Ipe" <navin.ipe@searchlighthealth.com>
wrote:

> I've seen this:
> http://storm.apache.org/releases/0.10.0/Understanding-the-parallelism-of-a-Storm-topology.html
> but it doesn't explain how workers coordinate with each other, so
> requesting a bit of clarity.
>
> I'm considering a situation where I have 2 million rows in MySQL or
> MongoDB.
>
> 1. I want to use a Spout to read the first 1000 rows and send the
> processed output to a Bolt. This happens in Worker1.
> 2. I want a different instance of the same Spout class to read the next
> 1000 rows in parallel with the Spout in 1, then send the processed
> output to an instance of the same Bolt used in 1. This happens in
> Worker2.
> 3. Same as 1 and 2, but it happens in Worker 3.
> 4. I might set up 10 workers like this.
> 5. When all the Bolts in the workers are finished, they send their outputs
> to a single Bolt in Worker 11.
> 6. The Bolt in Worker 11 writes the processed value to a new MySQL table.
>
> *My confusion here is in how to make the database iterations happen batch
> by batch, in parallel*. Obviously the database connection would have to be
> made in some static class outside the workers, but if workers are started
> with just "conf.setNumWorkers(2);", then how do I tell the workers to
> iterate over different rows of the database? Assuming that the workers are
> running on different machines.
>
> --
> Regards,
> Navin
>
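
One common way to answer the quoted question without any worker-to-worker coordination: each spout task derives its own slice of the table from its task index. In Java, Storm exposes that index through `TopologyContext.getThisTaskIndex()` in the spout's `open()` method; each task can then query only its own row range (e.g. via LIMIT/OFFSET or an id range). The partitioning arithmetic is shown here standalone as a sketch; `partition_bounds` is a hypothetical helper, not part of Storm:

```python
# Sketch: split `total_rows` evenly across `num_tasks` spout tasks, with
# each task computing its own half-open [start, end) range from its index.
def partition_bounds(total_rows: int, num_tasks: int, task_index: int):
    """Return the (start, end) row range owned by one spout task."""
    per_task = total_rows // num_tasks
    remainder = total_rows % num_tasks
    # Spread the remainder rows over the first `remainder` tasks.
    start = task_index * per_task + min(task_index, remainder)
    end = start + per_task + (1 if task_index < remainder else 0)
    return start, end

# 2,000,000 rows across 10 spout tasks:
# task 0 owns rows [0, 200000), task 9 owns rows [1800000, 2000000).
```

Because each task's range follows from its index alone, no shared static class or cross-worker messaging is needed; the same spout code runs unchanged on every machine.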
