incubator-s4-user mailing list archives

From Matthieu Morel <matth...@yahoo-inc.com>
Subject Re: Distributing connections between different adaptor instances
Date Wed, 21 Nov 2012 11:04:07 GMT

On Nov 21, 2012, at 11:37 AM, German Blanco wrote:

> Hello,
> 
> My problem is similar to the one in this thread:
> S4-Piper: Scalability in input adapter Fri, 12 Oct 2012
> 
> The solution proposes to "distribute the connections among adapter nodes".
> Would the distribution be done in the client application that connects to the adaptors?
> Or else, how?

That really depends on your use case, infrastructure, and the kind of preprocessing you need
to do in the adapter.

Usually you would use several adapter nodes because the input stream is big and fast and therefore
you need more processing power to convert it into S4 events in a timely fashion.

If you control the input stream provider:
- If you can "tee" the input traffic (that would be the role of the client app in front of
the adapter), then it's simple to distribute it to various adapter nodes.
- If you have a pub/sub messaging system (like Kafka) that provides the input stream, you
may configure it to split the stream so that different adapters fetch different parts of the data.
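
As a rough illustration of distributing the input from a client app, here is a minimal Python
sketch that assigns each record to one adapter node with a stable hash of its key. It does not
use any S4 or Kafka API: the node addresses, pick_adapter, and split_stream are hypothetical
names chosen for this example, and the actual sending of data to each node is left out.

```python
import zlib

# Hypothetical adapter endpoints; these are assumptions, not values from the thread.
ADAPTER_NODES = ["adapter-0:12000", "adapter-1:12000", "adapter-2:12000"]


def pick_adapter(key, nodes=ADAPTER_NODES):
    """Map a record key to one adapter node using a stable hash."""
    # crc32 gives the same value in every process, unlike the built-in hash() on str.
    return nodes[zlib.crc32(key.encode("utf-8")) % len(nodes)]


def split_stream(records, nodes=ADAPTER_NODES):
    """Group (key, payload) records by the adapter node that should receive them."""
    routed = {node: [] for node in nodes}
    for key, payload in records:
        routed[pick_adapter(key, nodes)].append((key, payload))
    return routed


if __name__ == "__main__":
    sample = [("user-%d" % i, {"clicks": i}) for i in range(10)]
    for node, items in split_stream(sample).items():
        print(node, len(items))
```

Records with the same key always land on the same node, which is probably what you want if the
adapter keeps any per-key state. If no such state exists, plain round-robin would work as well.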

If you don't control the input stream provider:
- If you have only 1 input connection but there is quite a lot of work to do in the adapter
(for instance, enrichment), then you'd benefit from listening to the input stream on a single
adapter node while still using several adapter nodes to parallelize the processing (in keyed
PEs).
- If you have only 1 input connection and the conversion is trivial, but the input stream
is really big, you might try batching the data in the listening adapter node,
then parallelizing the processing of the batches.
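
The batching idea could look roughly like the following Python sketch. It is only an
illustration under assumptions: batches, convert_batch, and process are hypothetical names, the
conversion is a placeholder, and a local thread pool stands in for the adapter nodes or PEs that
would do the real work in S4.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batches(stream, size):
    """Yield consecutive lists of at most `size` items read from `stream`."""
    it = iter(stream)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def convert_batch(batch):
    """Placeholder conversion of raw input lines into event-like dicts."""
    return [{"value": line.strip()} for line in batch]


def process(stream, batch_size=1000, workers=4):
    """Convert batches on a worker pool and yield the events in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map submits every batch up front, so this form suits a finite
        # input; an endless stream would need bounded submission instead.
        for events in pool.map(convert_batch, batches(stream, batch_size)):
            yield from events


if __name__ == "__main__":
    lines = ["a\n", "b\n", "c\n", "d\n", "e\n"]
    print(list(process(lines, batch_size=2, workers=2)))
```

The listening node only reads and groups the data, so the per-item cost on that node stays low,
and the batch size can be tuned against the latency you can tolerate.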


Hope this helps,

Matthieu

