incubator-s4-user mailing list archives

From German Blanco <german.bla...@ericsson.com>
Subject RE: Distributing connections between different adaptor instances
Date Wed, 21 Nov 2012 16:28:14 GMT
Thanks a lot for your reply!

My initial idea was to have an adaptor that listens on a port and opens a separate TCP connection
for each client application. I thought that the best way to do this would be with a different
process instance (either adaptor or PE) handling each connection (and each client). The conversion
of the input data to events is trivial, so, as you suggest, we might do the batching as well.
However, we do control the input stream provider, so if there is no straightforward way to
implement the initial idea, we can try the Kafka suggestion.

Regards,

German. 

-----Original Message-----
From: Matthieu Morel [mailto:matthieu@yahoo-inc.com] 
Sent: Wednesday, November 21, 2012 12:04 PM
To: s4-user@incubator.apache.org
Subject: Re: Distributing connections between different adaptor instances


On Nov 21, 2012, at 11:37 AM, German Blanco wrote:

> Hello,
> 
> My problem is similar to the one in this thread:
> S4-Piper: Scalability in input adapter Fri, 12 Oct 2012
> 
> The solution proposes to "distribute the connections among adapter nodes".
> Would the distribution be done in the client application that connects to the adaptors?
> Or else, how?

That really depends on your use case, infrastructure, and the kind of preprocessing you need
to do in the adapter.

Usually you would use several adapter nodes because the input stream is big and fast and therefore
you need more processing power to convert it into S4 events in a timely fashion.

If you control the input stream provider:
- If you can "tee" the input traffic - that would be the role of the client app in front of
the adaptor - then it's simple to distribute to various adapter nodes. 
- If you have a pub/sub messaging system (like Kafka) that provides the input stream, you
may configure it to split the stream so that you can fetch different data from different adapters.
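
As an illustration of the first option, here is a minimal Python sketch of how a client-side
front end might assign each client connection to one adapter node. It is not S4 API: the node
addresses, pick_adapter and assign_clients are hypothetical names chosen for this example, and
a stable hash is only one possible distribution policy.

```python
import zlib

# Hypothetical adapter node addresses; these are assumptions, not from the thread.
ADAPTER_NODES = ["adapter-1:9000", "adapter-2:9000", "adapter-3:9000"]


def pick_adapter(client_id, nodes=ADAPTER_NODES):
    """Map a client to one adapter node using a stable hash of its id."""
    # crc32 gives the same value in every process, unlike the built-in
    # hash() for strings, which is randomized per interpreter run.
    index = zlib.crc32(client_id.encode("utf-8")) % len(nodes)
    return nodes[index]


def assign_clients(client_ids, nodes=ADAPTER_NODES):
    """Return a dict mapping each adapter node to the clients it would serve."""
    assignment = {node: [] for node in nodes}
    for client_id in client_ids:
        assignment[pick_adapter(client_id, nodes)].append(client_id)
    return assignment
```

Because the mapping depends only on the client id, a client that reconnects would reach the
same adapter node as long as the node list does not change.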

If you don't control the input stream provider:
- If you have only 1 input connection but there is quite a lot of work to do in the adapter
(for instance, enrichment), then you'd benefit from listening to the input stream from a single
adapter node while still using several adapter nodes to parallelize the processing (in keyed
PEs).
- If you have only 1 input connection and the conversion is trivial, but the input stream
is really big, you might try to do some batching of the data in the listening adapter node,
then parallelize the processing of the batches.
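
The batching option can be sketched in Python as follows. This is only an illustration under
assumptions: batches and distribute are hypothetical helper names, the workers are plain list
slots standing in for adapter nodes or PEs, and round-robin is just one way to spread the
batches.

```python
from itertools import islice


def batches(records, batch_size):
    """Yield lists of up to batch_size records from any iterable."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def distribute(records, batch_size, num_workers):
    """Assign batches to workers round-robin.

    Returns one list of batches per worker; in a real deployment each
    batch would instead be sent to a processing node as a single event.
    """
    per_worker = [[] for _ in range(num_workers)]
    for i, batch in enumerate(batches(records, batch_size)):
        per_worker[i % num_workers].append(batch)
    return per_worker
```

Sending one batch per event reduces per-record overhead in the listening node, at the cost of
some added latency while each batch fills.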


Hope this helps,

Matthieu

