incubator-s4-user mailing list archives

From kishore g <g.kish...@gmail.com>
Subject Re: S4-Piper: Scalability in input adapter
Date Thu, 11 Oct 2012 19:04:19 GMT
I am guessing you don't have control over the system that provides the video
stream. Unfortunately, the input data has to be partitioned in some
way to scale. If there is only one endpoint that provides data, then
the only option is client-side (adapter-side) filtering. This is not a
limitation of S4; it is simply a design constraint that for
something to be scalable, it needs to be partitioned.
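A minimal sketch of the adapter-side filtering suggested in the quoted message (hash each status on its user id and keep only this adapter's share). It assumes each adapter instance is started knowing its own index and the total number of adapters, and that user ids are numeric longs. The class and method names are hypothetical and not part of the S4 API; where this check would plug into a real S4 adapter (e.g. just before emitting an event downstream) depends on the S4 version in use and is left out.

```java
import java.util.stream.LongStream;

/**
 * Hypothetical helper (not part of S4): decides whether this adapter
 * instance should handle a status by hashing the status's user id.
 * Every adapter still receives the full stream; each one keeps only
 * the statuses whose user id hashes to its own index.
 */
final class UserIdPartitionFilter {
    private final int adapterIndex; // this adapter's position, 0..adapterCount-1
    private final int adapterCount; // total number of adapters started

    UserIdPartitionFilter(int adapterIndex, int adapterCount) {
        if (adapterCount <= 0 || adapterIndex < 0 || adapterIndex >= adapterCount) {
            throw new IllegalArgumentException("require 0 <= adapterIndex < adapterCount");
        }
        this.adapterIndex = adapterIndex;
        this.adapterCount = adapterCount;
    }

    /** Returns true if a status from this user belongs to this adapter. */
    boolean accepts(long userId) {
        // floorMod keeps the bucket non-negative even when the hash is negative.
        int bucket = Math.floorMod(Long.hashCode(userId), adapterCount);
        return bucket == adapterIndex;
    }

    public static void main(String[] args) {
        int adapters = 2;
        for (int i = 0; i < adapters; i++) {
            UserIdPartitionFilter filter = new UserIdPartitionFilter(i, adapters);
            // Count how many of the made-up user ids 1..10 this adapter keeps.
            long kept = LongStream.rangeClosed(1, 10).filter(filter::accepts).count();
            System.out.println("adapter " + i + " keeps " + kept + " of 10 statuses");
        }
    }
}
```

Because every user id maps to exactly one bucket, each status is handled by exactly one adapter, and all statuses from the same user land on the same adapter. Note that this only divides the downstream work: each adapter still has to receive the whole stream, which is exactly the limitation raised in the quoted question about very large inputs.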



thanks,
Kishore G

On Thu, Oct 11, 2012 at 10:34 AM, Kaiser Md. Nahiduzzaman
<kaisernahid@gmail.com> wrote:
> Hi Kishore,
> Thank you so much for your prompt reply.
>
> Actually, I am able to pull events fast enough for Twitter. But I was
> thinking of other applications, for example video streams, and there
> could be more than one video stream. In that case, if we have only one
> adapter node to process all the video streams, then that might be a
> bottleneck. I only asked about the input adapter problem using the
> Twitter example to better understand how to scale input adapters.
>
> "Start multiple adapters, but in each adapter after getting the top
> level status, hash it on userid and filter it accordingly.  For
> example, if you have 2 adapters, each adapter filters 50% of the
> messages based on user id."
> -- Even in this case, each input adapter will receive every top-level
> status. For Twitter, merely receiving the data is not a heavy load,
> but where the input data itself is very large, is there any way to
> distribute the input data itself?
>
> Thanks,
> Kaiser
>
> On Thu, Oct 11, 2012 at 10:14 AM, kishore g <g.kishore@gmail.com> wrote:
>> Hi Kaiser,
>>
>> Can you give more information as to why you need to scale
>> TwitterInputAdapter? Are you not able to pull events fast enough?
>>
>> Can you explain how you plan to scale this? The reason I ask is that
>> Twitter provides only one stream; it is not partitioned. The way to
>> scale is https://dev.twitter.com/docs/streaming-apis/processing#Scaling.
>> This is pretty much what the Twitter adapter is doing: it simply
>> delegates the processing to the AppNodes. So in theory, you should be
>> fine with one TwitterInputAdapter. If this does not work, then you can
>> try the following.
>>
>> Start multiple adapters, but in each adapter after getting the top
>> level status, hash it on userid and filter it accordingly.  For
>> example, if you have 2 adapters, each adapter filters 50% of the
>> messages based on user id.
>>
>> If you can give us additional information on what you plan to do and
>> some numbers, we will be able to provide better instructions on how to
>> solve it with S4.
>>
>> thanks,
>> Kishore G
>>
>>
>>
>> On Thu, Oct 11, 2012 at 9:41 AM, Kaiser Md. Nahiduzzaman
>> <kaisernahid@gmail.com> wrote:
>>> Hi,
>>> The S4-piper overview says "Since adapters are also S4 applications,
>>> they can be scaled easily."
>>> I was wondering how to do that. For example, if I create more than one
>>> instance of the HelloInputAdapter, will the input stream
>>> automatically be divided among the adapters, as it is for
>>> incoming streams to multiple HelloApp nodes?
>>> Even if that is possible for HelloInputAdapter, how would you do that
>>> for TwitterInputAdapter, i.e., how do you provide scalability to
>>> TwitterInputAdapter?
>>>
>>> Thanks in advance,
>>> Kaiser
