incubator-s4-user mailing list archives

From "Kaiser Md. Nahiduzzaman" <kaiserna...@gmail.com>
Subject Re: S4-Piper: Scalability in input adapter
Date Fri, 12 Oct 2012 19:05:36 GMT
Hi Matthieu,
I think using video stream id, the number of partitions, and the
current partition id will solve the scalability issue in my case.
Thanks for adding S4-102 and thanks for the clarification.

Keep up the good work!
Thanks,
Kaiser

On Fri, Oct 12, 2012 at 4:30 AM, Matthieu Morel <mmorel@apache.org> wrote:
> On 10/11/12 9:52 PM, Kaiser Md. Nahiduzzaman wrote:
>>
>> Thanks again for the quick reply. I'm getting a clearer picture of the
>> S4 system.
>>
>> Suppose I have control over the system that provides the video
>> stream, and there is more than one endpoint that provides data. For
>> example, suppose I have 1000 video cameras, each with a unique
>> VideoStreamID, so that I can connect to them individually using that
>> ID. How shall I build such a system on top of S4? Can the multiple
>> camera inputs of such a system be made scalable using S4 (with some
>> extension, probably)? Or shall I create 1000 clones of the same S4
>> app, each connected to a unique video stream?
>
>
> That's an interesting use case. You'd probably distribute the connections to
> the video streams among adapter nodes, based on the stream id, the number of
> partitions, and the current partition id (I just added S4-102 for accessing
> that information easily). Then you'd forward whatever discrete data you can
> identify downstream to a graph of PEs responsible for converting that data
> into S4 events.
>
>
>
>>
>> I am not sure if S4 is suitable for this kind of video sensors but
>> there is a mention of the use of video sensors in one of the
>> S4-overview slides.
>
>
> The key is to be able to discretize the video stream (maybe through
> sampling?). If this is not possible, then an event processing platform
> like S4 is probably not suitable.
>
>
>> And the S4 adapter is scalable, which is not demonstrated in the
>> Twitter example.
>
>
> Indeed, the Twitter example is simplistic. In order to scale, you would
> distribute input stream connections across adapter nodes and define a
> graph of PEs (which can be distributed).
>
> Hope this helps,
>
> Matthieu
>
>
>>
>> Sorry, for not making my case clear in the first place.
>> Thanks,
>> Kaiser
>>
>> On Thu, Oct 11, 2012 at 12:04 PM, kishore g <g.kishore@gmail.com> wrote:
>>>
>>> I am guessing you don't have control over the system that provides
>>> the video stream. Unfortunately, the input data has to be partitioned
>>> in some way to scale. If there is only one endpoint that provides
>>> data, then the only way is client-side (adapter) filtering. This is
>>> not a limitation of S4; it's just a design constraint that for
>>> something to be scalable, it needs to be partitioned.
>>>
>>>
>>>
>>> thanks,
>>> Kishore G
>>>
>>> On Thu, Oct 11, 2012 at 10:34 AM, Kaiser Md. Nahiduzzaman
>>> <kaisernahid@gmail.com> wrote:
>>>>
>>>> Hi Kishore,
>>>> Thank you so much for your prompt reply.
>>>>
>>>> Actually, I am able to pull events fast enough for Twitter. But I
>>>> was thinking of different applications, for example video streams,
>>>> and there could be more than one video stream. In that case, if we
>>>> have only one adapter node to process all the video streams, then
>>>> that might become a bottleneck. I asked about the input adapter in
>>>> the given Twitter example to better understand how to scale the
>>>> input adapters.
>>>>
>>>> "Start multiple adapters, but in each adapter after getting the top
>>>> level status, hash it on userid and filter it accordingly.  For
>>>> example, if you have 2 adapters, each adapter filters 50% of the
>>>> messages based on user id."
>>>> -- Even in this case, each input adapter will get the top-level
>>>> status. For Twitter, merely receiving the data is not very costly,
>>>> but where the input data itself can be very large, is there any way
>>>> to distribute the input data itself?
>>>>
>>>> Thanks,
>>>> Kaiser
>>>>
>>>> On Thu, Oct 11, 2012 at 10:14 AM, kishore g <g.kishore@gmail.com> wrote:
>>>>>
>>>>> Hi Kaiser,
>>>>>
>>>>> Can you give more information as to why you need to scale the
>>>>> TwitterInputAdapter? Are you not able to pull events fast enough?
>>>>>
>>>>> Can you explain how you plan to scale this? The reason I ask is
>>>>> that Twitter provides only one stream; it is not partitioned. The
>>>>> way to scale is described at
>>>>> https://dev.twitter.com/docs/streaming-apis/processing#Scaling.
>>>>> This is pretty much what the TwitterInputAdapter is doing: it is
>>>>> simply delegating to AppNodes. So in theory, you should be good
>>>>> with one TwitterInputAdapter. If this does not work, then you can
>>>>> try the following.
>>>>>
>>>>> Start multiple adapters, but in each adapter after getting the top
>>>>> level status, hash it on userid and filter it accordingly.  For
>>>>> example, if you have 2 adapters, each adapter filters 50% of the
>>>>> messages based on user id.
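The filtering Kishore describes might be sketched as follows (a hypothetical illustration; `AdapterSideFilter` and `keep` are not S4 or Twitter API names). Every adapter receives the full stream and discards the statuses whose user id hashes to another adapter's index:

```java
// Hypothetical sketch of adapter-side filtering: each adapter keeps
// only the statuses whose user id hashes to its own index, so the
// downstream processing work is split roughly evenly.
public class AdapterSideFilter {

    public static boolean keep(long userId, int adapterIndex, int adapterCount) {
        return Math.floorMod(Long.hashCode(userId), adapterCount) == adapterIndex;
    }

    public static void main(String[] args) {
        long[] userIds = {101L, 102L, 103L, 104L};
        int adapterCount = 2;
        for (long id : userIds) {
            for (int a = 0; a < adapterCount; a++) {
                if (keep(id, a, adapterCount)) {
                    System.out.println("user " + id + " handled by adapter " + a);
                }
            }
        }
    }
}
```

Note that this splits the processing load, not the network load: every adapter still downloads the whole stream, which is why it only helps when receiving the data is cheap relative to handling it.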
>>>>>
>>>>> If you can give us additional information on what you plan to do,
>>>>> and some numbers, we will be able to provide better guidance on
>>>>> how to solve it with S4.
>>>>>
>>>>> thanks,
>>>>> Kishore G
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Oct 11, 2012 at 9:41 AM, Kaiser Md. Nahiduzzaman
>>>>> <kaisernahid@gmail.com> wrote:
>>>>>>
>>>>>> Hi,
>>>>>> The S4-piper overview says "Since adapters are also S4 applications,
>>>>>> they can be scaled easily."
>>>>>> I was wondering how to do that. For example, if I create more than
>>>>>> one instance of the HelloInputAdapter, will the input stream
>>>>>> automatically get divided among the adapters as it does for
>>>>>> incoming streams to the multiple HelloApp nodes?
>>>>>> Even if that is possible for the HelloInputAdapter, how would you
>>>>>> do that for the TwitterInputAdapter, i.e. how do you provide
>>>>>> scalability to the TwitterInputAdapter?
>>>>>>
>>>>>> Thanks in advance,
>>>>>> Kaiser
>
>
