samza-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jagadish Venkatraman <jagadish1...@gmail.com>
Subject Re: [DISCUSS] End of Stream Feature in Samza
Date Sat, 03 Sep 2016 00:22:03 GMT
Thank you Renato for the insightful feedback! :-)

I don't think we have a complete scoping and design for the "punctuated
streams" feature as of yet. However, one way to implement it would be to
leverage something similar to what this feature does. We could have a
special offset for a "punctuation message". In that case, the end-of-stream
is just a special case of punctuation message. A punctuation could
conceivably be implemented with a special offset (or even as a sub-class of
IncomingMessageEnvelope).

On Thu, Sep 1, 2016 at 5:04 AM, Renato MarroquĂ­n Mogrovejo <
renatoj.marroquin@gmail.com> wrote:

> Thanks Jagadish! This is great!
> Can you share some thoughts/opinions on how this feature relates on using
> punctuations (at some point) in Samza? I mean do you think that using
> punctuated streams could be seen as a generalization of this problem? And
> if so, could this feature be used later on as a building block for
> implementing punctuations into Samza?
>
>
> Best,
>
> Renato M.
>
> 2016-09-01 1:37 GMT+02:00 Jagadish Venkatraman <jagadish1989@gmail.com>:
>
> > Thanks for reviewing and the comments. Please find my replies inline.
> >
> >
> >
> >
> >
> >
> >
> >
> > *"The offset in the partition that the message was received from. Ifthis
> is
> > the last message in the SSP, this field is set to END_OF_STREAM.Such a
> > message is not delivered to the actual StreamTask implementation."Does
> this
> > mean the last message is not delivered to the Task? Does the
> sourceprovide
> > that info? If that is the case, then it kind of creates contractbetween
> any
> > bounded system consumer and samza. Or did you mean to say thatwe assume
> > end-of-stream has been reached when there is no message returnedon poll?
> *
> >
> >
> > >> It means that the message is not delivered to the "*StreamTask
> > implementation*" ie. user code. Yes, we are creating a contract between
> > SystemConsumer and Samza - " the SystemConsumer implementation will
> > generate an IncomingMessageEnvelope by invoking buildEndOfStream()"
> >
> >
> >
> > *" Please do clarify. I think what is missing in this document ishow to
> > "detect" an end-of-stream from the source."*
> >
> > >>"Detecting" end of stream from the source should be simply parsing the
> > offset from the message that the consumer returns during poll() and
> > checking if it is end-of-stream. I'm happy to add this to the document.
> >
> > 2. Does this design preclude the possibility of consuming bounded and
> > unbounded stream partitions in the task ?
> >
> > >> Nope, It does not. However, the container would not terminate (if you
> > have a source that has not reached end of stream yet)
> >
> > 3. During checkpoint, let's say some of the partitions have reached EOF.
> Do
> > we write a special offset in the checkpoint message that indicates that
> it
> > has reached end of stream and don't need to poll anymore?
> >
> > >> No, we do not. The dance in the AsyncRunLoop state's machine / flow
> > control in https://reviews.apache.org/r/51346/ guarantees that
> > end-of-stream SSPs are not polled any more.
> >
> >
> >
> >
> >
> >
> >
> > On Wed, Aug 31, 2016 at 2:01 PM, Navina Ramesh
> > <nramesh@linkedin.com.invalid
> > > wrote:
> >
> > > Hi Jagadish,
> > > Thanks for sharing the design with the community. I have a couple of
> > > questions that were not very clear from the design document.
> > >
> > > 1. Under mechanism for indicating the end-of-stream to Samza, you
> mention
> > > "The offset in the partition that the message was received from. If
> > > this is the last message in the SSP, this field is set to
> END_OF_STREAM.
> > > Such a message is not delivered to the actual StreamTask
> implementation."
> > > Does this mean the last message is not delivered to the Task? How do
> you
> > > identify that it is indeed the last message in the SSP? Does the source
> > > provide that info? If that is the case, then it kind of creates
> contract
> > > between any bounded system consumer and samza. Or did you mean to say
> > that
> > > we assume end-of-stream has been reached when there is no message
> > returned
> > > on poll?  Please do clarify. I think what is missing in this document
> is
> > > how to "detect" an end-of-stream from the source.
> > >
> > > 2. Does this design preclude the possibility of consuming bounded and
> > > unbounded stream partitions in the task ?
> > >
> > > 3. During checkpoint, let's say some of the partitions have reached
> EOF.
> > Do
> > > we write a special offset in the checkpoint message that indicates that
> > it
> > > has reached end of stream and don't need to poll anymore?
> > >
> > > Thanks!
> > > Navina
> > >
> > > On Tue, Aug 30, 2016 at 4:50 PM, Julian Hyde <jhyde@apache.org> wrote:
> > >
> > > >
> > > > > On Aug 30, 2016, at 4:44 PM, Xinyu Liu <xinyuliu.us@gmail.com>
> > wrote:
> > > > >
> > > > > It's very exciting that Samza is adding support of bounded input
> > > streams.
> > > >
> > > > +1!
> > > >
> > > >
> > >
> > >
> > > --
> > > Navina R.
> > >
> >
> >
> >
> > --
> > Jagadish V,
> > Graduate Student,
> > Department of Computer Science,
> > Stanford University
> >
>



-- 
Jagadish V,
Graduate Student,
Department of Computer Science,
Stanford University

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message