spark-user mailing list archives

From shyla deshpande <deshpandesh...@gmail.com>
Subject Re: Does Data pipeline using kafka and structured streaming work?
Date Wed, 02 Nov 2016 03:59:42 GMT
Thanks Michael and Cody. Appreciate your help.
-Shyla

On Tue, Nov 1, 2016 at 6:52 PM, Cody Koeninger <cody@koeninger.org> wrote:

> One thing you should be aware of (that's a showstopper for my use
> cases, but may not be for yours) is that you can provide Kafka offsets
> to start from, but you can't really get access to offsets and metadata
> during the job on a per-batch or per-partition basis, just on a
> per-message basis.
>
> On Tue, Nov 1, 2016 at 8:29 PM, Michael Armbrust <michael@databricks.com>
> wrote:
> > Yeah, those are all requests for additional features / version support.
> > I've been using Kafka with Structured Streaming for several weeks now, both
> > for ETL into partitioned Parquet tables and for streaming event-time
> > windowed aggregation.
> >
> > On Tue, Nov 1, 2016 at 6:18 PM, Cody Koeninger <cody@koeninger.org>
> > wrote:
> >>
> >> Look at the resolved subtasks attached to that ticket you linked.
> >> Some of them are unresolved, but basic functionality is there.
> >>
> >> On Tue, Nov 1, 2016 at 7:37 PM, shyla deshpande
> >> <deshpandeshyla@gmail.com> wrote:
> >> > Hi Michael,
> >> >
> >> > Thanks for the reply.
> >> >
> >> > The following link says there is an open, unresolved Jira for Structured
> >> > Streaming support for consuming from Kafka.
> >> >
> >> > https://issues.apache.org/jira/browse/SPARK-15406
> >> >
> >> > Appreciate your help.
> >> >
> >> > -Shyla
> >> >
> >> >
> >> > On Tue, Nov 1, 2016 at 5:19 PM, Michael Armbrust
> >> > <michael@databricks.com>
> >> > wrote:
> >> >>
> >> >> I'm not aware of any open issues against the kafka source for
> >> >> structured streaming.
> >> >>
> >> >> On Tue, Nov 1, 2016 at 4:45 PM, shyla deshpande
> >> >> <deshpandeshyla@gmail.com>
> >> >> wrote:
> >> >>>
> >> >>> I am building a data pipeline using Kafka, Spark Streaming, and
> >> >>> Cassandra. I'm wondering whether the issues with the Kafka source are
> >> >>> fixed in Spark 2.0.1. If not, please give me an update on when they may
> >> >>> be fixed.
> >> >>>
> >> >>> Thanks
> >> >>> -Shyla
> >> >>
> >> >>
> >> >
> >
> >
>
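
A note on Cody's point about offsets: as far as we know, in Spark releases whose Kafka source supports it (2.1 and later), the `startingOffsets` option accepts a JSON string mapping each topic to per-partition offsets, where -2 means "earliest" and -1 means "latest". Once a query runs, each row carries `topic`, `partition`, `offset` and `timestamp` columns. That is the per-message metadata Cody describes, and there is no separate per-batch offset-range object. The plain-Python sketch below only builds that option map and does not start a query. `kafka_source_options` is a hypothetical helper, and the broker address and topic name are placeholders.

```python
import json

# Offset sentinels accepted in the Kafka source's JSON offsets
# (per the Structured Streaming + Kafka integration guide).
EARLIEST = -2
LATEST = -1


def starting_offsets_json(offsets):
    """Serialize {topic: {partition: offset}} for the `startingOffsets` option.

    JSON object keys must be strings, so partition numbers are converted.
    Topics and partitions are sorted so the output is deterministic.
    """
    payload = {
        topic: {str(p): int(off) for p, off in sorted(parts.items())}
        for topic, parts in sorted(offsets.items())
    }
    return json.dumps(payload, separators=(",", ":"))


def kafka_source_options(bootstrap_servers, topics, offsets):
    """Hypothetical helper: option map for spark.readStream.format("kafka")."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": ",".join(topics),
        "startingOffsets": starting_offsets_json(offsets),
    }


# Placeholder broker and topic: partition 0 resumes at offset 23,
# partition 1 starts from the earliest available offset.
opts = kafka_source_options("localhost:9092", ["events"], {"events": {0: 23, 1: EARLIEST}})
print(opts["startingOffsets"])  # {"events":{"0":23,"1":-2}}
```

With PySpark, the map could then be passed as `spark.readStream.format("kafka").options(**opts).load()`, assuming the spark-sql-kafka-0-10 package is on the classpath.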
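
Michael's setup (ETL into partitioned Parquet tables plus event-time windowed aggregation) would, in PySpark, look roughly like `events.groupBy(window("timestamp", "5 minutes"), "key").count()` for the aggregation and `writeStream.format("parquet").partitionBy("date")` for the ETL. The following is a plain-Python stand-in that shows what those operations compute without needing a cluster. It is not Spark's implementation. It assigns each event to a 5-minute tumbling window aligned to the Unix epoch, which we understand matches `window()` with no `startTime`, and derives a Hive-style `date=YYYY-MM-DD` partition directory. The function names and the (key, epoch-seconds) record shape are assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime, timezone


def window_start(ts_seconds, window_seconds):
    """Start of the tumbling window (aligned to the Unix epoch) containing ts_seconds."""
    return ts_seconds - (ts_seconds % window_seconds)


def windowed_counts(events, window_seconds):
    """Count (key, event_time_seconds) records per (window_start, key).

    Grouping uses each record's own event time, so the result does not
    depend on the order in which records arrive.
    """
    counts = Counter()
    for key, ts in events:
        counts[(window_start(ts, window_seconds), key)] += 1
    return dict(counts)


def partition_dir(ts_seconds):
    """Hive-style partition directory (e.g. "date=2016-11-01") for a UTC timestamp."""
    day = datetime.fromtimestamp(ts_seconds, tz=timezone.utc).date()
    return f"date={day.isoformat()}"


# 1477958400 is 2016-11-01 00:00:00 UTC; windows are 5 minutes (300 s).
events = [("a", 1477958400), ("a", 1477958500), ("b", 1477958700)]
print(windowed_counts(events, 300))  # {(1477958400, 'a'): 2, (1477958700, 'b'): 1}
print(partition_dir(events[0][1]))   # date=2016-11-01
```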
