spark-user mailing list archives

From shyla deshpande <deshpandesh...@gmail.com>
Subject Re: KafkaUtils.createRDD , How do I read all the data from kafka in a batch program for a given topic?
Date Thu, 10 Aug 2017 18:54:48 GMT
Thanks Cody.

On Wed, Aug 9, 2017 at 8:46 AM, Cody Koeninger <cody@koeninger.org> wrote:

> org.apache.spark.streaming.kafka.KafkaCluster has methods
> getLatestLeaderOffsets and getEarliestLeaderOffsets
>
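[Archive note: a rough, untested sketch of what Cody describes, for anyone finding this thread later. It uses the Kafka 0.8 direct API; the broker address and topic name ("localhost:9092", "mytopic") are placeholders. Note that KafkaCluster's methods return Either, and the class has been private[spark] in some Spark releases, so this may need adjusting for your version.]

```scala
import kafka.common.TopicAndPartition
import org.apache.spark.streaming.kafka.KafkaCluster

// Placeholder broker list -- adjust for your cluster.
val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
val kc = new KafkaCluster(kafkaParams)

// Find all partitions for the topic, then ask the leaders for their
// earliest and latest offsets. Each call returns Either[Err, ...];
// real code should handle the Left case instead of calling .right.get.
val partitions: Set[TopicAndPartition] =
  kc.getPartitions(Set("mytopic")).right.get
val earliest = kc.getEarliestLeaderOffsets(partitions).right.get
val latest   = kc.getLatestLeaderOffsets(partitions).right.get
// earliest/latest: Map[TopicAndPartition, LeaderOffset];
// LeaderOffset.offset is the Long you need to build OffsetRanges.
```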
> On Mon, Aug 7, 2017 at 11:37 PM, shyla deshpande
> <deshpandeshyla@gmail.com> wrote:
> > Thanks TD.
> >
> > On Mon, Aug 7, 2017 at 8:59 PM, Tathagata Das <
> tathagata.das1565@gmail.com>
> > wrote:
> >>
> >> I don't think there is any easier way.
> >>
> >> On Mon, Aug 7, 2017 at 7:32 PM, shyla deshpande <
> deshpandeshyla@gmail.com>
> >> wrote:
> >>>
> >>> Thanks TD for the response. I forgot to mention that I am not using
> >>> structured streaming.
> >>>
> >>> I was looking into KafkaUtils.createRDD, and it looks like I need to
> >>> get the earliest and the latest offset for each partition to build the
> >>> Array(offsetRange). I wanted to know if there was an easier way.
> >>>
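[Archive note: a hedged sketch of the createRDD call being discussed, against the Kafka 0.8 integration. The topic name, broker address, and offset bounds are made-up placeholders; in practice the from/until offsets for each partition would come from the earliest/latest leader offsets mentioned above.]

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkContext
import org.apache.spark.streaming.kafka.{KafkaUtils, OffsetRange}

val sc = new SparkContext() // assumes Spark config is supplied elsewhere
val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")

// One OffsetRange per partition: (topic, partition, fromOffset, untilOffset).
// The bounds below are placeholders; they would normally be the earliest
// and latest offsets fetched for each of the 10 partitions.
val offsetRanges = (0 until 10).map { p =>
  OffsetRange("mytopic", p, 0L, 100L)
}.toArray

// Batch read of exactly the given offset ranges, as (key, value) pairs.
val rdd = KafkaUtils.createRDD[String, String, StringDecoder, StringDecoder](
  sc, kafkaParams, offsetRanges)
```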
> >>> One reason we are hesitating to use structured streaming is that I
> >>> need to persist the data in a Cassandra database, and I believe that
> >>> integration is not production ready.
> >>>
> >>>
> >>> On Mon, Aug 7, 2017 at 6:11 PM, Tathagata Das
> >>> <tathagata.das1565@gmail.com> wrote:
> >>>>
> >>>> It's best to use DataFrames. You can read from Kafka as a stream or
> >>>> as a batch. More details here.
> >>>>
> >>>>
> >>>> https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html#creating-a-kafka-source-for-batch-queries
> >>>>
> >>>> https://databricks.com/blog/2017/04/26/processing-data-in-apache-kafka-with-structured-streaming-in-apache-spark-2-2.html
> >>>>
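[Archive note: from the docs linked above, the batch-mode read looks roughly like this. The bootstrap server and topic name are placeholders; this requires a running Kafka broker and Spark 2.x.]

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("kafka-batch").getOrCreate()

// Batch read: with startingOffsets=earliest and endingOffsets=latest,
// a single job reads everything currently in the topic, across all
// partitions, with no OffsetRange bookkeeping.
val df = spark.read
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "mytopic")
  .option("startingOffsets", "earliest")
  .option("endingOffsets", "latest")
  .load()

// key and value arrive as binary; cast to strings for inspection.
val strings = df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
```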
> >>>> On Mon, Aug 7, 2017 at 6:03 PM, shyla deshpande
> >>>> <deshpandeshyla@gmail.com> wrote:
> >>>>>
> >>>>> Hi all,
> >>>>>
> >>>>> What is the easiest way to read all the data from Kafka in a batch
> >>>>> program for a given topic?
> >>>>> I have 10 Kafka partitions, but the data is not much. I would like
> >>>>> to read from the earliest offset on all the partitions of a topic.
> >>>>>
> >>>>> I appreciate any help. Thanks
> >>>>
> >>>>
> >>>
> >>
> >
>
