spark-user mailing list archives

From shyla deshpande <deshpandesh...@gmail.com>
Subject Re: KafkaUtils.createRDD , How do I read all the data from kafka in a batch program for a given topic?
Date Tue, 08 Aug 2017 04:37:18 GMT
Thanks TD.

On Mon, Aug 7, 2017 at 8:59 PM, Tathagata Das <tathagata.das1565@gmail.com>
wrote:

> I don't think there is any easier way.
>
> On Mon, Aug 7, 2017 at 7:32 PM, shyla deshpande <deshpandeshyla@gmail.com>
> wrote:
>
>> Thanks TD for the response. I forgot to mention that I am not using
>> structured streaming.
>>
>> I was looking into KafkaUtils.createRDD, and it looks like I need to get
>> the earliest and the latest offset for each partition to build the
>> Array(offsetRange). I wanted to know if there was an easier way.
>>
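A minimal sketch of the approach described above, assuming the spark-streaming-kafka-0-10 integration and kafka-clients 0.10.1 or newer (the first release with `KafkaConsumer.beginningOffsets` and `endOffsets`). The object name `ReadWholeTopic`, the topic `my-topic`, the broker address and the group id are hypothetical placeholders. A plain `KafkaConsumer` on the driver looks up each partition's earliest and latest offsets, those become one `OffsetRange` per partition, and the array is passed to `KafkaUtils.createRDD`.

```scala
import java.util.Properties

import scala.collection.JavaConverters._

import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.kafka010.{KafkaUtils, LocationStrategies, OffsetRange}

// Hypothetical example object; names and settings are placeholders.
object ReadWholeTopic {

  type Offsets = Map[TopicPartition, Long]

  // One OffsetRange per partition: from its earliest offset (inclusive)
  // to its latest offset (exclusive), ordered by partition number.
  def buildOffsetRanges(earliest: Offsets, latest: Offsets): Array[OffsetRange] =
    earliest.toSeq
      .sortBy(_._1.partition)
      .map { case (tp, from) => OffsetRange(tp.topic, tp.partition, from, latest(tp)) }
      .toArray

  private def toScala(m: java.util.Map[TopicPartition, java.lang.Long]): Offsets =
    m.asScala.map { case (tp, off) => tp -> off.longValue }.toMap

  // Ask the brokers for the current earliest and latest offset of every partition.
  // Assumes kafka-clients 0.10.1+ (beginningOffsets/endOffsets).
  def fetchOffsets(kafkaParams: Map[String, Object], topic: String): (Offsets, Offsets) = {
    val props = new Properties()
    kafkaParams.foreach { case (k, v) => props.put(k, v) }
    val consumer = new KafkaConsumer[String, String](props)
    try {
      val partitions = consumer.partitionsFor(topic).asScala
        .map(p => new TopicPartition(p.topic, p.partition))
        .asJava
      (toScala(consumer.beginningOffsets(partitions)), toScala(consumer.endOffsets(partitions)))
    } finally consumer.close()
  }

  def main(args: Array[String]): Unit = {
    val topic = "my-topic" // hypothetical topic name
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092", // hypothetical broker address
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "batch-reader" // hypothetical group id
    )
    val sc = new SparkContext(new SparkConf().setAppName("ReadWholeTopic"))
    val (earliest, latest) = fetchOffsets(kafkaParams, topic)
    val rdd = KafkaUtils.createRDD[String, String](
      sc, kafkaParams.asJava, buildOffsetRanges(earliest, latest),
      LocationStrategies.PreferConsistent)
    println(s"Read ${rdd.count()} records from $topic")
    sc.stop()
  }
}
```

Because `untilOffset` is exclusive and `endOffsets` returns the offset the next written record would get, each range covers the records present when the offsets were fetched; anything produced afterwards is not read.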
>> One reason we are hesitant to use Structured Streaming is that I
>> need to persist the data in a Cassandra database, and I believe writing
>> to Cassandra from Structured Streaming is not production ready.
>>
>>
>> On Mon, Aug 7, 2017 at 6:11 PM, Tathagata Das <
>> tathagata.das1565@gmail.com> wrote:
>>
>>> It's best to use DataFrames. You can read from Kafka either as a stream
>>> or as a batch. More details here:
>>>
>>> https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html#creating-a-kafka-source-for-batch-queries
>>> https://databricks.com/blog/2017/04/26/processing-data-in-apache-kafka-with-structured-streaming-in-apache-spark-2-2.html
>>>
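A minimal sketch of the DataFrame batch read that the first link describes, assuming the spark-sql-kafka-0-10 package is on the classpath. The object name `ReadTopicAsBatch`, the broker address and the topic name are hypothetical. It uses `spark.read` rather than `spark.readStream`, so it runs as an ordinary batch query; for batch queries `startingOffsets` defaults to `earliest` and `endingOffsets` to `latest`, and they are spelled out here only for clarity.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical example object; requires the spark-sql-kafka-0-10 package.
object ReadTopicAsBatch {

  // Batch (non-streaming) read of every partition of a topic,
  // from the earliest available offset up to the latest one.
  def readWholeTopic(spark: SparkSession, bootstrapServers: String, topic: String): DataFrame =
    spark.read
      .format("kafka")
      .option("kafka.bootstrap.servers", bootstrapServers)
      .option("subscribe", topic)
      .option("startingOffsets", "earliest") // batch default, shown explicitly
      .option("endingOffsets", "latest")     // batch default, shown explicitly
      .load()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ReadTopicAsBatch").getOrCreate()
    // "localhost:9092" and "my-topic" are placeholder values.
    val df = readWholeTopic(spark, "localhost:9092", "my-topic")
    // Kafka keys and values arrive as binary columns; cast them to strings.
    val records = df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)", "partition", "offset")
    records.show(20, truncate = false)
    spark.stop()
  }
}
```

Since the result is an ordinary batch DataFrame, it can be saved with the usual batch write APIs rather than a streaming sink.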
>>> On Mon, Aug 7, 2017 at 6:03 PM, shyla deshpande <
>>> deshpandeshyla@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> What is the easiest way to read all the data from Kafka in a batch
>>>> program for a given topic?
>>>> I have 10 Kafka partitions, but there is not much data. I would like to
>>>> read from the earliest offset in all the partitions of the topic.
>>>>
>>>> I appreciate any help. Thanks
>>>>
>>>
>>>
>>
>
