spark-user mailing list archives

From Tathagata Das <tathagata.das1...@gmail.com>
Subject Re: [Beginner] How to save Kafka Dstream data to parquet ?
Date Wed, 28 Feb 2018 21:59:24 GMT
There is no good way to save to Parquet without causing downstream
consistency issues.
You could use foreachRDD to get each RDD, convert it to a DataFrame/Dataset,
and write it out as Parquet files. But you will later run into issues such as
partial files left behind by failures.
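
A minimal sketch of the foreachRDD approach described above, assuming the
`lines` DStream[String] from the question below. The object name
`ParquetSink`, the helper `batchPath`, the `batch_time=` directory naming, and
the single `value` column are all illustrative assumptions, not something the
thread specifies. It uses SparkSession rather than the deprecated SQLContext.
Writing each micro-batch to its own directory with overwrite mode makes a
re-run of the same batch replace its earlier output, but it does not remove
the partial-file and consistency problems mentioned above.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.streaming.Time
import org.apache.spark.streaming.dstream.DStream

object ParquetSink {
  // Hypothetical helper: one output directory per micro-batch, keyed by batch time.
  def batchPath(baseDir: String, batchTimeMs: Long): String =
    s"${baseDir.stripSuffix("/")}/batch_time=$batchTimeMs"

  // Writes every non-empty micro-batch of `lines` as Parquet under `baseDir`.
  def saveAsParquet(lines: DStream[String], baseDir: String): Unit =
    lines.foreachRDD { (rdd: RDD[String], time: Time) =>
      if (!rdd.isEmpty()) {
        // Reuse (or lazily create) a SparkSession from the RDD's SparkContext config.
        val spark = SparkSession.builder.config(rdd.sparkContext.getConf).getOrCreate()
        import spark.implicits._

        rdd.toDF("value")            // assumed schema: a single string column
          .write
          .mode(SaveMode.Overwrite)  // a retried batch overwrites its own directory
          .parquet(batchPath(baseDir, time.milliseconds))
      }
    }
}
```

It could be called as `ParquetSink.saveAsParquet(lines, "/tmp/kafka-parquet")`
before `streamingContext.start()`; the output path here is only a placeholder.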


On Wed, Feb 28, 2018 at 11:09 AM, karthikus <aswin88us@gmail.com> wrote:

> Hi all,
>
> I have Kafka stream data that I need to save in Parquet format
> without using Structured Streaming (because it lacks Kafka message header
> support).
>
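> import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
>
> // create a direct stream over the subscribed topics
> val kafkaStream =
>   KafkaUtils.createDirectStream(
>     streamingContext,
>     LocationStrategies.PreferConsistent,
>     ConsumerStrategies.Subscribe[String, String](
>       topics,
>       kafkaParams
>     )
>   )
> // process the messages: keep (key, value) pairs, then only the values
> val messages = kafkaStream.map(record => (record.key, record.value))
> val lines = messages.map(_._2)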
>
> Now, how do I save it as Parquet? All the examples that I have come across
> use SQLContext, which is deprecated. Any help appreciated!
