spark-user mailing list archives

From Sebastian Piu <sebastian....@gmail.com>
Subject Re: Reading parquet files into Spark Streaming
Date Sat, 27 Aug 2016 18:18:53 GMT
Hi Renato,

Check here on how to do it. It is in Java, but you can translate it to Scala
if that is what you need.
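
For reference, a rough Scala sketch of one way to do this follows. It is not
the linked Java example, and several points are assumptions: that the
parquet-avro artifact matching Spark's bundled Parquet version is on the
classpath, that the classes live under org.apache.parquet (which would explain
the "Class parquet.avro.AvroReadSupport not found" error with the old package
name), and that the files can be read as Avro GenericRecord values rather than
directly as a Scala case class. The object name ParquetStreamSketch and the
helper isParquetDataFile are made up for illustration.

```scala
import org.apache.avro.generic.GenericRecord
import org.apache.hadoop.fs.Path
import org.apache.parquet.avro.AvroReadSupport
import org.apache.parquet.hadoop.ParquetInputFormat
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ParquetStreamSketch {
  // Hypothetical helper: accept only Parquet data files and skip marker,
  // metadata, and hidden files such as _SUCCESS or _metadata.
  def isParquetDataFile(name: String): Boolean =
    name.endsWith(".parquet") && !name.startsWith("_") && !name.startsWith(".")

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("parquet-stream-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val ssc = new StreamingContext(sc, Seconds(5))

    // Assumption: the read support class is the one under org.apache.parquet,
    // not the pre-rename parquet.avro package.
    ssc.sparkContext.hadoopConfiguration.set(
      "parquet.read.support.class",
      classOf[AvroReadSupport[GenericRecord]].getName)

    // ParquetInputFormat[T] is a FileInputFormat[Void, T], so the key is Void
    // and the value is the record type produced by the read support class.
    val records = ssc
      .fileStream[Void, GenericRecord, ParquetInputFormat[GenericRecord]](
        "TempData/origin/",
        (path: Path) => isParquetDataFile(path.getName),
        true, // newFilesOnly: only pick up files that appear after start
        ssc.sparkContext.hadoopConfiguration)
      // Convert to String right away; Avro records are not Java-serializable,
      // so sending them to the driver as-is would likely fail.
      .map { case (_, record) => record.toString }

    records.foreachRDD { rdd =>
      if (rdd.isEmpty()) println("Nothing received")
      else println(rdd.first())
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

To get Order values out of this, one option would be to replace the toString
call with a function that copies the needed fields from each record into the
case class.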

Cheers

On Sat, 27 Aug 2016, 14:24 Renato Marroquín Mogrovejo, <
renatoj.marroquin@gmail.com> wrote:

> Hi Akhilesh,
>
> Thanks for your response.
> I am using Spark 1.6.1, and what I am trying to do is ingest Parquet
> files into Spark Streaming, not read them in batch operations.
>
>     val ssc = new StreamingContext(sc, Seconds(5))
>     ssc.sparkContext.hadoopConfiguration.set(
>       "parquet.read.support.class", "parquet.avro.AvroReadSupport")
>
>     val sqlContext = new SQLContext(sc)
>
>     import sqlContext.implicits._
>
>     val oDStream = ssc.fileStream[Void, Order, ParquetInputFormat]("TempData/origin/")
>
>     oDStream.foreachRDD(relation => {
>       if (relation.count() == 0)
>         println("Nothing received")
>       else {
>         val rDF = relation.toDF().as[Order]
>         println(rDF.first())
>       }
>     })
>
> But that doesn't work. Any ideas?
>
>
> Best,
>
> Renato M.
>
> 2016-08-27 9:01 GMT+02:00 Akhilesh Pathodia <pathodia.akhilesh@gmail.com>:
>
>> Hi Renato,
>>
>> Which version of Spark are you using?
>>
>> If the Spark version is 1.3.0 or later, you can use SQLContext to read the
>> Parquet file, which will give you a DataFrame. Please follow the link below:
>>
>>
>> https://spark.apache.org/docs/1.5.0/sql-programming-guide.html#loading-data-programmatically
>>
>> Thanks,
>> Akhilesh
>>
>> On Sat, Aug 27, 2016 at 3:26 AM, Renato Marroquín Mogrovejo <
>> renatoj.marroquin@gmail.com> wrote:
>>
>>> Anybody? I think Rory also didn't get an answer from the list ...
>>>
>>>
>>> https://mail-archives.apache.org/mod_mbox/spark-user/201602.mbox/%3CCAC+fRE14PV5nvQHTBVqDC+6DkXo73oDAzfqsLbSo8F94ozO5nQ@mail.gmail.com%3E
>>>
>>>
>>>
>>> 2016-08-26 17:42 GMT+02:00 Renato Marroquín Mogrovejo <
>>> renatoj.marroquin@gmail.com>:
>>>
>>>> Hi all,
>>>>
>>>> I am trying to use parquet files as input for DStream operations, but I
>>>> can't find any documentation or example. The only thing I found was [1] but
>>>> I also get the same error as in the post (Class
>>>> parquet.avro.AvroReadSupport not found).
>>>> Ideally I would like to have something like this:
>>>>
>>>> val oDStream = ssc.fileStream[Void, Order, ParquetInputFormat[Order]]("data/")
>>>>
>>>> where Order is a case class and the files inside "data" are all parquet
>>>> files.
>>>> Any hints would be highly appreciated. Thanks!
>>>>
>>>>
>>>> Best,
>>>>
>>>> Renato M.
>>>>
>>>> [1]
>>>> http://stackoverflow.com/questions/35413552/how-do-i-read-in-parquet-files-using-ssc-filestream-and-what-is-the-nature
>>>>
>>>
>>>
>>
>
