spark-user mailing list archives

From Renato Marroquín Mogrovejo <renatoj.marroq...@gmail.com>
Subject Re: Reading parquet files into Spark Streaming
Date Sat, 27 Aug 2016 13:24:40 GMT
Hi Akhilesh,

Thanks for your response.
I am using Spark 1.6.1, and what I am trying to do is ingest Parquet
files into Spark Streaming, not read them in batch operations.

    val ssc = new StreamingContext(sc, Seconds(5))
    ssc.sparkContext.hadoopConfiguration.set("parquet.read.support.class",
      "parquet.avro.AvroReadSupport")

    val sqlContext = new SQLContext(sc)

    import sqlContext.implicits._

    val oDStream = ssc.fileStream[Void, Order,
      ParquetInputFormat]("TempData/origin/")

    oDStream.foreachRDD(relation => {
      if (relation.count() == 0)
        println("Nothing received")
      else {
        val rDF = relation.toDF().as[Order]
        println(rDF.first())
      }
    })

But that doesn't work. Any ideas?
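
For contrast, the batch read suggested in the reply quoted below would
look roughly like this. It is only a sketch, assuming Spark 1.6.x, a
local master, and the same TempData/origin/ directory (the object name
and app name are made up for illustration). It reads whatever files are
present when it is called and does not pick up new ones, so it does not
cover the streaming case I am asking about.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical standalone driver; names are illustrative only.
object BatchParquetRead {
  def main(args: Array[String]): Unit = {
    // Assumption: local mode with two threads, for a quick test.
    val conf = new SparkConf().setAppName("BatchParquetRead").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // One-off batch read: loads the Parquet files currently in the
    // directory into a DataFrame. New files arriving later are ignored.
    val df = sqlContext.read.parquet("TempData/origin/")

    df.printSchema() // schema is taken from the Parquet metadata
    df.show(5)       // print the first five rows

    sc.stop()
  }
}
```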


Best,

Renato M.

2016-08-27 9:01 GMT+02:00 Akhilesh Pathodia <pathodia.akhilesh@gmail.com>:

> Hi Renato,
>
> Which version of Spark are you using?
>
> If your Spark version is 1.3.0 or later, you can use SQLContext to read the
> Parquet file, which will give you a DataFrame. Please follow the link below:
>
> https://spark.apache.org/docs/1.5.0/sql-programming-guide.html#loading-data-programmatically
>
> Thanks,
> Akhilesh
>
> On Sat, Aug 27, 2016 at 3:26 AM, Renato Marroquín Mogrovejo <
> renatoj.marroquin@gmail.com> wrote:
>
>> Anybody? I think Rory also didn't get an answer from the list ...
>>
>> https://mail-archives.apache.org/mod_mbox/spark-user/201602.mbox/%3CCAC+fRE14PV5nvQHTBVqDC+6DkXo73oDAzfqsLbSo8F94ozO5nQ@mail.gmail.com%3E
>>
>>
>>
>> 2016-08-26 17:42 GMT+02:00 Renato Marroquín Mogrovejo <
>> renatoj.marroquin@gmail.com>:
>>
>>> Hi all,
>>>
>>> I am trying to use parquet files as input for DStream operations, but I
>>> can't find any documentation or example. The only thing I found was [1] but
>>> I also get the same error as in the post (Class
>>> parquet.avro.AvroReadSupport not found).
>>> Ideally I would like to have something like this:
>>>
>>> val oDStream = ssc.fileStream[Void, Order, ParquetInputFormat[Order]]("data/")
>>>
>>> where Order is a case class and the files inside "data" are all parquet
>>> files.
>>> Any hints would be highly appreciated. Thanks!
>>>
>>>
>>> Best,
>>>
>>> Renato M.
>>>
>>> [1] http://stackoverflow.com/questions/35413552/how-do-i-read-in-parquet-files-using-ssc-filestream-and-what-is-the-nature
>>>
>>
>>
>
