spark-dev mailing list archives

From Deepak Sharma <>
Subject Re: Reading back hdfs files saved as case class
Date Fri, 07 Oct 2016 19:30:40 GMT
Thanks for the answer, Reynold.
Yes, I can use the Dataset API, but I am not sure it will serve the purpose
I need it for.
I am working on a solution where I need to save the case class along with
the data in HDFS.
This data will then move to different folders, each corresponding to a
different case class.
The Spark programs reading these files are supposed to apply the case class
directly, depending on the folder they are reading from.
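Following Reynold's suggestion below, here is a minimal sketch of that folder layout using the Dataset API. It assumes Parquet as the storage format, and the `TypedFolders` helper, the `Example` fields, and the folder path are hypothetical. Parquet keeps the column names and types alongside the data but not the Scala class name. The reading program therefore still picks the case class based on the folder it reads, and `.as[T]` re-applies that class.

```scala
import org.apache.spark.sql.{Dataset, Encoder, SparkSession}

// Hypothetical record type; the assumed layout is one folder per case class,
// e.g. <base>/example/ holds only Example records.
case class Example(id: Int, name: String, value: String)

object TypedFolders {
  // Parquet stores the column names and types next to the data,
  // but not the Scala class name.
  def save[T](ds: Dataset[T], path: String): Unit =
    ds.write.mode("overwrite").parquet(path)

  // The reader chooses T from the folder it reads and re-applies it;
  // columns are matched to the case class fields by name.
  def load[T: Encoder](spark: SparkSession, path: String): Dataset[T] =
    spark.read.parquet(path).as[T]
}
```

A reading program would run `import spark.implicits._`, which supplies the `Encoder` for case classes, and then call something like `TypedFolders.load[Example](spark, "hdfs:///data/example")`.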


On Oct 8, 2016 00:53, "Reynold Xin" <> wrote:

> You can use the Dataset API -- it should solve this issue for case classes
> that are not very complex.
> On Fri, Oct 7, 2016 at 12:20 PM, Deepak Sharma <>
> wrote:
>> Hi
>> I am saving an RDD[Example] to HDFS from a Spark program, where Example is
>> a case class.
>> Now when I try to read it back, it returns an RDD[String] with content
>> like this:
>> *Example(1,name,value)*
>> One workaround is to write the data to HDFS as plain strings, read it back
>> as strings, and do the further processing on those. That way the case class
>> name wouldn't appear at all in the files written to HDFS.
>> But I am keen to know: can we read the data back directly in Spark if an
>> RDD[Case_Class] was written to HDFS?
>> --
>> Thanks
>> Deepak
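For reference, `Example(1,name,value)` is exactly what a case class's `toString` produces. `saveAsTextFile` writes each element's `toString`, although the thread does not say which save call was used, so that part is an assumption. The sketch below is pure Scala and shows two things: parsing those lines back, and the plain-CSV workaround described above. It assumes `Example` has three fields (an `Int` followed by two `String`s, inferred from the sample line) and that no value contains a comma or a closing parenthesis. The `ExampleText` helper is hypothetical.

```scala
import scala.util.Try

// Assumed shape of the record, guessed from the sample line Example(1,name,value).
case class Example(id: Int, name: String, value: String)

object ExampleText {
  // What a text save writes for each element: the case class toString.
  def toLine(e: Example): String = e.toString

  // Matches the whole line "Example(<int>,<text>,<text>)".
  private val Pattern = """Example\((-?\d+),([^,]*),([^,]*)\)""".r

  // Parse a toString line back into the case class; None if it doesn't fit.
  def fromLine(line: String): Option[Example] = line match {
    case Pattern(id, name, value) => Try(id.toInt).toOption.map(Example(_, name, value))
    case _                        => None
  }

  // The workaround from the thread: write only the field values, no class name.
  def toCsv(e: Example): String = e.productIterator.mkString(",")

  // Parse a "1,name,value" line; the -1 limit keeps trailing empty fields.
  def fromCsv(line: String): Option[Example] = line.split(",", -1) match {
    case Array(id, name, value) => Try(id.toInt).toOption.map(Example(_, name, value))
    case _                      => None
  }
}
```

In Spark this could be applied as `sc.textFile(path).flatMap(line => ExampleText.fromLine(line).toSeq)`. If only Spark ever reads the data, `rdd.saveAsObjectFile(path)` paired with `sc.objectFile[Example](path)` avoids text parsing entirely by using Java serialization. The trade-off is that old files may become unreadable if the case class changes.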
