spark-user mailing list archives

From "Chandra Mohan, Ananda Vel Murugan" <>
Subject RE: spark streaming filestream API
Date Wed, 14 Oct 2015 12:56:44 GMT

Thanks for your response. My input format is one I created to handle files as a whole, i.e. a WholeFileInputFormat; I wrote one based on this example.
In this case, the key would be NullWritable and the value would be BytesWritable, right?

Unfortunately my files are binary and not text files.
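For the whole-file binary case described above, a minimal sketch of the fileStream call might look like the following. Note that WholeFileInputFormat, the "/data/incoming" path, MyRecord, and parseBinary are all hypothetical names standing in for the poster's own classes; only NullWritable, BytesWritable, and the Spark/Hadoop API calls are real.

```java
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;

// Hedged sketch, assuming jssc is an existing JavaStreamingContext and
// WholeFileInputFormat is the poster's custom InputFormat that emits one
// (NullWritable, BytesWritable) record per file.
JavaPairInputDStream<NullWritable, BytesWritable> files =
        jssc.fileStream("/data/incoming",        // hypothetical monitored directory
                        NullWritable.class,      // key class: unused for whole files
                        BytesWritable.class,     // value class: the entire file's bytes
                        WholeFileInputFormat.class);

// Hand each file's raw bytes to the existing proprietary Java parser
// (parseBinary and MyRecord are placeholders for the poster's own code).
JavaDStream<MyRecord> records =
        files.map(t -> parseBinary(t._2().copyBytes()));
```

BytesWritable.copyBytes() is used here because getBytes() can return a padded backing array; copyBytes() returns exactly the file's contents.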


From: Akhil Das []
Sent: Wednesday, October 14, 2015 5:31 PM
To: Chandra Mohan, Ananda Vel Murugan
Cc: user
Subject: Re: spark streaming filestream API

Key and Value are the classes that your InputFormat uses. E.g.:

JavaPairInputDStream<LongWritable, Text> lines = jssc.fileStream("/sigmoid", LongWritable.class,
Text.class, TextInputFormat.class);

TextInputFormat uses LongWritable as its key class and Text as its value class. If your data is plain
CSV or text, then you can use jssc.textFileStream("/sigmoid") without worrying about
the InputFormat, key, and value classes.
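To make the key/value roles concrete, here is a hedged sketch of consuming the pair stream that fileStream returns with TextInputFormat: the LongWritable key is the byte offset of each line and the Text value is the line itself, so the values are typically extracted with a map. The "/sigmoid" path is the example directory from this thread; jssc is assumed to be an existing JavaStreamingContext.

```java
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;

// fileStream yields (key, value) pairs as defined by the InputFormat:
// for TextInputFormat, key = byte offset of the line, value = the line text.
JavaPairInputDStream<LongWritable, Text> pairs =
        jssc.fileStream("/sigmoid", LongWritable.class, Text.class, TextInputFormat.class);

// Usually only the value is wanted; drop the offset key.
JavaDStream<String> lines = pairs.map(t -> t._2().toString());
```

This is effectively what textFileStream does for you internally, which is why it needs no key/value arguments.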

Best Regards

On Wed, Oct 14, 2015 at 5:12 PM, Chandra Mohan, Ananda Vel Murugan <> wrote:
Hi All,

I have a directory in HDFS which I want to monitor; whenever a new file appears in it, I
want to parse that file and load its contents into a Hive table. The file format is proprietary
and I have Java parsers for it. I am building a Spark Streaming application for this
workflow. For this, I found the JavaStreamingContext.fileStream API. It takes four arguments:
directory path, key class, value class, and input format. What should the key and value
classes be? Please suggest. Thank you.

