Thanks for your response. My input format is a custom one that I created to read each file as a whole, i.e. WholeFileInputFormat. I wrote it based on this example: https://code.google.com/p/hadoop-course/source/browse/HadoopSamples/src/main/java/mr/wholeFile/WholeFileInputFormat.java?r=3 In this case, the key would be NullWritable and the value would be BytesWritable, right?
Unfortunately my files are binary and not text files.
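To make the whole-file idea concrete, here is a minimal sketch in plain Java (no Spark or Hadoop dependencies) of handing one file's bytes to a parser. The class name WholeFileRecordParser and the record layout (4-byte big-endian length followed by UTF-8 bytes) are assumptions made up for illustration, since the real format is proprietary. The validLength parameter mirrors how a BytesWritable value is normally read: its backing array can be longer than the data, so the valid size comes from getLength() (or copyBytes() gives an exact-size copy).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the proprietary parser. The record layout
// (4-byte big-endian length, then that many UTF-8 bytes) is an assumption.
class WholeFileRecordParser {

    // Parses only the first validLength bytes of buffer; a reused buffer
    // may hold stale bytes past the end of the current file's contents.
    static List<String> parse(byte[] buffer, int validLength) {
        ByteBuffer in = ByteBuffer.wrap(buffer, 0, validLength);
        List<String> records = new ArrayList<>();
        while (in.remaining() >= 4) {
            int len = in.getInt();
            if (len < 0 || len > in.remaining()) {
                throw new IllegalArgumentException("corrupt record length: " + len);
            }
            byte[] payload = new byte[len];
            in.get(payload);
            records.add(new String(payload, StandardCharsets.UTF_8));
        }
        if (in.hasRemaining()) {
            throw new IllegalArgumentException("trailing bytes after last record");
        }
        return records;
    }

    // Builds a sample file image in the same hypothetical layout.
    static byte[] encode(List<String> records) {
        List<byte[]> payloads = new ArrayList<>();
        int size = 0;
        for (String r : records) {
            byte[] p = r.getBytes(StandardCharsets.UTF_8);
            payloads.add(p);
            size += 4 + p.length;
        }
        ByteBuffer out = ByteBuffer.allocate(size);
        for (byte[] p : payloads) {
            out.putInt(p.length).put(p);
        }
        return out.array();
    }

    public static void main(String[] args) {
        byte[] file = encode(Arrays.asList("alpha", "beta"));
        // Simulate a padded backing array: extra capacity beyond the valid data.
        byte[] padded = Arrays.copyOf(file, file.length + 16);
        System.out.println(parse(padded, file.length));
    }
}
```

In the streaming job, the same call would presumably sit inside a map over the stream's values, with the real parser in place of this one.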
The key and value classes are the ones that your InputFormat produces. E.g.:
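// fileStream returns a pair stream; TextInputFormat here is the new-API class from org.apache.hadoop.mapreduce.lib.input
JavaPairInputDStream<LongWritable, Text> lines = jssc.fileStream("/sigmoid", LongWritable.class, Text.class, TextInputFormat.class);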
TextInputFormat uses LongWritable as the key class and Text as the value class. If your data is plain CSV or text, you can use jssc.textFileStream("/sigmoid") without worrying about the InputFormat, key and value classes.
On Wed, Oct 14, 2015 at 5:12 PM, Chandra Mohan, Ananda Vel Murugan <Ananda.Murugan@honeywell.com> wrote:
I have a directory in HDFS that I want to monitor; whenever a new file appears in it, I want to parse that file and load its contents into a Hive table. The file format is proprietary and I have Java parsers for it. I am building a Spark Streaming application for this workflow. For this, I found the JavaStreamingContext.fileStream API. It takes four arguments: directory path, key class, value class and input format class. What should the key and value classes be? Please suggest. Thank you.