spark-user mailing list archives

From Akhil Das <ak...@sigmoidanalytics.com>
Subject Re: spark streaming filestream API
Date Wed, 14 Oct 2015 12:59:50 GMT
Yes, that is correct. When you import the key and value classes, make sure
you import them from the org.apache.hadoop.io package:

import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
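
To make the key/value pairing discussed in this thread concrete, here is a
minimal plain-Java sketch of what a record looks like in each case. It is an
illustration only and uses no Hadoop or Spark classes. The names RecordShapes,
LineRecord, lineRecords and wholeFileRecord are hypothetical, and splitting
only on '\n' is a simplifying assumption. A line-oriented format such as
TextInputFormat yields (byte offset, line) pairs, whereas a whole-file format
yields one record per file, with no meaningful key and the entire content as
the value.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of record shapes; not Hadoop or Spark code.
final class RecordShapes {

    // One record of a line-oriented input.
    static final class LineRecord {
        final long offset;  // plays the role of the LongWritable key
        final String line;  // plays the role of the Text value

        LineRecord(long offset, String line) {
            this.offset = offset;
            this.line = line;
        }
    }

    // Pairs each line with the byte offset at which it starts.
    // Assumption: lines are terminated by '\n' only.
    static List<LineRecord> lineRecords(byte[] content) {
        List<LineRecord> records = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < content.length; i++) {
            if (content[i] == '\n') {
                records.add(new LineRecord(start, decode(content, start, i)));
                start = i + 1;
            }
        }
        // Emit a trailing line that has no final newline.
        if (start < content.length) {
            records.add(new LineRecord(start, decode(content, start, content.length)));
        }
        return records;
    }

    // Whole-file shape: a single record whose value is a copy of all the
    // bytes. There is no meaningful key (the NullWritable role).
    static byte[] wholeFileRecord(byte[] content) {
        return content.clone();
    }

    private static String decode(byte[] content, int from, int to) {
        return new String(content, from, to - from, StandardCharsets.UTF_8);
    }
}
```

With the real API, these shapes correspond to the key and value classes
passed to fileStream, for example NullWritable and BytesWritable for a
whole-file format.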


Thanks
Best Regards

On Wed, Oct 14, 2015 at 6:26 PM, Chandra Mohan, Ananda Vel Murugan <
Ananda.Murugan@honeywell.com> wrote:

> Hi,
>
>
>
> Thanks for your response. My input format is one I created to handle each
> file as a whole, i.e. WholeFileInputFormat. I wrote it based on this
> example:
> https://code.google.com/p/hadoop-course/source/browse/HadoopSamples/src/main/java/mr/wholeFile/WholeFileInputFormat.java?r=3
> In this case, the key would be NullWritable and the value would be
> BytesWritable, right?
>
>
>
> Unfortunately my files are binary and not text files.
>
>
>
> Regards,
>
> Anand.C
>
>
>
> *From:* Akhil Das [mailto:akhil@sigmoidanalytics.com]
> *Sent:* Wednesday, October 14, 2015 5:31 PM
> *To:* Chandra Mohan, Ananda Vel Murugan
> *Cc:* user
> *Subject:* Re: spark streaming filestream API
>
>
>
> Key and Value are the classes that your InputFormat uses. E.g.:
>
>
>
> JavaPairInputDStream<LongWritable, Text> lines = jssc.fileStream("/sigmoid",
> LongWritable.class, Text.class, TextInputFormat.class);
>
>
>
> TextInputFormat uses LongWritable as the key class and Text as the value
> class. If your data is plain CSV or text, you can use
> *jssc.textFileStream("/sigmoid")* without worrying about the InputFormat,
> key, and value classes.
>
>
>
>
>
>
> Thanks
>
> Best Regards
>
>
>
> On Wed, Oct 14, 2015 at 5:12 PM, Chandra Mohan, Ananda Vel Murugan <
> Ananda.Murugan@honeywell.com> wrote:
>
> Hi All,
>
>
>
> I have an HDFS directory that I want to monitor. Whenever a new file
> appears in it, I want to parse that file and load the contents into a Hive
> table. The file format is proprietary, and I have Java parsers for it. I
> am building a Spark Streaming application for this workflow. For this, I
> found the JavaStreamingContext.fileStream API. It takes four arguments:
> directory path, key class, value class, and input format. What should the
> key and value classes be? Please suggest. Thank you.
>
>
>
>
>
> Regards,
>
> Anand.C
>
>
>
