spark-user mailing list archives

From "Antsy.Rao" <ant....@gmail.com>
Subject Re: Does anyone have experience with using Hadoop InputFormats?
Date Sat, 01 Aug 2015 16:19:00 GMT


Sent from my iPad

On 2014-9-24, at 8:13 AM, Steve Lewis <lordjoe2000@gmail.com> wrote:

> When I experimented in Spark with an InputFormat I had used in Hadoop for a long time, I found:
> 1) It must extend org.apache.hadoop.mapred.FileInputFormat (the deprecated class), not org.apache.hadoop.mapreduce.lib.input.FileInputFormat.
> 2) initialize needs to be called in the constructor.
> 3) The key/value types - mine extended FileInputFormat<Text, Text> - must not be Hadoop Writables, since those are not serializable; extending FileInputFormat<StringBuffer, StringBuffer> does work, though I don't think that is allowed in Hadoop itself.
>
> Are these statements correct? If so, it seems most Hadoop InputFormats - certainly the custom ones I create - require serious modification to work. Does anyone have samples of using a Hadoop InputFormat?
>
> Since I am working with problems where a directory of multiple files is processed, and some of those files are many gigabytes in size with multiline complex records, an input format is a requirement.
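For what it's worth, a common pattern for this situation is to load through SparkContext.hadoopFile (which takes an old-API org.apache.hadoop.mapred InputFormat; newAPIHadoopFile exists for the org.apache.hadoop.mapreduce one, so point 1 above is not strictly required) and then map the Writables to plain serializable types immediately, so no Writable ever has to cross a shuffle. A minimal sketch, assuming Spark's Scala API and using the built-in TextInputFormat as a stand-in for a custom format; the input path is a placeholder:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.TextInputFormat  // old ("mapred") API

object HadoopInputFormatExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("InputFormatDemo").setMaster("local[*]"))

    // hadoopFile expects an org.apache.hadoop.mapred InputFormat;
    // for org.apache.hadoop.mapreduce formats use sc.newAPIHadoopFile.
    val raw = sc.hadoopFile[LongWritable, Text, TextInputFormat]("/path/to/dir")

    // Writables are not java.io.Serializable, so convert each record to
    // plain JVM types before anything is shuffled or collected.
    val lines = raw.map { case (offset, text) => (offset.get, text.toString) }

    lines.take(5).foreach(println)
    sc.stop()
  }
}
```

With this approach the InputFormat itself can keep its Writable key/value types unchanged; only the RDD downstream of the first map needs serializable types, which sidesteps the StringBuffer workaround in point 3.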

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

