spark-user mailing list archives

From "Antsy.Rao" <>
Subject Re: Does anyone have experience with using Hadoop InputFormats?
Date Sat, 01 Aug 2015 16:19:00 GMT

Sent from my iPad

On 2014-09-24, at 8:13 AM, Steve Lewis <> wrote:

>  When I experimented with an InputFormat I had used in Hadoop for a long time, I found:
> 1) it must extend org.apache.hadoop.mapred.FileInputFormat (the deprecated class, not
> the newer org.apache.hadoop.mapreduce one)
> 2) initialize needs to be called in the constructor
> 3) the key/value types must not be Hadoop Writables - mine extended
> FileInputFormat<Text, Text>, but Writables are not serializable. Changing it to extend
> FileInputFormat<StringBuffer, StringBuffer> does work, though I don't think that is
> allowed in Hadoop itself.
> Are these statements correct? If so, it seems most Hadoop InputFormats - certainly
> the custom ones I create - require serious modifications to work. Does anyone have
> samples of using a Hadoop InputFormat?
> Since I am working with problems where a directory with multiple files is processed,
> and some of the files are many gigabytes in size with multi-line complex records, an
> input format is a requirement.
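
The serialization issue behind point 3 can be demonstrated without Spark at all. The sketch below (a minimal illustration, not anything from Spark or Hadoop itself) uses plain Java serialization - what Spark's default serializer relies on - to show that a Writable-style class that does not implement java.io.Serializable fails to serialize, while java.lang.StringBuffer, which does implement Serializable, round-trips fine. `FakeWritable` is a hypothetical stand-in for a Hadoop Writable such as org.apache.hadoop.io.Text; it is not a real Hadoop class.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

public class SerializationDemo {
    // Hypothetical stand-in for a Hadoop Writable like Text: it carries data
    // but deliberately does NOT implement java.io.Serializable.
    static class FakeWritable {
        String value = "payload";
    }

    // Returns true if Java's default serialization accepts the object.
    static boolean javaSerializable(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A Writable-style class without Serializable is rejected...
        System.out.println(javaSerializable(new FakeWritable()));          // false
        // ...while StringBuffer implements Serializable and succeeds.
        System.out.println(javaSerializable(new StringBuffer("payload"))); // true
    }
}
```

This is why a common workaround is to convert Writables to plain serializable types (e.g. via toString) immediately after reading them, rather than changing the InputFormat's type parameters.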

