spark-user mailing list archives

From Matei Zaharia <matei.zaha...@gmail.com>
Subject Re: Configuring custom input format
Date Wed, 26 Nov 2014 02:07:32 GMT
Yeah, unfortunately that will be up to them to fix, though it wouldn't hurt to send them a
JIRA mentioning this.
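For anyone hitting the same thing: the REPL failure comes from the shell calling toString() on the Job, which Hadoop's Job class rejects before the job has been submitted. A minimal sketch of the function-wrapping workaround suggested above, using the stock TextInputFormat as a stand-in for the custom input format (the path and key/value types here are placeholders, not from the original thread):

```scala
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.input.{FileInputFormat, TextInputFormat}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Wrap Job creation in a function so the shell never tries to print
// the Job itself -- only the returned RDD escapes the function scope.
def customRDD(sc: SparkContext, path: String): RDD[(LongWritable, Text)] = {
  // Job.getInstance copies the configuration, so the static setup
  // methods on the input format mutate the copy, not sc.hadoopConfiguration.
  val job = Job.getInstance(sc.hadoopConfiguration)
  FileInputFormat.setInputPaths(job, new Path(path))
  sc.newAPIHadoopRDD(
    job.getConfiguration,
    classOf[TextInputFormat],
    classOf[LongWritable],
    classOf[Text])
}
```

In the shell this keeps the Job out of any `val`, so the REPL's implicit toString() never fires against it.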

Matei

> On Nov 25, 2014, at 2:58 PM, Corey Nolet <cjnolet@gmail.com> wrote:
> 
> I was wiring up my job in the shell while I was learning Spark/Scala. I'm getting more
comfortable with them both now, so I've been mostly testing through IntelliJ with mock data
as inputs.
> 
> I think the problem lies more with Hadoop than Spark, as the Job object seems to check its
state and throw an exception when toString() is called before the job has actually been
submitted.
> 
> On Tue, Nov 25, 2014 at 5:31 PM, Matei Zaharia <matei.zaharia@gmail.com> wrote:
> How are you creating the object in your Scala shell? Maybe you can write a function that
directly returns the RDD, without assigning the object to a temporary variable.
> 
> Matei
> 
>> On Nov 5, 2014, at 2:54 PM, Corey Nolet <cjnolet@gmail.com> wrote:
>> 
>> Looking closer at the stack trace in the Scala shell, it appears the call to toString()
is what's causing the construction of the Job object to fail. Is there a way to suppress this
output, since it seems to be hindering my ability to new up this object?
>> 
>> On Wed, Nov 5, 2014 at 5:49 PM, Corey Nolet <cjnolet@gmail.com> wrote:
>> I'm trying to use a custom input format with SparkContext.newAPIHadoopRDD. Creating
the new RDD works fine but setting up the configuration file via the static methods on input
formats that require a Hadoop Job object is proving to be difficult. 
>> 
>> Trying to new up my own Job object with SparkContext.hadoopConfiguration throws the
exception at line 283 of this grepcode:
>> 
>> http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hadoop/hadoop-mapreduce-client-core/2.5.0/org/apache/hadoop/mapreduce/Job.java#Job
>> 
>> Looking in the SparkContext code, I see that it's newing up Job objects just fine
using nothing but the configuration, and SparkContext.textFile() works for me. Any ideas?
Has anyone else run into this? Would it be possible to add a method like
SparkContext.getJob() or something similar?
>> 
>> Thanks.
>> 
>> 
> 
> 

