spark-dev mailing list archives

From Nan Zhu <>
Subject Re: Discussion on SPARK-1139
Date Thu, 27 Feb 2014 20:17:30 GMT
any discussion on this?  

I would like to hear more advice from the community before I create the PR.

An example is how a NewHadoopRDD is created.

We get a configuration from the JobContext:

val updatedConf = job.getConfiguration
new NewHadoopRDD(this, fClass, kClass, vClass, updatedConf)

Then we create a JobContext based on this configuration object:

NewHadoopRDD.scala (L74)
val jobContext = newJobContext(conf, jobId)
val rawSplits = inputFormat.getSplits(jobContext).toArray
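The shape of this flow can be sketched with toy stand-ins (the types below are hypothetical placeholders, not the real Hadoop classes), just to show why the Configuration has to be wrapped in a JobContext before the new-API InputFormat can compute splits:

```scala
// Hypothetical stand-ins for illustration only -- not the real Hadoop classes.
case class Configuration(props: Map[String, String])
case class JobContext(conf: Configuration, jobId: Int)

// Mirrors NewHadoopRDD: the Configuration must be wrapped in a JobContext
// before getSplits can be called on a new-API input format.
def newJobContext(conf: Configuration, jobId: Int): JobContext =
  JobContext(conf, jobId)

// The new-API input format only ever sees a JobContext, never a bare Configuration.
trait InputFormat {
  def getSplits(ctx: JobContext): Seq[String]
}

object TextInputFormat extends InputFormat {
  // Toy split computation driven by a config key.
  def getSplits(ctx: JobContext): Seq[String] =
    (1 to ctx.conf.props.getOrElse("splits", "2").toInt).map(i => s"split-$i")
}

val conf   = Configuration(Map("splits" -> "3"))
val ctx    = newJobContext(conf, jobId = 0)
val splits = TextInputFormat.getSplits(ctx)
```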

Because the InputFormat is from the mapreduce package, it only accepts a JobContext as the parameter
in its methods.

I think we should avoid introducing Configuration as a parameter, but, same as before,
it will change the APIs.
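A hedged sketch of that direction (the names and signatures below are made up for illustration, not the real Spark or Hadoop APIs): the entry point takes the new-API Job, which owns the Configuration, instead of exposing Configuration to callers:

```scala
// Hypothetical stand-in types -- not the real Hadoop/Spark classes.
case class Configuration(props: Map[String, String])
case class Job(conf: Configuration) // in the new Hadoop API, Job carries the conf

// Current style (sketch): the Spark-facing method takes a bare Configuration.
def newAPIHadoopFileByConf(path: String, conf: Configuration): String =
  s"NewHadoopRDD($path, props=${conf.props.size})"

// Proposed style (sketch): take the new-API Job and pull the conf out
// internally, so callers never mix an old-API Configuration into the
// new-API entry points.
def newAPIHadoopFileByJob(path: String, job: Job): String =
  newAPIHadoopFileByConf(path, job.conf)

val job = Job(Configuration(Map("input.dir" -> "hdfs://data")))
val rdd = newAPIHadoopFileByJob("hdfs://data", job)
```

The trade-off the email raises is exactly this: the second signature is cleaner with respect to the new Hadoop API, but changing the parameter type breaks existing callers.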


Nan Zhu

On Wednesday, February 26, 2014 at 8:23 AM, Nan Zhu wrote:

> Hi, all  
> I just created a JIRA. The issue is the following:
> the new Hadoop API based Spark APIs are actually a mixture of the old and new Hadoop APIs.
> Spark APIs still use JobConf (or Configuration) as one of the parameters, but Configuration
> has been replaced by mapreduce.Job in the new Hadoop API.
> for example:
> Personally I think it’s better to fix this design, but it will introduce some compatibility issues.
> Just bringing it up here for your advice.
> Best,  
> --  
> Nan Zhu
