spark-dev mailing list archives

From Nan Zhu <zhunanmcg...@gmail.com>
Subject Re: Discussion on SPARK-1139
Date Thu, 27 Feb 2014 20:17:30 GMT
Any discussion on this?

I would like to hear more advice from the community before I create the PR.

An example is how a NewHadoopRDD is created.


We get a Configuration from the JobContext:

val updatedConf = job.getConfiguration
new NewHadoopRDD(this, fClass, kClass, vClass, updatedConf)
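
For context, this is inside SparkContext.newAPIHadoopFile, whose public signature still takes a Configuration; roughly (reproduced from memory, so treat the details as approximate):

  def newAPIHadoopFile[K, V, F <: org.apache.hadoop.mapreduce.InputFormat[K, V]](
      path: String,
      fClass: Class[F],
      kClass: Class[K],
      vClass: Class[V],
      conf: Configuration): RDD[(K, V)]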


Then we create a JobContext based on this Configuration object:

NewHadoopRDD.scala (L74)
val jobContext = newJobContext(conf, jobId)
val rawSplits = inputFormat.getSplits(jobContext).toArray


Because inputFormat is from the mapreduce package, its methods only accept a JobContext as the parameter.
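
To make the round trip explicit, here is a simplified sketch (the user-side names and paths are only for illustration; newJobContext is the internal helper quoted above):

  // user side: build a mapreduce.Job just to hand Spark its Configuration
  val job = new Job(new Configuration())
  FileInputFormat.addInputPath(job, new Path("hdfs:///input"))
  val rdd = sc.newAPIHadoopFile(path, fClass, kClass, vClass, job.getConfiguration)

  // Spark side (NewHadoopRDD): wrap that Configuration back into a fresh JobContext,
  // because the new-API InputFormat's methods (e.g. getSplits) take a JobContext
  val jobContext = newJobContext(conf, jobId)
  val rawSplits = inputFormat.getSplits(jobContext).toArray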


I think we should avoid introducing Configuration as the parameter, but, as before, it will change the APIs.
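
Something along these lines is what I have in mind; this is purely a hypothetical sketch of the direction, not the actual change I would put in the PR:

  // hypothetical alternative: accept the new-API Job (which already is a JobContext)
  // instead of a raw Configuration
  def newAPIHadoopRDD[K, V, F <: org.apache.hadoop.mapreduce.InputFormat[K, V]](
      job: Job,
      fClass: Class[F],
      kClass: Class[K],
      vClass: Class[V]): RDD[(K, V)] = {
    // NewHadoopRDD could then reuse the caller's JobContext (or at least its
    // configuration) instead of rebuilding one internally
    new NewHadoopRDD(this, fClass, kClass, vClass, job.getConfiguration)
  }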


Best,  

--  
Nan Zhu


On Wednesday, February 26, 2014 at 8:23 AM, Nan Zhu wrote:

> Hi, all  
>  
> I just created a JIRA https://spark-project.atlassian.net/browse/SPARK-1139 . The issue
> discusses the following:
>  
> The Spark APIs based on the new Hadoop API are actually a mixture of the old and new Hadoop APIs.
>  
> Spark APIs still use JobConf (or Configuration) as one of the parameters, but Configuration
> has actually been replaced by mapreduce.Job in the new Hadoop API.
>  
> for example: http://codesfusion.blogspot.ca/2013/10/hadoop-wordcount-with-new-map-reduce-api.html
>  
> &  
>  
> http://www.slideshare.net/sh1mmer/upgrading-to-the-new-map-reduce-api (p10)
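>  
> In the new-API style shown in those examples, the user works with a mapreduce.Job rather than a bare Configuration; roughly (a sketch, not code copied from the links):
>  
>   val conf = new Configuration()
>   val job = Job.getInstance(conf)  // new-API entry point (new Job(conf) on older Hadoop)
>   job.setInputFormatClass(classOf[TextInputFormat])
>   FileInputFormat.addInputPath(job, new Path("input"))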
>  
> Personally I think it’s better to fix this design, but it will introduce some compatibility
> issues.
>  
> Just bringing it up here for your advice.
>  
> Best,  
>  
> --  
> Nan Zhu
>  

