spark-user mailing list archives

From Davies Liu <dav...@databricks.com>
Subject Re: pass configuration parameters to PySpark job
Date Mon, 18 May 2015 21:05:49 GMT
In PySpark, functions/closures are serialized together with the global
values they use.

For example,

global_param = 111

def my_map(x):
    # global_param is captured in my_map's closure and pickled along with it
    return x + global_param

rdd.map(my_map)
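
Expanding that into a fuller sketch of the case from the question below
(doSplit, process_model and input_job_files are the question's names; their
bodies and the placeholder path are assumptions added only to make the
snippet self-contained):

from pyspark import SparkContext

input_job_files = "hdfs:///path/to/input"   # placeholder, not from the thread

def doSplit(line):
    # assumed: split each line into a (key, value) pair
    key, value = line.split("\t", 1)
    return key, value

def process_model(param, k, vc):
    # 'param' arrives through the closure below, not through Spark config
    return k, param, len(list(vc))

param = "param1"   # any picklable value can be captured the same way

sc = SparkContext(appName="TAD")
lines = sc.textFile(input_job_files)

# The lambda closes over 'param'; PySpark pickles the closure together with
# that value and ships it to the executors.
result = lines.map(doSplit) \
              .groupByKey() \
              .map(lambda kv: process_model(param, kv[0], kv[1]))

functools.partial(process_model, param) should work the same way as the
lambda here, since the partial object gets pickled along with its bound
argument.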

- Davies

On Mon, May 18, 2015 at 7:26 AM, Oleg Ruchovets <oruchovets@gmail.com> wrote:
> Hi ,
>    I am looking for a way to pass configuration parameters to a Spark job.
> In general I have a fairly simple PySpark job:
>
>   def process_model(k, vc):
>        ....
>        do something
>        ....
>
>
>    sc = SparkContext(appName="TAD")
>    lines = sc.textFile(input_job_files)
>    result = lines.map(doSplit).groupByKey().map(lambda (k,vc): process_model(k,vc))
>
> Question:
>    What if I need to pass additional metadata, parameters, etc. to the
> process_model function?
>
>    I tried to do something like
>    param = 'param1'
>   result = lines.map(doSplit).groupByKey().map(lambda (param,k,vc): process_model(param1,k,vc))
>
> but the job stops working, and it also doesn't look like an elegant solution.
> Is there a way to access SparkContext from my custom functions?
> I found the methods setLocalProperty/getLocalProperty, but I didn't find an
> example of how to use them for my requirements (from my function).
>
> It would be great to have a short example of how to pass parameters.
>
> Thanks
> Oleg.
>
>


