spark-dev mailing list archives

From Romi Kuntsman <r...@totango.com>
Subject Re: how to send additional configuration to the RDD after it was lazily created
Date Mon, 21 Sep 2015 09:38:11 GMT
What new information do you know after creating the RDD that you didn't
know at the time of its creation?
I think the whole point is that an RDD is immutable: you can't change it once
it has been created.
Perhaps you need to refactor your logic to know the parameters earlier, or
create a whole new RDD.
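
To illustrate the "create a whole new RDD" suggestion, here is a minimal sketch in plain Python. It does not use Spark: LazyDataset and build_pipeline are hypothetical stand-ins for an immutable, lazily evaluated RDD and for the code that builds the RDD chain. The assumption is that the pipeline is cheap to define, so it can simply be rebuilt once the parameter is known.

```python
from typing import Any, Callable, Iterator, List


class LazyDataset:
    """Hypothetical stand-in for an immutable, lazily evaluated RDD."""

    def __init__(self, compute: Callable[[], Iterator[Any]]) -> None:
        # The compute function is fixed at construction and never reassigned.
        self._compute = compute

    def map(self, fn: Callable[[Any], Any]) -> "LazyDataset":
        # Returns a new dataset; the parent is left untouched.
        parent = self._compute
        return LazyDataset(lambda: (fn(x) for x in parent()))

    def collect(self) -> List[Any]:
        # Nothing runs until an action such as collect() is called.
        return list(self._compute())


def build_pipeline(records: List[str], prefix: str) -> LazyDataset:
    """Builds the whole chain from scratch for a given parameter value."""
    source = LazyDataset(lambda: (r for r in records if r.startswith(prefix)))
    return source.map(str.upper)


records = ["a1", "b2", "a3"]
first = build_pipeline(records, prefix="a")   # parameter known up front
second = build_pipeline(records, prefix="b")  # new information: rebuild
```

Because defining the chain does no work, rebuilding it costs almost nothing; only the final action triggers computation, and each pipeline sees the parameter it was built with.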

*Romi Kuntsman*, *Big Data Engineer*
http://www.totango.com

On Thu, Sep 17, 2015 at 10:07 AM, Gil Vernik <GILV@il.ibm.com> wrote:

> Hi,
>
> I have the following case, which I am not sure how to resolve.
>
> My code uses HadoopRDD and creates various RDDs on top of it
> (MapPartitionsRDD, and so on).
> After all the RDDs have been lazily created, my code "knows" some new
> information, and I want the "compute" method of the HadoopRDD to be aware
> of it (at the point when "compute" is called).
> Is there a way to "send" additional information to the compute method of
> the HadoopRDD after this RDD has been lazily created?
> I tried to play with the configuration, for example calling
> set("test","111") in my code and modifying the compute method of HadoopRDD
> to call get("test"), but it doesn't work, since SparkContext holds only a
> clone of the configuration and it can't be modified at run time.
>
> Any thoughts on how I can do this?
>
> Thanks
> Gil.
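
The clone behaviour described in the quoted message can be sketched in plain Python. This is not Spark code: Conf and Context are hypothetical stand-ins, assumed only to mimic a context that copies its configuration when it is constructed.

```python
import copy
from typing import Dict, Optional


class Conf:
    """Hypothetical stand-in for a key/value configuration object."""

    def __init__(self) -> None:
        self._settings: Dict[str, str] = {}

    def set(self, key: str, value: str) -> "Conf":
        self._settings[key] = value
        return self

    def get(self, key: str, default: Optional[str] = None) -> Optional[str]:
        return self._settings.get(key, default)


class Context:
    """Hypothetical stand-in for a context that clones its configuration."""

    def __init__(self, conf: Conf) -> None:
        # The context keeps its own copy, detached from the caller's object.
        self._conf = copy.deepcopy(conf)

    def get_conf(self) -> Conf:
        # Callers also receive a copy, so they cannot mutate the internal one.
        return copy.deepcopy(self._conf)


conf = Conf().set("test", "000")
ctx = Context(conf)

# A later change to the original object is invisible to the context.
conf.set("test", "111")
seen_by_context = ctx.get_conf().get("test")
```

Under these assumptions, seen_by_context still holds the value that was set before the context was created, which is why a set() after the fact never reaches compute().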
