spark-user mailing list archives

From Jayant Shekhar <jay...@cloudera.com>
Subject Re: How to access objects declared and initialized outside the call() method of JavaRDD
Date Thu, 23 Oct 2014 18:22:05 GMT
+1 to Sean.

Is it possible to rewrite your code so that it does not use the SparkContext
inside an RDD operation? Or, why does javaFunctions() need the SparkContext?
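
To illustrate the failure Sean describes, here is a minimal sketch in plain Java with no Spark dependency. Every name in it (Task, DriverContext, StoreClient, canShip) is a hypothetical stand-in invented for this example, not part of Spark or the Cassandra connector. It assumes only that the framework serializes the function object with standard Java serialization before shipping it to a worker, which is consistent with the NotSerializableException reported in this thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class ClosureCaptureDemo {

    // Hypothetical stand-in for a serializable callback such as VoidFunction<String>.
    interface Task extends Serializable {
        void call(String record) throws Exception;
    }

    // Hypothetical stand-in for the driver-side context: deliberately NOT Serializable.
    static class DriverContext {
        String lookup(String key) { return "value-for-" + key; }
    }

    // Hypothetical worker-side client: also not Serializable, but never shipped.
    static class StoreClient {
        private final String settings;
        StoreClient(String settings) { this.settings = settings; }
        String fetch(String key) { return key + "@" + settings; }
    }

    // Tries to serialize the task, as a framework would before sending it to a worker.
    static boolean canShip(Task task) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(task);
            return true;
        } catch (NotSerializableException e) {
            return false; // some captured object is not Serializable
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Making ctx final fixes the compile error, but the anonymous class now
    // holds a reference to ctx, so serializing the task fails.
    static Task capturingTask() {
        final DriverContext ctx = new DriverContext();
        return new Task() {
            @Override
            public void call(String record) {
                ctx.lookup(record);
            }
        };
    }

    // Captures only a String; the non-serializable client is created inside call().
    static Task selfContainedTask(final String settings) {
        return new Task() {
            @Override
            public void call(String record) {
                StoreClient client = new StoreClient(settings);
                client.fetch(record);
            }
        };
    }

    public static void main(String[] args) {
        System.out.println("capturing task ships: " + canShip(capturingTask()));
        System.out.println("self-contained task ships: " + canShip(selfContainedTask("host=localhost")));
    }
}
```

Running main prints "capturing task ships: false" followed by "self-contained task ships: true". The second variant only shows that a function which captures nothing but serializable values can be shipped. Whether the Cassandra connector offers a client that can be created on the worker side like this is not something this thread establishes, and a SparkContext itself cannot be recreated that way.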

On Thu, Oct 23, 2014 at 10:53 AM, Localhost shell <
universal.localhost@gmail.com> wrote:

> Bang on, Sean.
>
> Before sending the issue mail, I was able to remove the compilation error
> by making it final, but then I got:
> Caused by: java.io.NotSerializableException:
> org.apache.spark.api.java.JavaSparkContext (as you mentioned)
>
> Now, regarding your suggestion of changing the business logic:
> 1. *Is the current approach possible if I write the code in Scala?* I
> think probably not, but I wanted to check with you.
>
> 2. Brief steps of what the code is doing:
>
>         1. Get raw session data from the datastore (C*)
>         2. Process the raw session data
>         3. Iterate over the processed data (derived from #2) and fetch the
>            previously aggregated data from the store for those row keys.
>            Add the values from this batch to the previous batch's values
>         4. Save back the updated values
>
>    * This GitHub gist might explain more:
> https://gist.github.com/rssvihla/6577359860858ccb0b33 It does a
> similar thing in Scala.*
>     I am trying to achieve a similar thing in Java using Spark batch with
> C* as the datastore.
>
> I have attached the Java code file to provide some code details. (In case
> I was not able to explain the problem well, the code will be handy.)
>
>
> The reason I am fetching only selected data (that I will update later) is
> that Cassandra doesn't provide range queries, so I thought fetching the
> complete data might be expensive.
>
> It would be great if you could share your thoughts.
>
> On Thu, Oct 23, 2014 at 1:48 AM, Sean Owen <sowen@cloudera.com> wrote:
>
>> In Java, javaSparkContext would have to be declared final in order for
>> it to be accessed inside an inner class like this. But this would
>> still not work as the context is not serializable. You should rewrite
>> this so you are not attempting to use the Spark context inside an
>> RDD.
>>
>> On Thu, Oct 23, 2014 at 8:46 AM, Localhost shell
>> <universal.localhost@gmail.com> wrote:
>> > Hey All,
>> >
>> > I am unable to access objects declared and initialized outside the call()
>> > method of JavaRDD.
>> >
>> > In the code snippet below, the call() method makes a fetch call to C*, but
>> > since javaSparkContext is defined outside the call() method's scope, the
>> > compiler gives a compilation error.
>> >
>> > stringRdd.foreach(new VoidFunction<String>() {
>> >     @Override
>> >     public void call(String str) throws Exception {
>> >         JavaRDD<String> vals = javaFunctions(javaSparkContext)
>> >                 .cassandraTable("schema", "table", String.class)
>> >                 .select("val");
>> >     }
>> > });
>> >
>> > In other languages I have used closures to do this, but I am not able to
>> > achieve the same here.
>> >
>> > Can someone suggest how to achieve this in the current code context?
>> >
>> >
>> > --Unilocal
>> >
>> >
>> >
>>
>
>
>
> --
> --Unilocal
>
>
