spark-user mailing list archives

From Localhost shell <universal.localh...@gmail.com>
Subject Re: How to access objects declared and initialized outside the call() method of JavaRDD
Date Thu, 23 Oct 2014 21:21:49 GMT
Hey Jayant,

In my previous mail, I mentioned a GitHub gist
(https://gist.github.com/rssvihla/6577359860858ccb0b33) that does something
very similar to what I want to do, but it uses Scala for Spark.

Hence my question (reiterating from previous mail):
*Is the current approach possible if I write the code in Scala?*
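
As background for this question: the NotSerializableException quoted further
down the thread comes from Java serialization of the function object, which is
a JVM mechanism and not specific to the Java language, so switching to Scala
alone would probably not avoid it. Below is a minimal sketch that reproduces
the effect without Spark. FakeContext and VoidFn are hypothetical stand-ins
(for JavaSparkContext and Spark's VoidFunction) introduced only for
illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class ClosureSerializationDemo {
    // Hypothetical stand-in for JavaSparkContext: deliberately NOT Serializable.
    static class FakeContext {
        String lookup(String key) { return "value-for-" + key; }
    }

    // Hypothetical stand-in for a Spark function interface that is Serializable.
    interface VoidFn<T> extends Serializable {
        void call(T t) throws Exception;
    }

    // Returns true if the object can be written with Java serialization.
    public static boolean isSerializable(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The anonymous class captures ctx, so ctx becomes a hidden field of the
    // function object and must be serialized along with it.
    public static Object capturingContext() {
        final FakeContext ctx = new FakeContext();
        return new VoidFn<String>() {
            @Override public void call(String s) { ctx.lookup(s); }
        };
    }

    // Same shape, but nothing non-serializable is captured.
    public static Object capturingNothing() {
        return new VoidFn<String>() {
            @Override public void call(String s) { s.length(); }
        };
    }

    public static void main(String[] args) {
        System.out.println(isSerializable(capturingContext())); // false
        System.out.println(isSerializable(capturingNothing())); // true
    }
}
```

Declaring ctx final makes the code compile, but the captured reference still
travels with the function object, which is why the failure moves from compile
time to run time.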

Why does javaFunctions() need the SparkContext?
Because, for each row in the RDD, I make a get call to the data store
(Cassandra). I fetch only selected data (which I will update later) because
Cassandra doesn't provide range queries, so I thought fetching the complete
data might be expensive.
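
To make the intended update concrete, here is a rough sketch of the "add this
batch to the previous aggregates, for the touched row keys only" step described
in the quoted mail below. It uses plain in-memory maps as a stand-in for the
Cassandra table, and the class and method names are made up for illustration.
In Spark, one possible direction would be to express the same merge as a keyed
join between two RDDs instead of a lookup inside foreach(), but that is an
assumption, not something tested here.

```java
import java.util.HashMap;
import java.util.Map;

public class BatchMergeSketch {
    // For every row key in the current batch, add the batch value to the
    // previously stored aggregate. Keys missing from the store start at zero.
    // Only keys present in the batch appear in the result, so only those rows
    // would need to be written back.
    public static Map<String, Long> mergeIntoPrevious(Map<String, Long> previous,
                                                      Map<String, Long> batch) {
        Map<String, Long> updated = new HashMap<>();
        for (Map.Entry<String, Long> e : batch.entrySet()) {
            long before = previous.getOrDefault(e.getKey(), 0L);
            updated.put(e.getKey(), before + e.getValue());
        }
        return updated;
    }

    public static void main(String[] args) {
        Map<String, Long> previous = new HashMap<>();
        previous.put("session-a", 10L);
        previous.put("session-b", 5L);

        Map<String, Long> batch = new HashMap<>();
        batch.put("session-a", 3L);  // existing key: 10 + 3
        batch.put("session-c", 7L);  // new key: 0 + 7

        Map<String, Long> updated = mergeIntoPrevious(previous, batch);
        System.out.println(updated.get("session-a")); // 13
        System.out.println(updated.get("session-c")); // 7
        System.out.println(updated.containsKey("session-b")); // false
    }
}
```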



On Thu, Oct 23, 2014 at 11:22 AM, Jayant Shekhar <jayant@cloudera.com>
wrote:

> +1 to Sean.
>
> Is it possible to rewrite your code so that it does not use the SparkContext
> inside an RDD operation? Or why does javaFunctions() need the SparkContext?
>
> On Thu, Oct 23, 2014 at 10:53 AM, Localhost shell <
> universal.localhost@gmail.com> wrote:
>
>> Bang on, Sean.
>>
>> Before sending the issue mail, I was able to remove the compilation error
>> by making it final, but then got:
>> Caused by: java.io.NotSerializableException:
>> org.apache.spark.api.java.JavaSparkContext   (as you mentioned)
>>
>> Now, regarding your suggestion of changing the business logic:
>> 1. *Is the current approach possible if I write the code in Scala?* I
>> think probably not, but wanted to check with you.
>>
>> 2. Brief steps of what the code is doing:
>>
>>    1. Get raw sessions data from the datastore (C*).
>>    2. Process the raw sessions data.
>>    3. Iterate over the processed data (derived from #2), fetch the
>>       previously aggregated data from the store for those row keys, and
>>       add the values from this batch to the previous batch values.
>>    4. Save back the updated values.
>>
>>    This GitHub gist might explain more:
>> https://gist.github.com/rssvihla/6577359860858ccb0b33
>> It does a similar thing in Scala. I am trying to achieve a similar thing
>> in Java using Spark batch with C* as the datastore.
>>
>> I have attached the Java code file to provide some code details. (If
>> I was not able to explain the problem clearly, the code should be handy.)
>>
>>
>> The reason I am fetching only selected data (which I will update
>> later) is that Cassandra doesn't provide range queries, so I thought
>> fetching the complete data might be expensive.
>>
>> It would be great if you could share your thoughts.
>>
>> On Thu, Oct 23, 2014 at 1:48 AM, Sean Owen <sowen@cloudera.com> wrote:
>>
>>> In Java, javaSparkContext would have to be declared final in order for
>>> it to be accessed inside an inner class like this. But this would
>>> still not work, as the context is not serializable. You should rewrite
>>> this so you are not attempting to use the Spark context inside an
>>> RDD.
>>>
>>> On Thu, Oct 23, 2014 at 8:46 AM, Localhost shell
>>> <universal.localhost@gmail.com> wrote:
>>> > Hey All,
>>> >
>>> > I am unable to access objects declared and initialized outside the
>>> > call() method of JavaRDD.
>>> >
>>> > In the code snippet below, the call() method makes a fetch call to C*,
>>> > but since javaSparkContext is defined outside the call() method's scope,
>>> > the compiler gives a compilation error.
>>> >
>>> > stringRdd.foreach(new VoidFunction<String>() {
>>> >     @Override
>>> >     public void call(String str) throws Exception {
>>> >         JavaRDD<String> vals = javaFunctions(javaSparkContext)
>>> >                 .cassandraTable("schema", "table", String.class)
>>> >                 .select("val");
>>> >     }
>>> > });
>>> >
>>> > In other languages I have used closures to do this, but I am not able to
>>> > achieve the same here.
>>> >
>>> > Can someone suggest how to achieve this in the current code context?
>>> >
>>> >
>>> > --Unilocal
>>> >
>>> >
>>> >
>>>
>>
>>
>>
>> --
>> --Unilocal
>>
>>
>>
>
>


-- 
--Unilocal
