spark-user mailing list archives

From Harold Nguyen <har...@nexgate.com>
Subject Re: Manipulating RDDs within a DStream
Date Fri, 31 Oct 2014 17:37:08 GMT
Thanks Lalit, and Helena,

What I'd like to do is manipulate the values within a DStream like this:

DStream.foreachRDD( rdd => {

       val arr = rdd.collect()

})

I'd then like to be able to insert results from the arr array back into
Cassandra, after I've manipulated it.
However, for all the examples I've seen, inserting into Cassandra is
something like:

val collection = sc.parallelize(Seq("foo", "bar"))

Where "foo" and "bar" could be elements in the arr array. So I would like
to know how to insert into Cassandra at the worker level.
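To make the "one connection per partition" idea concrete, here is a small runnable sketch in plain Scala. It stands in for what you'd do inside rdd.foreachPartition on a worker: the Iterator plays the role of one partition's records, and StubConnection is a placeholder for a real Cassandra session (the names and insert/close methods here are illustrative, not a real driver API).

```scala
import scala.collection.mutable.ArrayBuffer

// Stand-in for a Cassandra session: just records what was "written".
class StubConnection {
  val written = ArrayBuffer.empty[String]
  var open = true
  def insert(row: String): Unit = { require(open); written += row }
  def close(): Unit = { open = false }
}

object PartitionWriteDemo {
  // Worker-side logic: called once per partition, so one connection is
  // opened per partition rather than per record, and always closed.
  def writePartition(rows: Iterator[String], conn: StubConnection): Unit = {
    try rows.foreach(conn.insert)
    finally conn.close()
  }

  def main(args: Array[String]): Unit = {
    val partition = Iterator("foo", "bar")  // one partition's records
    val conn = new StubConnection
    writePartition(partition, conn)
    println(conn.written.mkString(","))     // prints foo,bar
  }
}
```

In real Spark code the same shape would be rdd.foreachPartition { rows => ... }, with the connection opened and closed inside the closure so it never has to be serialized from the driver.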

Best wishes,

Harold

On Thu, Oct 30, 2014 at 11:48 PM, lalit1303 <lalit@sigmoidanalytics.com>
wrote:

> Hi,
>
> Since the Cassandra connection object is not serializable, you can't open
> the connection at the driver level and access it inside foreachRDD (i.e. at
> the worker level).
> You have to open connection inside foreachRDD only, perform the operation
> and then close the connection.
>
> For example:
>
>  wordCounts.foreachRDD( rdd => {
>
>        val arr = rdd.collect()
>
>        OPEN cassandra connection
>        store arr
>        CLOSE cassandra connection
>
> })
>
>
> Thanks
>
>
>
> -----
> Lalit Yadav
> lalit@sigmoidanalytics.com
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Manipulating-RDDs-within-a-DStream-tp17740p17800.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
>
>
