spark-user mailing list archives

From Marcin Cylke <marcin.cy...@ext.allegro.pl>
Subject Re: Using Neo4j with Apache Spark
Date Thu, 12 Mar 2015 08:17:13 GMT
On Thu, 12 Mar 2015 00:48:12 -0700
d34th4ck3r <gautam1237@gmail.com> wrote:

> I'm trying to use Neo4j with Apache Spark Streaming, but I am running
> into a serializability issue.
> 
> Basically, I want Apache Spark to parse and bundle my data in real
> time. After the data has been bundled, it should be stored in the
> database, Neo4j. However, I am getting this error:

Hi

It seems something in your task isn't serializable. A quick look at
the code suggests graphDB as the likely culprit.
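
For reference, this is roughly the pattern that triggers the error: a
non-serializable handle created on the driver gets captured by a task
closure. (GraphDatabase here is a hypothetical stand-in for your real
handle, and sc is your SparkContext.)

// hypothetical, non-serializable database handle
class GraphDatabase { def save(r: String): Unit = () }

val graphDB = new GraphDatabase()            // created on the driver
sc.parallelize(Seq("a", "b")).foreach { r =>
  graphDB.save(r)                            // closure captures graphDB
}
// => org.apache.spark.SparkException: Task not serializable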

If you want to create it in one place (the driver) and fetch it later
inside a task, you can do something like this:

- create a container class that you will broadcast:

class LazyGraphDB extends Serializable {
  // @transient + lazy: the handle is skipped during serialization and
  // re-created on first access inside each executor JVM
  @transient
  lazy val graphDB = new GraphDatabase()
}

- then, in the driver code:

val graphDbBc = sc.broadcast(new LazyGraphDB)

- and in any task where you'd like to use it, just write:

graphDbBc.value.graphDB...

Just remember to keep the @transient and lazy modifiers; without them,
Spark will try to serialize the database handle along with the task.
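
Putting it all together, a minimal end-to-end sketch (GraphDatabase and
its save() method are hypothetical stand-ins for your actual Neo4j
setup; the socket source is just an example input):

import org.apache.spark.SparkContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

// hypothetical stand-in for the real Neo4j handle
class GraphDatabase { def save(r: String): Unit = println(s"saved: $r") }

class LazyGraphDB extends Serializable {
  @transient lazy val graphDB = new GraphDatabase() // re-created per executor
}

object Neo4jStreamingExample {
  def main(args: Array[String]): Unit = {
    val sc  = new SparkContext("local[2]", "neo4j-streaming")
    val ssc = new StreamingContext(sc, Seconds(5))

    // broadcast the container, not the database handle itself
    val graphDbBc = sc.broadcast(new LazyGraphDB)

    ssc.socketTextStream("localhost", 9999).foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        val db = graphDbBc.value.graphDB // initialised lazily, on the executor
        records.foreach(db.save)
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}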

Regards 
Marcin


