spark-user mailing list archives

From Oliver Fernandez <oliver.fernan...@marfeel.com>
Subject Spark with MongoDB problem
Date Wed, 14 Jan 2015 08:46:48 GMT
Hello,

I'm learning to use Spark with MongoDB, but I've run into a problem that I
think is related to the way I use Spark, because it doesn't make any sense
to me.

As a proof of concept, I want to filter a collection containing about 800K
documents by a certain field.

My code is very simple. Connect to my MongoDB, apply a filter
transformation and then count the elements:

JavaSparkContext sc = new JavaSparkContext("local[2]", "Spark Test");

Configuration config = new Configuration();
config.set("mongo.input.uri", "mongodb://127.0.0.1:27017/myDB.myCollection");

JavaPairRDD<Object, BSONObject> mongoRDD = sc.newAPIHadoopRDD(config,
    com.mongodb.hadoop.MongoInputFormat.class, Object.class, BSONObject.class);

long numberOfFilteredElements = mongoRDD
    .filter(myCollectionDocument ->
        myCollectionDocument._2().get("site").equals("marfeel.com"))
    .count();

System.out.format("Filtered collection size: %d%n", numberOfFilteredElements);
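
For reference, the result this job is meant to produce can be sketched without Spark or MongoDB. The block below is only an illustration: the class name SiteFilterSketch, the method countBySite and the sample documents are hypothetical, and plain maps stand in for BSONObject. It compares against the constant first, so documents with no "site" field are skipped rather than causing a NullPointerException.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the filter-and-count job.
public class SiteFilterSketch {

    // Counts the documents whose "site" field equals the given site.
    // Calling equals on the non-null argument makes a missing field safe.
    public static long countBySite(List<Map<String, Object>> docs, String site) {
        return docs.stream()
                .filter(doc -> site.equals(doc.get("site")))
                .count();
    }

    // Builds a one-field document; a null site means the field is absent.
    private static Map<String, Object> doc(String site) {
        Map<String, Object> d = new HashMap<>();
        if (site != null) {
            d.put("site", site);
        }
        return d;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = new ArrayList<>();
        docs.add(doc("marfeel.com"));
        docs.add(doc("example.org"));
        docs.add(doc(null));
        docs.add(doc("marfeel.com"));

        System.out.format("Filtered collection size: %d%n",
                countBySite(docs, "marfeel.com"));
    }
}
```

The Spark version applies the same predicate, only evaluated once per partition task instead of over a single in-memory list.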

When I execute this code, the Mongo driver splits my collection into 2810
partitions, so an equal number of tasks is started to process them.

Around task number 1000, I get the following error message:

ERROR Executor: Exception in task 990.0 in stage 0.0 (TID 990)
java.lang.OutOfMemoryError: unable to create new native thread
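
This particular OutOfMemoryError is raised when the JVM asks the operating system for a new thread and the request is refused, so it generally points at a thread or process limit rather than at heap size. One way to check whether threads pile up as tasks run is to log the JVM's live thread count. The sketch below uses only the standard java.lang.management API; the class name ThreadCountProbe is hypothetical, and the idea that threads are accumulating here is an assumption, not something confirmed by the log above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical helper that reports how many threads this JVM is running.
public class ThreadCountProbe {

    private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();

    // Current number of live threads (daemon and non-daemon).
    public static int liveThreads() {
        return THREADS.getThreadCount();
    }

    // Highest live thread count seen since the JVM started (or since reset).
    public static int peakThreads() {
        return THREADS.getPeakThreadCount();
    }

    public static void main(String[] args) {
        System.out.format("Live threads: %d, peak: %d%n",
                liveThreads(), peakThreads());
    }
}
```

Printing these values from the driver at intervals while the job runs would show whether the count keeps growing with the number of completed tasks.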

I've searched a lot about this error, but I can't make sense of it. I've
come to the conclusion that either there is a problem with my code, some of
my library versions are incompatible, or I'm getting the whole Spark concept
wrong and the code above doesn't make any sense at all.

I'm using the following library versions:

org.apache.spark.spark-core_2.11 -> 1.2.0
org.apache.hadoop.hadoop-client -> 2.4.1
org.mongodb.mongo-hadoop.mongo-hadoop-core -> 1.3.1
org.mongodb.mongo-java-driver -> 2.13.0-rc1


Thanks a lot for your help!

-- 
<http://www.marfeel.com/>

Óliver Fernández
ES: +34 93 178 59 50  ext. 107
US: +1 917-341-2540 ext. 107
UK: +44 207 048 37 28 ext. 107

http://www.marfeel.com

