spark-user mailing list archives

From Deepesh Maheshwari <deepesh.maheshwar...@gmail.com>
Subject Slow Mongo Read from Spark
Date Mon, 31 Aug 2015 06:56:17 GMT
Hi, I am trying to read MongoDB from Spark via newAPIHadoopRDD.

/**** Code *****/

// Hadoop Configuration for the mongo-hadoop connector
Configuration config = new Configuration();
config.set("mongo.job.input.format", "com.mongodb.hadoop.MongoInputFormat");
config.set("mongo.input.uri", SparkProperties.MONGO_OUTPUT_URI);
config.set("mongo.input.query", "{host: 'abc.com'}");

// Note: the "local" master runs Spark with a single worker thread
JavaSparkContext sc = new JavaSparkContext("local", "MongoOps");

JavaPairRDD<Object, BSONObject> mongoRDD = sc.newAPIHadoopRDD(
        config,
        com.mongodb.hadoop.MongoInputFormat.class,
        Object.class,
        BSONObject.class);

long count = mongoRDD.count();

There are about 1.5 million records.
I am getting the data, but the read operation took around 15 minutes to read the whole collection.

Is this API really that slow, or am I missing something?
Please suggest an alternate approach if there is a faster way to read data from
MongoDB.
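[Editor's note: a hedged tuning sketch, not from the original mail. Two things stand out in the snippet above: the "local" master gives Spark only one worker thread, and the mongo-hadoop connector's split size controls how many input splits (and hence read tasks) are created. The `local[*]` master and the `mongo.input.split_size` property (split size in MB) are assumptions about the connector/Spark versions in use and should be checked against their documentation.]

```java
// Sketch only: use all local cores instead of a single thread
JavaSparkContext sc = new JavaSparkContext("local[*]", "MongoOps");

// Sketch only: smaller splits can raise read parallelism if the
// connector's splitter honors this option (value is in MB)
config.set("mongo.input.split_size", "8");
```

With more splits than executor threads, reads can proceed in parallel rather than as one long sequential scan.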

Thanks,
Deepesh
