spark-user mailing list archives

From Akhil Das <ak...@sigmoidanalytics.com>
Subject Re: Slow Mongo Read from Spark
Date Mon, 31 Aug 2015 07:13:25 GMT
Can you try with these key/value classes and see whether the performance improves?

inputFormatClassName = "com.mongodb.hadoop.MongoInputFormat"


keyClassName = "org.apache.hadoop.io.Text"
valueClassName = "org.apache.hadoop.io.MapWritable"


Taken from the Databricks blog:
<https://databricks.com/blog/2015/03/20/using-mongodb-with-spark.html>
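Putting those three class names together, a minimal PySpark sketch (assumptions: the mongo-hadoop connector jar is on the Spark classpath, and the connection URI below is a placeholder you would replace with your own):

```python
# Hedged sketch, not a tested implementation: requires the mongo-hadoop
# connector on the classpath and a reachable MongoDB instance.
from pyspark import SparkContext

sc = SparkContext("local", "MongoRead")

# Placeholder URI; point this at your own database and collection.
config = {"mongo.input.uri": "mongodb://localhost:27017/db.collection"}

mongo_rdd = sc.newAPIHadoopRDD(
    inputFormatClass="com.mongodb.hadoop.MongoInputFormat",
    keyClass="org.apache.hadoop.io.Text",
    valueClass="org.apache.hadoop.io.MapWritable",
    conf=config)

print(mongo_rdd.count())
```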

Thanks
Best Regards

On Mon, Aug 31, 2015 at 12:26 PM, Deepesh Maheshwari <
deepesh.maheshwari17@gmail.com> wrote:

> Hi, I am trying to read MongoDB in Spark using newAPIHadoopRDD.
>
> /**** Code *****/
>
> Configuration config = new Configuration();
> config.set("mongo.job.input.format", "com.mongodb.hadoop.MongoInputFormat");
> config.set("mongo.input.uri", SparkProperties.MONGO_OUTPUT_URI);
> config.set("mongo.input.query", "{host: 'abc.com'}");
>
> JavaSparkContext sc = new JavaSparkContext("local", "MongoOps");
>
> JavaPairRDD<Object, BSONObject> mongoRDD = sc.newAPIHadoopRDD(config,
>         com.mongodb.hadoop.MongoInputFormat.class, Object.class,
>         BSONObject.class);
>
> long count = mongoRDD.count();
>
> There are about 1.5 million records.
> I am getting the data, but the read operation took around 15 minutes to
> finish.
>
> Is this API really that slow, or am I missing something?
> Please suggest an alternate approach if there is a faster way to read data
> from MongoDB.
>
> Thanks,
> Deepesh
>
