spark-user mailing list archives

From Deepesh Maheshwari <deepesh.maheshwar...@gmail.com>
Subject Re: Slow Mongo Read from Spark
Date Mon, 31 Aug 2015 07:29:57 GMT
Hi, I am using <spark.version>1.3.0</spark.version>.

I am not getting a matching constructor for the above key and value classes.

[image: Inline image 1]

So, I tried reordering the values in the constructor.
[image: Inline image 2]

But it is giving this error. Please suggest.
[image: Inline image 3]
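
(For reference, a minimal sketch of why no matching overload may be found,
assuming mongo-hadoop declares MongoInputFormat as InputFormat<Object,
BSONObject>: Spark 1.3.0's JavaSparkContext.newAPIHadoopRDD constrains the
key/value classes to the input format's generic parameters, so only the
Object/BSONObject pairing type-checks in the Java API. The Text/MapWritable
names appear to come from the PySpark example in the Databricks post, where
class names are passed as strings and converted on the Python side.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.bson.BSONObject;

    public class MongoReadSketch {
        public static void main(String[] args) {
            // Spark 1.3.0 declares, roughly:
            //   <K, V, F extends InputFormat<K, V>> JavaPairRDD<K, V>
            //       newAPIHadoopRDD(Configuration conf, Class<F> fClass,
            //                       Class<K> kClass, Class<V> vClass)
            // so kClass/vClass must match the format's own type parameters.
            Configuration config = new Configuration();
            config.set("mongo.input.uri",
                    "mongodb://localhost:27017/db.collection"); // placeholder URI

            JavaSparkContext sc = new JavaSparkContext("local", "MongoOps");
            JavaPairRDD<Object, BSONObject> mongoRDD = sc.newAPIHadoopRDD(
                    config,
                    com.mongodb.hadoop.MongoInputFormat.class,
                    Object.class,      // matches MongoInputFormat's key type
                    BSONObject.class); // matches MongoInputFormat's value type
            System.out.println("count = " + mongoRDD.count());
            sc.stop();
        }
    }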

Best Regards

On Mon, Aug 31, 2015 at 12:43 PM, Akhil Das <akhil@sigmoidanalytics.com>
wrote:

> Can you try with these key/value classes and see the performance?
>
> inputFormatClassName = "com.mongodb.hadoop.MongoInputFormat"
>
>
> keyClassName = "org.apache.hadoop.io.Text"
> valueClassName = "org.apache.hadoop.io.MapWritable"
>
>
> Taken from the Databricks blog
> <https://databricks.com/blog/2015/03/20/using-mongodb-with-spark.html>
>
> Thanks
> Best Regards
>
> On Mon, Aug 31, 2015 at 12:26 PM, Deepesh Maheshwari <
> deepesh.maheshwari17@gmail.com> wrote:
>
>> Hi, I am trying to read MongoDB in Spark using newAPIHadoopRDD.
>>
>> /**** Code *****/
>>
>> Configuration config = new Configuration();
>> config.set("mongo.job.input.format",
>>         "com.mongodb.hadoop.MongoInputFormat");
>> config.set("mongo.input.uri", SparkProperties.MONGO_OUTPUT_URI);
>> config.set("mongo.input.query", "{host: 'abc.com'}");
>>
>> JavaSparkContext sc = new JavaSparkContext("local", "MongoOps");
>>
>> JavaPairRDD<Object, BSONObject> mongoRDD = sc.newAPIHadoopRDD(config,
>>         com.mongodb.hadoop.MongoInputFormat.class, Object.class,
>>         BSONObject.class);
>>
>> long count = mongoRDD.count();
>>
>> There are about 1.5 million records.
>> Though I am getting the data, the read operation took around 15 minutes
>> for the whole collection.
>>
>> Is this API really this slow, or am I missing something?
>> Please suggest if there is a faster way to read data from Mongo
>> (see the sketch after the quoted thread below).
>>
>> Thanks,
>> Deepesh
>>
>
>
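
On the 15-minute read in the quoted question, two settings are worth
checking, sketched below. A master of "local" runs Spark with a single
worker thread, so even if mongo-hadoop produces many input splits they are
read one at a time; "local[*]" uses one thread per core. The split-size key
is an assumption from memory (mongo-hadoop's MongoConfigUtil, where smaller
splits yield more partitions and therefore more concurrent read tasks);
please verify it against your connector version.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.bson.BSONObject;

    public class ParallelMongoReadSketch {
        public static void main(String[] args) {
            // "local[*]" = one worker thread per core instead of one in total.
            JavaSparkContext sc = new JavaSparkContext("local[*]", "MongoOps");

            Configuration config = new Configuration();
            config.set("mongo.input.uri",
                    "mongodb://localhost:27017/db.collection"); // placeholder URI
            // Assumption: mongo-hadoop's split-size knob (MB per split);
            // smaller splits -> more partitions -> more parallel read tasks.
            config.set("mongo.input.split_size", "8");

            JavaPairRDD<Object, BSONObject> mongoRDD = sc.newAPIHadoopRDD(
                    config, com.mongodb.hadoop.MongoInputFormat.class,
                    Object.class, BSONObject.class);
            System.out.println("count = " + mongoRDD.count());
            sc.stop();
        }
    }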
