spark-user mailing list archives

From Deepesh Maheshwari <deepesh.maheshwar...@gmail.com>
Subject Re: Slow Mongo Read from Spark
Date Mon, 31 Aug 2015 09:43:06 GMT
Tried it; it gives the same exception as above:

Exception in thread "main" java.io.IOException: No FileSystem for scheme:
mongodb

In your case, did you use the above code?
What read throughput did you get?

On Mon, Aug 31, 2015 at 2:04 PM, Akhil Das <akhil@sigmoidanalytics.com>
wrote:

> FYI, newAPIHadoopFile and newAPIHadoopRDD both use the NewHadoopRDD class
> underneath, and that doesn't mean it will only read from HDFS. Give it a
> shot if you haven't tried it already (it's just the input format and the
> reader that differ from your approach).
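>
> Roughly, the two entry points look like this (a minimal sketch, not our
> exact code: mongoConfig and bsonConfig stand for Hadoop Configurations set
> up as in the earlier mails, and the hdfs:// path is just a placeholder;
> both input formats come from com.mongodb.hadoop):
>
>         // live collection: the connection comes from mongo.input.uri
>         // set in the Configuration, not from a file path
>         JavaPairRDD<Object, BSONObject> live = sc.newAPIHadoopRDD(
>                 mongoConfig, MongoInputFormat.class,
>                 Object.class, BSONObject.class);
>
>         // static dump: the path must use a scheme Hadoop's FileSystem can
>         // resolve (hdfs://, file://), not mongodb://
>         JavaPairRDD<Object, BSONObject> dump = sc.newAPIHadoopFile(
>                 "hdfs://namenode:9000/dumps/ratings.bson",
>                 BSONFileInputFormat.class, Object.class,
>                 BSONObject.class, bsonConfig);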
>
> Thanks
> Best Regards
>
> On Mon, Aug 31, 2015 at 1:14 PM, Deepesh Maheshwari <
> deepesh.maheshwari17@gmail.com> wrote:
>
>> Hi Akhil,
>>
>> This code snippet is from the link below:
>>
>> https://github.com/crcsmnky/mongodb-spark-demo/blob/master/src/main/java/com/mongodb/spark/demo/Recommender.java
>>
>> Here it is reading data from the HDFS file system, but in our case I need
>> to read from MongoDB.
>>
>> I have tried it earlier, and tried it again now, but it gives the below
>> error, which is self-explanatory:
>>
>> Exception in thread "main" java.io.IOException: No FileSystem for scheme:
>> mongodb
>>
>> On Mon, Aug 31, 2015 at 1:03 PM, Akhil Das <akhil@sigmoidanalytics.com>
>> wrote:
>>
>>> Here's a piece of code which works well for us (Spark 1.4.1):
>>>
>>>         Configuration bsonDataConfig = new Configuration();
>>>         bsonDataConfig.set("mongo.job.input.format",
>>>                 "com.mongodb.hadoop.BSONFileInputFormat");
>>>
>>>         Configuration predictionsConfig = new Configuration();
>>>         predictionsConfig.set("mongo.output.uri", mongodbUri);
>>>
>>>         JavaPairRDD<Object, BSONObject> bsonRatingsData = sc.newAPIHadoopFile(
>>>                 ratingsUri, BSONFileInputFormat.class, Object.class,
>>>                 BSONObject.class, bsonDataConfig);
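>>>
>>> One assumption worth spelling out: ratingsUri here points at a .bson
>>> dump (the kind produced by mongodump) on a filesystem Hadoop can
>>> resolve, not at a live mongodb:// URI. A hypothetical example of such a
>>> path:
>>>
>>>         String ratingsUri = "hdfs://namenode:9000/data/ratings.bson";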
>>>
>>>
>>> Thanks
>>> Best Regards
>>>
>>> On Mon, Aug 31, 2015 at 12:59 PM, Deepesh Maheshwari <
>>> deepesh.maheshwari17@gmail.com> wrote:
>>>
>>>> Hi, I am using <spark.version>1.3.0</spark.version>
>>>>
>>>> I am not getting a constructor for the above values.
>>>>
>>>> [image: Inline image 1]
>>>>
>>>> So, I tried to shuffle the values in the constructor.
>>>> [image: Inline image 2]
>>>>
>>>> But it is giving this error. Please suggest.
>>>> [image: Inline image 3]
>>>>
>>>> Best Regards
>>>>
>>>> On Mon, Aug 31, 2015 at 12:43 PM, Akhil Das <akhil@sigmoidanalytics.com>
>>>> wrote:
>>>>
>>>>> Can you try with these key/value classes and see how it performs?
>>>>>
>>>>> inputFormatClassName = "com.mongodb.hadoop.MongoInputFormat"
>>>>>
>>>>>
>>>>> keyClassName = "org.apache.hadoop.io.Text"
>>>>> valueClassName = "org.apache.hadoop.io.MapWritable"
>>>>>
>>>>>
>>>>> Taken from the Databricks blog:
>>>>> <https://databricks.com/blog/2015/03/20/using-mongodb-with-spark.html>
>>>>>
>>>>> Thanks
>>>>> Best Regards
>>>>>
>>>>> On Mon, Aug 31, 2015 at 12:26 PM, Deepesh Maheshwari <
>>>>> deepesh.maheshwari17@gmail.com> wrote:
>>>>>
>>>>>> Hi, I am trying to read MongoDB in Spark via newAPIHadoopRDD.
>>>>>>
>>>>>> /**** Code *****/
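>>>>>> // 'config' is assumed to be an org.apache.hadoop.conf.Configuration
>>>>>> // created earlier (its construction is elided in this snippet)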
>>>>>>
>>>>>> config.set("mongo.job.input.format",
>>>>>> "com.mongodb.hadoop.MongoInputFormat");
>>>>>> config.set("mongo.input.uri",SparkProperties.MONGO_OUTPUT_URI);
>>>>>> config.set("mongo.input.query","{host: 'abc.com'}");
>>>>>>
>>>>>> JavaSparkContext sc=new JavaSparkContext("local", "MongoOps");
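>>>>>> // note: master "local" runs Spark with a single worker thread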
>>>>>>
>>>>>>         JavaPairRDD<Object, BSONObject> mongoRDD = sc.newAPIHadoopRDD(
>>>>>>                 config, com.mongodb.hadoop.MongoInputFormat.class,
>>>>>>                 Object.class, BSONObject.class);
>>>>>>
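>>>>>>         // count() materializes the RDD, i.e. reads the whole collection once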
>>>>>>         long count=mongoRDD.count();
>>>>>>
>>>>>> There are about 1.5 million records.
>>>>>> Though I am getting the data, the read operation took around 15 minutes
>>>>>> to read the whole collection.
>>>>>>
>>>>>> Is this API really this slow, or am I missing something?
>>>>>> Please suggest if there is an alternate approach to read data from
>>>>>> Mongo faster.
>>>>>>
>>>>>> Thanks,
>>>>>> Deepesh
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
