spark-issues mailing list archives

From "Pavan Kumar Varma (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-1443) Unable to Access MongoDB GridFS data with Spark using mongo-hadoop API
Date Tue, 08 Apr 2014 13:07:15 GMT

     [ https://issues.apache.org/jira/browse/SPARK-1443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pavan Kumar Varma updated SPARK-1443:
-------------------------------------

    Description: 
I saved a 2 GB PDF file into MongoDB using GridFS. Now I want to process the data in that
GridFS collection using the Java Spark MapReduce API. Previously I successfully processed
MongoDB collections with Apache Spark using the Mongo-Hadoop connector. Now I am unable to
access GridFS collections with the following code.

MongoConfigUtil.setInputURI(config, "mongodb://localhost:27017/pdfbooks.fs.chunks");
MongoConfigUtil.setOutputURI(config, "mongodb://localhost:27017/" + output);
JavaPairRDD<Object, BSONObject> mongoRDD = sc.newAPIHadoopRDD(config,
        com.mongodb.hadoop.MongoInputFormat.class, Object.class,
        BSONObject.class);
JavaRDD<String> words = mongoRDD.flatMap(
        new FlatMapFunction<Tuple2<Object, BSONObject>, String>() {
    @Override
    public Iterable<String> call(Tuple2<Object, BSONObject> arg) {
        System.out.println(arg._2.toString());
        ...
Please suggest or provide better API methods to access MongoDB GridFS data.
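
For context on why reading fs.chunks directly is awkward: GridFS splits each file across
many documents in fs.chunks, and each chunk document carries a files_id, a sequence number n,
and a binary data field. A record read from that collection is therefore one fragment of the
PDF, not the whole file. The sketch below only illustrates the reassembly step (select the
chunks of one file, order them by n, concatenate the bytes). The Chunk class and the assemble
method are hypothetical stand-ins for illustration and are not part of mongo-hadoop, Spark, or
the MongoDB driver; in a Spark job the same logic would presumably run after grouping the
records by files_id.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class GridFsChunkAssembler {

    /** Hypothetical stand-in for one fs.chunks document (fields files_id, n, data). */
    public static final class Chunk {
        public final String filesId;
        public final int n;
        public final byte[] data;

        public Chunk(String filesId, int n, byte[] data) {
            this.filesId = filesId;
            this.n = n;
            this.data = data;
        }
    }

    /** Rebuilds the bytes of one file from an unordered list of chunks. */
    public static byte[] assemble(String filesId, List<Chunk> chunks) {
        // Keep only the chunks that belong to the requested file.
        List<Chunk> mine = new ArrayList<>();
        for (Chunk c : chunks) {
            if (c.filesId.equals(filesId)) {
                mine.add(c);
            }
        }
        // Chunks may arrive in any order; n gives their position in the file.
        mine.sort(Comparator.comparingInt((Chunk c) -> c.n));

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int expected = 0;
        for (Chunk c : mine) {
            // Fail loudly on a gap or duplicate instead of returning a corrupt file.
            if (c.n != expected) {
                throw new IllegalStateException(
                        "missing chunk " + expected + " for file " + filesId);
            }
            out.write(c.data, 0, c.data.length);
            expected++;
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        List<Chunk> chunks = new ArrayList<>();
        chunks.add(new Chunk("book", 1, "world".getBytes(StandardCharsets.UTF_8)));
        chunks.add(new Chunk("other", 0, "ignored".getBytes(StandardCharsets.UTF_8)));
        chunks.add(new Chunk("book", 0, "hello ".getBytes(StandardCharsets.UTF_8)));

        byte[] file = assemble("book", chunks);
        System.out.println(new String(file, StandardCharsets.UTF_8)); // prints "hello world"
    }
}
```

The assembled bytes are still a binary PDF, so printing a chunk with toString() would not yield
readable text; a PDF text extractor would be needed after reassembly.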



  was:
I saved a 2 GB PDF file into MongoDB using GridFS. Now I want to process the data in that
GridFS collection using the Java Spark MapReduce API. Previously I successfully processed
MongoDB collections with Apache Spark using the Mongo-Hadoop connector. Now I am unable to
access GridFS collections with the following code.

MongoConfigUtil.setInputURI(config, "mongodb://localhost:27017/pdfbooks.fs.chunks");
MongoConfigUtil.setOutputURI(config, "mongodb://localhost:27017/" + output);
JavaPairRDD<Object, BSONObject> mongoRDD = sc.newAPIHadoopRDD(config,
        com.mongodb.hadoop.MongoInputFormat.class, Object.class,
        BSONObject.class);
JavaRDD<String> words = mongoRDD.flatMap(
        new FlatMapFunction<Tuple2<Object, BSONObject>, String>() {
    @Override
    public Iterable<String> call(Tuple2<Object, BSONObject> arg) {
        System.out.println(arg._2.toString());
        ...
Please provide a better API to access MongoDB GridFS data.




> Unable to Access MongoDB GridFS data with Spark using mongo-hadoop API
> ----------------------------------------------------------------------
>
>                 Key: SPARK-1443
>                 URL: https://issues.apache.org/jira/browse/SPARK-1443
>             Project: Spark
>          Issue Type: Improvement
>          Components: Input/Output, Java API, Spark Core
>    Affects Versions: 0.9.0
>         Environment: Java 1.7, Hadoop 2.2.0, Spark 0.9.0, Ubuntu 12.4
>            Reporter: Pavan Kumar Varma
>            Priority: Critical
>             Fix For: 0.9.0
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> I saved a 2 GB PDF file into MongoDB using GridFS. Now I want to process the data in that
> GridFS collection using the Java Spark MapReduce API. Previously I successfully processed
> MongoDB collections with Apache Spark using the Mongo-Hadoop connector. Now I am unable to
> access GridFS collections with the following code.
>
> MongoConfigUtil.setInputURI(config, "mongodb://localhost:27017/pdfbooks.fs.chunks");
> MongoConfigUtil.setOutputURI(config, "mongodb://localhost:27017/" + output);
> JavaPairRDD<Object, BSONObject> mongoRDD = sc.newAPIHadoopRDD(config,
>         com.mongodb.hadoop.MongoInputFormat.class, Object.class,
>         BSONObject.class);
> JavaRDD<String> words = mongoRDD.flatMap(
>         new FlatMapFunction<Tuple2<Object, BSONObject>, String>() {
>     @Override
>     public Iterable<String> call(Tuple2<Object, BSONObject> arg) {
>         System.out.println(arg._2.toString());
>         ...
> Please suggest or provide better API methods to access MongoDB GridFS data.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
