spark-user mailing list archives

From unk1102 <umesh.ka...@gmail.com>
Subject How to call mapPartitions on DataFrame?
Date Wed, 23 Dec 2015 17:43:56 GMT
Hi, I have the following code, where I call mapPartitions on an RDD and then
need to convert the result back into a DataFrame. Why do I have to convert the
DataFrame into an RDD and back into a DataFrame just to call mapPartitions?
Why can't I call it directly on the DataFrame?

sourceFrame.toJavaRDD().mapPartitions(
    new FlatMapFunction<Iterator<Row>, Row>() {

   @Override
   public Iterable<Row> call(Iterator<Row> rowIterator) throws Exception {
        // Collect the updated rows for this partition.
        List<Row> updatedRows = new ArrayList<>();
        while (rowIterator.hasNext()) {
          Row row = rowIterator.next();
          // iterate(...) is my own helper; it returns the transformed column values.
          List<?> rowAsList = iterate(JavaConversions.seqAsJavaList(row.toSeq()));
          Row updatedRow = RowFactory.create(rowAsList.toArray());
          updatedRows.add(updatedRow);
        }
        return updatedRows;
   }
});


When I look at the method signature, it is:
mapPartitions(scala.Function1<Iterator<Row>,Iterator<R>> f, ClassTag<R>
evidence$5)
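
To make the shape of that signature concrete: it takes a function from an
iterator over one partition's rows to an iterator of results. Below is a
minimal plain-Java sketch of that contract, with no Spark involved. The class
PartitionMapSketch and the methods mapPartitions and upperCaseAll are
hypothetical names made up for illustration, and partitions are modeled as
plain lists of strings rather than Rows, so this is not the DataFrame API
itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

public class PartitionMapSketch {

    // Hypothetical stand-in for mapPartitions (not Spark code): calls f once
    // per partition, handing it an iterator over that partition's elements.
    public static <T, R> List<List<R>> mapPartitions(
            List<List<T>> partitions,
            Function<Iterator<T>, Iterator<R>> f) {
        List<List<R>> result = new ArrayList<>();
        for (List<T> partition : partitions) {
            Iterator<R> out = f.apply(partition.iterator());
            List<R> mapped = new ArrayList<>();
            while (out.hasNext()) {
                mapped.add(out.next());
            }
            result.add(mapped);
        }
        return result;
    }

    // Hypothetical per-partition function: upper-cases every value it is given.
    public static Iterator<String> upperCaseAll(Iterator<String> rows) {
        List<String> updated = new ArrayList<>();
        while (rows.hasNext()) {
            updated.add(rows.next().toUpperCase());
        }
        return updated.iterator();
    }

    public static void main(String[] args) {
        // Two "partitions" of made-up sample values.
        List<List<String>> partitions = Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("c"));
        List<List<String>> result =
                mapPartitions(partitions, PartitionMapSketch::upperCaseAll);
        System.out.println(result); // prints [[A, B], [C]]
    }
}
```

The point of the sketch is only that the function sees a whole partition
through one iterator and returns an iterator, instead of being called once
per row.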

How do I map the above code onto dataframe.mapPartitions? Please guide me; I
am new to Spark.



