spark-user mailing list archives

From Erik Erlandson <...@redhat.com>
Subject Re: removing first record from RDD[String]
Date Tue, 23 Dec 2014 17:29:08 GMT

There is also a lazy implementation:
http://erikerlandson.github.io/blog/2014/07/29/deferring-spark-actions-to-lazy-transforms-with-the-promise-rdd/

I opened a PR for it. There was also an alternative proposal to publish it as a library
on the new Spark Packages site:
http://databricks.com/blog/2014/12/22/announcing-spark-packages.html



----- Original Message -----
> Hi,
> maybe the drop function is helpful for you (this is probably more than
> you need, but it is still an interesting read)
> http://erikerlandson.github.io/blog/2014/07/27/some-implications-of-supporting-the-scala-drop-method-for-spark-rdds/
> 
> Joerg
> 
> On Tue, Dec 23, 2014 at 5:45 PM, Hao Ren <invkrh@gmail.com> wrote:
> 
> > Hi,
> >
> > I guess you would like to remove the header of a CSV file.
> >
> > You can play with partitions. =)
> >
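> > // src is your RDD; the header is assumed to be the first record of partition 0
> > val noHeader = src.mapPartitionsWithIndex { (i, iterator) =>
> >   if (i == 0 && iterator.hasNext) {
> >     iterator.next() // consume and discard the header line
> >     iterator        // the rest of partition 0
> >   } else iterator   // every other partition passes through unchanged
> > }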
> >
> > Thus, you don't need to filter on the whole RDD. Good luck.
> >
> > Hao
> >
> >
> >
> > --
> > View this message in context:
> > http://apache-spark-user-list.1001560.n3.nabble.com/removing-first-record-from-RDD-String-tp20834p20836.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> > For additional commands, e-mail: user-help@spark.apache.org
> >
> >
> 
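
For readers who want to try the partition trick from the quoted message without a cluster, here is a minimal sketch in plain Scala. It is not Spark code: the nested Seq is a hypothetical stand-in for an RDD's partitions, and the names DropHeaderSketch and dropHeader are made up for illustration. It assumes the header is the first record of partition 0, which would not hold for every input (for example, several files that each carry their own header).

```scala
object DropHeaderSketch {
  // Same shape as the function handed to mapPartitionsWithIndex in the
  // quoted message: (partition index, records) => records.
  // Iterator.drop(1) is a no-op on an empty iterator, so no hasNext check is needed.
  def dropHeader(index: Int, records: Iterator[String]): Iterator[String] =
    if (index == 0) records.drop(1) else records

  def main(args: Array[String]): Unit = {
    // Hypothetical stand-in for an RDD[String]: each inner Seq is one partition.
    val partitions = Seq(
      Seq("id,name", "1,alice"),
      Seq("2,bob", "3,carol")
    )

    // Apply dropHeader to each partition together with its index,
    // then flatten the partitions back into one sequence of records.
    val noHeader = partitions.zipWithIndex.flatMap { case (part, i) =>
      dropHeader(i, part.iterator).toList
    }

    noHeader.foreach(println)
  }
}
```

On a real RDD[String] named src, the same function should be usable as src.mapPartitionsWithIndex(DropHeaderSketch.dropHeader). Only the first partition does any extra work, which is the point of avoiding a filter over the whole RDD.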


