spark-user mailing list archives

From Horia <>
Subject Re: RDD function question
Date Mon, 16 Sep 2013 21:27:04 GMT
Without sorting, you can implement this using the 'filter' transformation.

Spark will still read every row once, but only the records that pass the
filter are carried into later transformations and any subsequent shuffle.
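
A rough sketch of the filter approach, in Python for brevity. The names
START, END, in_date_range and filter_by_date are made up for illustration,
and it assumes every record really does begin with a well-formed YYYY-MM-DD
date, as described in the question:

```python
# Hypothetical bounds taken from the date range in the question.
START = "2005-01-01"
END = "2007-12-31"


def in_date_range(line, start=START, end=END):
    """Return True if the line's leading YYYY-MM-DD date lies in [start, end]."""
    # Lines too short to hold a full date are rejected outright.
    if len(line) < 10:
        return False
    # ISO-style dates sort the same way as plain strings, so a string
    # comparison is enough here; no date parsing is needed.
    return start <= line[:10] <= end


def filter_by_date(rdd):
    """Keep only in-range records; later transformations see just these."""
    return rdd.filter(in_date_range)


# With an existing SparkContext `sc` (the path is a placeholder):
#   records = sc.textFile("records.txt")
#   in_range = filter_by_date(records)
#   result = in_range.map(lambda line: line.upper())  # stand-in for the real map
```

The map only ever runs on the filtered records, though the file itself is
still scanned in full.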

Does this help, or did I misunderstand?
On Sep 16, 2013 1:37 PM, "satheessh chinnu" <> wrote:

> I have a text file.  Each line is a record, and the first ten characters
> of each line are a date in YYYY-MM-DD format.
> I would like to run a map function on this RDD for a specific date range
> (i.e. from 2005-01-01 to 2007-12-31).  I would like to avoid reading the
> records outside the specified date range (i.e. some kind of primary index
> sorted by date).
> Is there a way to implement this?
