spark-user mailing list archives

From Gerard Maas <gerard.m...@gmail.com>
Subject Re: Skipping Bad Records in Spark
Date Fri, 14 Nov 2014 15:47:50 GMT
You can combine map and filter in one operation using
collect(PartialFunction) [1]:

val cleanData = rawData.collect { case x if condition(x) => f(x) }

[1] **Not to be confused with the parameterless rdd.collect(), which triggers
computation and returns the results to the driver!**
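
As a minimal sketch of the pattern, here it is on a plain Scala Seq, whose
collect method also takes a PartialFunction; on an RDD[String] the call
rawData.collect { case line if isValid(line) => parse(line) } has the same
shape and lazily returns a new RDD. Reading, isValid and parse are
hypothetical names standing in for condition(x) and f(x) above.

```scala
import scala.util.Try

object CollectSketch {
  // Hypothetical record type, used only for this illustration.
  final case class Reading(id: String, value: Int)

  // Hypothetical stand-in for condition(x): a line is valid when it has
  // exactly two comma-separated fields and the second parses as an Int.
  def isValid(line: String): Boolean = {
    val parts = line.split(",")
    parts.length == 2 && Try(parts(1).trim.toInt).isSuccess
  }

  // Hypothetical stand-in for f(x): only ever called on lines that passed isValid.
  def parse(line: String): Reading = {
    val parts = line.split(",")
    Reading(parts(0).trim, parts(1).trim.toInt)
  }

  // The guard does the filtering and the right-hand side does the mapping;
  // inputs the partial function is not defined for are dropped.
  def clean(raw: Seq[String]): Seq[Reading] =
    raw.collect { case line if isValid(line) => parse(line) }

  def main(args: Array[String]): Unit =
    println(clean(Seq("a,1", "b,oops", "c", "d,42")))
}
```

Records that fail the guard are skipped silently, so if you need to know how
many were dropped you would have to count them separately (for example with an
accumulator on an RDD).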

PS: please use user@spark.apache.org for this kind of API usage discussion.
The dev list is mainly for discussing Spark internals.

On Fri, Nov 14, 2014 at 4:38 PM, Ganelin, Ilya <Ilya.Ganelin@capitalone.com>
wrote:

> Hi Qiuzhuang, you have two options:
> 1) Within the map step define a validation function that will be executed
> on every record.
> 2) Use the filter function to create a filtered dataset prior to
> processing.
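
A minimal sketch of the two options, again on a plain Scala Seq (RDD offers
flatMap, filter and map with the same shape). It assumes a "bad record" is one
whose text does not parse as an Int; parseOrSkip and looksValid are
hypothetical helper names.

```scala
import scala.util.Try

object SkipBadRecordsSketch {
  // Option 1 (hypothetical helper): validate inside the transformation step.
  // A record that fails to parse becomes None and flatMap drops it.
  def parseOrSkip(line: String): Option[Int] =
    Try(line.trim.toInt).toOption

  def viaFlatMap(raw: Seq[String]): Seq[Int] =
    raw.flatMap(line => parseOrSkip(line))

  // Option 2 (hypothetical helper): filter out invalid records first,
  // then process only the records that passed.
  def looksValid(line: String): Boolean =
    Try(line.trim.toInt).isSuccess

  def viaFilter(raw: Seq[String]): Seq[Int] =
    raw.filter(looksValid).map(_.trim.toInt)

  def main(args: Array[String]): Unit = {
    val raw = Seq("1", "two", " 3 ", "")
    println(viaFlatMap(raw))
    println(viaFilter(raw))
  }
}
```

Option 1 parses each record once; option 2 parses valid records twice but
keeps the validation separate from the processing, which can be clearer when
the check is cheap.
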
>
> On 11/14/14, 10:28 AM, "Qiuzhuang Lian" <qiuzhuang.lian@gmail.com> wrote:
>
> >Hi,
> >
> >MapReduce has a feature for skipping bad records. Is there any equivalent
> >in Spark? Should I use the filter API to do this?
> >
> >Thanks,
> >Qiuzhuang
>
