spark-user mailing list archives

From "Anselmi Rodriguez, Agustina, Vodafone UK" <agustina.anse...@vodafone.com>
Subject Re: Spark read csv option - capture exception in a column in permissive mode
Date Mon, 17 Jun 2019 07:46:25 GMT
You can sort of hack this by reading the file as an RDD[String] and implementing a custom
parser that records the failure reason, e.g.:

val rddRows = rdd.map(parseMyCols)

// parse(...) is assumed to return a Try[Seq[Any]]; nullList is a row of nulls matching the schema.
def parseMyCols(rawVal: String): Row = {
  parse(rawVal) match {
    case Success(parsedRowValues) => Row(parsedRowValues :+ "": _*)
    case Failure(exception)       => Row(nullList :+ exception.getMessage: _*)
  }
}
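
For completeness, a self-contained sketch of the same approach (the schema fields, the simple comma-split parse, and the input path are illustrative placeholders, and it assumes an active SparkSession named spark):

import scala.util.{Failure, Success, Try}
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Target schema: the original columns plus a column holding the parse error, if any.
val schema = StructType(Seq(
  StructField("id", StringType, nullable = true),
  StructField("name", StringType, nullable = true),
  StructField("errorMessage", StringType, nullable = true)
))

// A deliberately strict parser; substitute whatever per-row validation you need.
def parse(rawVal: String): Try[Seq[String]] = Try {
  val fields = rawVal.split(",", -1).toSeq
  require(fields.length == 2, s"expected 2 fields, got ${fields.length}")
  fields
}

def parseMyCols(rawVal: String): Row = parse(rawVal) match {
  case Success(values)    => Row((values :+ null): _*)
  case Failure(exception) => Row(null, null, exception.getMessage)
}

val rawLines = spark.sparkContext.textFile("/path/to/input.csv")
val withErrors = spark.createDataFrame(rawLines.map(parseMyCols), schema)

The error column stays null for rows that parse cleanly and carries the exception message otherwise.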

Hope this helps

On 17 Jun 2019, at 06:31, Ajay Thompson <ajay.thompson@thedatateam.in> wrote:

There's a column that captures the corrupted record. However, the exception isn't captured.
It would be very useful if the exception were captured in another column as well.

On Mon, 17 Jun 2019, 10:56 AM Gourav Sengupta, <gourav.sengupta@gmail.com> wrote:
Hi,

It already does, I think; you just have to add the corrupt-record column to the schema that
you are using to read.
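
For example (the data columns and the path below are placeholders; the option names are standard Spark CSV reader options):

import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// The corrupt-record column must be declared in the schema (as StringType) for Spark to populate it.
val schema = StructType(Seq(
  StructField("id", IntegerType, nullable = true),
  StructField("name", StringType, nullable = true),
  StructField("_corrupt_record", StringType, nullable = true)
))

val df = spark.read
  .option("mode", "PERMISSIVE")
  .option("columnNameOfCorruptRecord", "_corrupt_record")
  .option("header", "true")
  .schema(schema)
  .csv("/path/to/input.csv")

Rows that fail to parse keep the raw line in _corrupt_record and the remaining columns come back null; note that this captures the offending record itself, not the exception message.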

Regards,
Gourav

On Sun, Jun 16, 2019 at 2:48 PM <ajay.thompson@thedatateam.in> wrote:
Hi Team,

Can we have another column that gives the reason the record was corrupted when reading CSV
in permissive mode?

Thanks,
Ajay
