spark-user mailing list archives

From Suresh Thalamati <suresh.thalam...@gmail.com>
Subject Re: Continue reading dataframe from file despite errors
Date Tue, 12 Sep 2017 21:59:29 GMT
Try the CSV reader option("mode", "DROPMALFORMED"); that might skip the error records.
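
For reference, here is a minimal sketch of how that option could be added to the read statement from the original question. The schema below is only a hypothetical stand-in for SomeSchema (the real columns were not posted), and the object name, app name, and local master setting are assumptions for illustration; "somepath" is kept as written in the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object DropMalformedSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session, just for trying this out.
    val spark = SparkSession.builder()
      .appName("drop-malformed-sketch")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical stand-in for SomeSchema; the real schema was not posted.
    val someSchema = StructType(Seq(
      StructField("id", IntegerType, nullable = true),
      StructField("state", StringType, nullable = true)
    ))

    val df = spark.read
      .schema(someSchema)
      .option("sep", "\t")
      // Ask the CSV reader to drop rows it cannot parse against the schema
      // instead of failing the job.
      .option("mode", "DROPMALFORMED")
      .format("csv")
      .load("somepath")

    df.show()
    spark.stop()
  }
}
```

The other documented values for "mode" are PERMISSIVE (the default) and FAILFAST. Dropped rows are not reported in the resulting dataframe, so if every row is malformed the result may simply be empty.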


> On Sep 12, 2017, at 2:33 PM, jeff saremi <jeffsaremi@hotmail.com> wrote:
> 
> should have added some of the exception to be clear:
> 
> 17/09/12 14:14:17 ERROR TaskSetManager: Task 0 in stage 15.0 failed 1 times; aborting job
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 15.0 failed 1 times, most recent failure: Lost task 0.0 in stage 15.0 (TID 15, localhost, executor driver): java.lang.NumberFormatException: For input string: "south carolina"
>         at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>         at java.lang.Integer.parseInt(Integer.java:580)
>         at java.lang.Integer.parseInt(Integer.java:615)
>         at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:272)
>         at scala.collection.immutable.StringOps.toInt(StringOps.scala:29)
>         at org.apache.spark.sql.execution.datasources.csv.CSVTypeCast$.castTo(CSVInferSchema.scala:250)
> 
> From: jeff saremi <jeffsaremi@hotmail.com>
> Sent: Tuesday, September 12, 2017 2:32:03 PM
> To: user@spark.apache.org
> Subject: Continue reading dataframe from file despite errors
>  
> I'm using a statement like the following to load my dataframe from a text file.
> Upon encountering the first error, the whole thing throws an exception and processing stops.
> I'd like to continue loading even if that results in zero rows in my dataframe. How can I do that?
> Thanks
> 
> spark.read.schema(SomeSchema).option("sep", "\t").format("csv").load("somepath")

