spark-issues mailing list archives

From "nivedita singh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-26770) Misleading/unhelpful error message when wrapping a null in an Option
Date Thu, 28 Feb 2019 07:26:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-26770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16780193#comment-16780193 ]

nivedita singh commented on SPARK-26770:
----------------------------------------

I can see this NPE while creating productExtract (screenshot attached below).

Please provide some more information to help reproduce the issue.

!image-2019-02-28-12-54-46-750.png!
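
For reference, the snippet in the quoted description below should reproduce this on its own once wrapped in a session. A minimal, self-contained sketch (the object name, app name, and local master are illustrative assumptions, not from the report):

{code}
import org.apache.spark.sql.{Dataset, SparkSession}

object Spark26770Repro {
  case class Product(productID: Option[Int], productName: Option[String])

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("SPARK-26770 repro")
      .getOrCreate()
    import spark.implicits._

    // Some(null) type-checks because Null <: String, but the generated
    // serializer NPEs in UnsafeRowWriter.write once the row is built.
    val ds: Dataset[Product] = spark.createDataset(Seq(
      Product(productID = Some(6050286), productName = Some(null))
    ))
    ds.count() // the action, not createDataset, triggers the NPE
    spark.stop()
  }
}
{code}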

> Misleading/unhelpful error message when wrapping a null in an Option
> --------------------------------------------------------------------
>
>                 Key: SPARK-26770
>                 URL: https://issues.apache.org/jira/browse/SPARK-26770
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.2
>            Reporter: sam
>            Priority: Major
>
> This
> {code}
> import org.apache.spark.sql.Dataset
> import spark.implicits._  // needed for the Product encoder
>
> // Using Options to indicate nullable fields
> case class Product(productID: Option[Int],
>                    productName: Option[String])
>
> val productExtract: Dataset[Product] =
>   spark.createDataset(Seq(
>     Product(
>       productID = Some(6050286),
>       // user mistake here, should be `None` not `Some(null)`
>       productName = Some(null)
>     )))
>
> productExtract.count()
> {code}
> will give an error like the one below. The error is thrown from quite deep down, but there
> should be some handling logic further up to check for nulls and give a more informative
> error message: e.g. it could tell the user which field is null, or it could detect the
> `Some(null)` mistake and suggest using `None` instead (a workaround sketch follows the
> quoted description below).
> Whatever the exception is, it shouldn't be an NPE: this is clearly a user error, so it
> should raise some kind of user-error exception.
> {code}
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 9 in stage 1.0 failed 4 times, most recent failure: Lost task 9.3 in stage 1.0 (TID 276, 10.139.64.8, executor 1): java.lang.NullPointerException
> 	at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter.write(UnsafeRowWriter.java:194)
> 	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.serializefromobject_doConsume_0$(Unknown Source)
> 	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.mapelements_doConsume_0$(Unknown Source)
> 	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
> 	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> 	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:620)
> 	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> 	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
> 	at org.apache.spark.scheduler.Task.run(Task.scala:112)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:384)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> {code}
> I've seen quite a few other people with this error, but I don't think it's for the same reason:
> https://docs.databricks.com/spark/latest/data-sources/tips/redshift-npe.html
> https://groups.google.com/a/lists.datastax.com/forum/#!topic/spark-connector-user/Dt6ilC9Dn54
> https://issues.apache.org/jira/browse/SPARK-17195
> https://issues.apache.org/jira/browse/SPARK-18859
> https://github.com/datastax/spark-cassandra-connector/issues/1062
> https://stackoverflow.com/questions/39875711/spark-sql-2-0-nullpointerexception-with-a-valid-postgresql-query
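
As a user-side workaround for the quoted issue: construct nullable fields with `Option(...)` rather than `Some(...)`, since `Option.apply` maps a null payload to `None`. A minimal sketch, reusing the `Product` case class and `spark` session from the snippets above; the `someNullFields` helper is purely illustrative, not a Spark API:

{code}
// Option(x) yields None when x is null; Some(x) wraps the null and the
// generated serializer later NPEs in UnsafeRowWriter.write.
val safe: Option[String]   = Option(null)  // None
val unsafe: Option[String] = Some(null)    // Some(null) -- fails at runtime

val fixed = spark.createDataset(Seq(
  Product(productID = Option(6050286), productName = Option(null))
))
fixed.count()  // succeeds; productName is simply a null column value

// Illustrative pre-flight check of the kind the description asks for:
// report which fields hold Some(null) before the data reaches the encoder.
def someNullFields(p: Product): Seq[String] = Seq(
  "productID"   -> p.productID,
  "productName" -> p.productName
).collect { case (name, Some(null)) => name }

someNullFields(Product(Some(6050286), Some(null)))  // Seq("productName")
{code}

`Option.apply` is the idiomatic way to lift a possibly-null reference into an `Option`, which is why it, rather than `Some`, belongs at ingestion boundaries.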



