spark-issues mailing list archives

From "Jason Moore (JIRA)" <>
Subject [jira] [Commented] (SPARK-17195) Dealing with JDBC column nullability when it is not reliable
Date Thu, 01 Sep 2016 00:03:21 GMT


Jason Moore commented on SPARK-17195:

That's right, and I totally agree that's where the fix needs to be, and I'm pressing them
to make it.  I guess that means this ticket can be closed, as a reasonable workaround
within Spark itself doesn't seem possible.  Once the Teradata JDBC driver has been fixed,
I'll return here to note the version it is fixed in.

> Dealing with JDBC column nullability when it is not reliable
> ------------------------------------------------------------
>                 Key: SPARK-17195
>                 URL:
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Jason Moore
> Starting with Spark 2.0.0, the column "nullable" property must be correct for code
> generation to work properly.  Marking a column as nullable = false used to (< 2.0.0)
> allow null values to be operated on, but now it results in:
> {noformat}
> Caused by: java.lang.NullPointerException
>         at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter.write(
>         at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>         at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(
>         at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
>         at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> {noformat}
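To illustrate the failure mode, here is a toy Scala sketch (a model for illustration only, not Spark's actual generated code; `Field` and `writeValue` are hypothetical names): when a field is declared non-nullable, the generated row writer skips the null check, so a null value coming back from JDBC is dereferenced directly and throws the NullPointerException shown above.

```scala
// Toy model of Spark 2.0 codegen's nullability assumption.
// Field and writeValue are illustrative names, not Spark APIs.
case class Field(name: String, nullable: Boolean)

def writeValue(field: Field, value: String): Int =
  if (field.nullable) {
    // Nullable path: the generated code guards against null first.
    if (value == null) -1 else value.length
  } else {
    // Non-nullable path: no guard, so a null value throws an NPE here,
    // mirroring the UnsafeRowWriter.write frame in the stack trace above.
    value.length
  }
```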
> I'm all for the change towards more rigid behavior (enforcing correct input).  But the
> problem I'm facing now is that when I use JDBC to read from a Teradata server, the column
> nullability is often not correct (particularly when sub-queries are involved).
> This is the line in question:
> I'm trying to work out what would be the way forward for me on this.  I know that it's
> really the fault of the Teradata database server not returning the correct schema, but
> I'll need to make Spark itself or my application resilient to this behavior.
> One of the Teradata JDBC Driver tech leads has told me that "when the rsmd.getSchemaName
> and rsmd.getTableName methods return an empty zero-length string, then the other metadata
> values may not be completely accurate" - so one option could be to treat the nullability
> (at least) the same way as the "unknown" case (as nullable = true).  For reference, see
> the rest of our discussion here:
> Any other thoughts?
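The fallback proposed above can be sketched as a small Scala helper (hypothetical names; this is not code from Spark's JDBC source): if `rsmd.getSchemaName` or `rsmd.getTableName` returned an empty string, treat the reported nullability as unknown and default to nullable = true.

```scala
// Hypothetical fallback for unreliable JDBC metadata: when the schema or
// table name comes back empty, the driver's other metadata (including
// nullability) may be inaccurate, so assume the column is nullable.
object NullabilityFallback {
  def resolveNullable(schemaName: String,
                      tableName: String,
                      reportedNullable: Boolean): Boolean =
    if (schemaName.isEmpty || tableName.isEmpty) true // metadata unreliable
    else reportedNullable
}
```

Under this sketch, a Teradata sub-query result reporting nullable = false alongside an empty table name would be read as nullable = true, avoiding the codegen NPE at the cost of dropping a (possibly wrong) non-null guarantee.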

This message was sent by Atlassian JIRA

