spark-dev mailing list archives

From Michael Armbrust <mich...@databricks.com>
Subject Re: SparkSQL not honoring schema
Date Thu, 11 Dec 2014 02:27:58 GMT
As the Scala doc for applySchema says, "It is important to make sure that
the structure of every [[Row]] of the provided RDD matches the provided
schema. Otherwise, there will be runtime exceptions."  We don't check, as
doing runtime reflection on all of the data would be very expensive.  You
will only get errors if you try to manipulate the data; otherwise it
will pass it through.

I have, though, written some debugging code (developer API, not guaranteed
to be stable) that you can use:

import org.apache.spark.sql.execution.debug._
schemaRDD.typeCheck()
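Putting the two pieces together, a minimal sketch of the scenario from the original question, written against the Spark 1.2-era API (local master, hypothetical column names and values). applySchema itself performs no validation, so the malformed row is accepted silently; typeCheck then scans the data and surfaces a runtime error for the row that does not conform to the schema:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql._
import org.apache.spark.sql.execution.debug._

object TypeCheckDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local").setAppName("typecheck-demo"))
    val sqlContext = new SQLContext(sc)

    // Hand-crafted schema, as in the original question.
    val schema = StructType(Seq(
      StructField("id", IntegerType, nullable = false),
      StructField("name", StringType, nullable = false)))

    // The second Row is malformed: a String where the schema says Int.
    val rows = sc.parallelize(Seq(Row(1, "alice"), Row("oops", "bob")))

    // No check happens here; the mismatch passes through silently.
    val schemaRDD = sqlContext.applySchema(rows, schema)

    // The debug API walks the data and reports the nonconforming row
    // with a runtime error instead of returning malformed results.
    schemaRDD.typeCheck()
  }
}
```

Note that typeCheck must read all of the data, which is exactly the cost the query path avoids by skipping validation, so it is a debugging aid rather than something to run on every query.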

On Wed, Dec 10, 2014 at 6:19 PM, Alessandro Baretta <alexbaretta@gmail.com>
wrote:

> Hello,
>
> I defined a SchemaRDD by applying a hand-crafted StructType to an RDD. Some
> of the Rows in the RDD are malformed--that is, they do not conform to the
> schema defined by the StructType. When running a select statement on this
> SchemaRDD I would expect SparkSQL to either reject the malformed rows or
> fail. Instead, it returns whatever data it finds, even if malformed. Is
> this the desired behavior? Is there no method in SparkSQL to check for
> validity with respect to the schema?
>
> Thanks.
>
> Alex
>
