spark-dev mailing list archives

From Alessandro Baretta <>
Subject Re: SparkSQL not honoring schema
Date Thu, 11 Dec 2014 02:45:07 GMT
Hey Michael,

Thanks for the clarification. I was actually assuming the query would fail.
Ok, so this means I will have to do the validation in an RDD transformation
feeding into the SchemaRDD.
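
For what it's worth, that pre-validation step can be sketched in plain Scala. The types below (`Field`, `Schema`, the `Row` alias, `conforms`) are simplified stand-ins for illustration only, not Spark's actual `StructType`/`Row` API; in Spark the filter would run as an RDD transformation before `applySchema`:

```scala
// Illustrative, simplified schema types -- NOT Spark's StructType/Row.
sealed trait FieldType
case object IntType extends FieldType
case object StringType extends FieldType

case class Field(name: String, dataType: FieldType)
case class Schema(fields: Seq[Field])

// A "row" is just a sequence of values, as in an RDD[Row].
type Row = Seq[Any]

// A row conforms when it has the right arity and each value
// matches the declared type of its field.
def conforms(row: Row, schema: Schema): Boolean =
  row.length == schema.fields.length &&
    row.zip(schema.fields).forall {
      case (_: Int, Field(_, IntType))       => true
      case (_: String, Field(_, StringType)) => true
      case _                                 => false
    }

val schema = Schema(Seq(Field("id", IntType), Field("name", StringType)))

val rows: Seq[Row] = Seq(
  Seq(1, "alice"),   // well-formed
  Seq("oops", "bob"),// wrong type in column 0
  Seq(2, 42)         // wrong type in column 1
)

// With Spark this would be rdd.filter(conforms(_, schema)) before applySchema.
val valid = rows.filter(conforms(_, schema))
```

Only the first row survives the filter; the two malformed rows are dropped before the schema is applied.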

On Wed, Dec 10, 2014 at 6:27 PM, Michael Armbrust <> wrote:

> As the scaladoc for applySchema says, "It is important to make sure that
> the structure of every [[Row]] of the provided RDD matches the provided
> schema. Otherwise, there will be runtime exceptions."  We don't check,
> as doing runtime reflection on all of the data would be very expensive.
> You will only get errors if you try to manipulate the data; otherwise it
> will pass it through.
> I have, though, written some debugging code (developer API, not
> guaranteed to be stable) that you can use:
> import org.apache.spark.sql.execution.debug._
> schemaRDD.typeCheck()
> On Wed, Dec 10, 2014 at 6:19 PM, Alessandro Baretta <> wrote:
>> Hello,
>> I defined a SchemaRDD by applying a hand-crafted StructType to an RDD.
>> Some of the Rows in the RDD are malformed, that is, they do not conform
>> to the schema defined by the StructType. When running a select statement
>> on this
>> SchemaRDD I would expect SparkSQL to either reject the malformed rows or
>> fail. Instead, it returns whatever data it finds, even if malformed. Is
>> this the desired behavior? Is there no method in SparkSQL to check for
>> validity with respect to the schema?
>> Thanks.
>> Alex
