spark-dev mailing list archives

From Alessandro Baretta <alexbare...@gmail.com>
Subject Re: SparkSQL not honoring schema
Date Thu, 11 Dec 2014 02:45:07 GMT
Hey Michael,

Thanks for the clarification. I was actually assuming the query would fail.
Ok, so this means I will have to do the validation in an RDD transformation
feeding into the SchemaRDD.
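
A minimal sketch of the kind of check I have in mind, not tested against a
real job. Field, FieldType and RowValidation are hypothetical stand-ins for
StructField and its data types so that the snippet runs without Spark on the
classpath; it also assumes each row can be viewed as a Seq[Any].

```scala
// Hypothetical field descriptors standing in for Spark SQL's StructField
// and data types (assumption: only three primitive types are needed here).
sealed trait FieldType
case object IntField extends FieldType
case object StringField extends FieldType
case object DoubleField extends FieldType

case class Field(name: String, tpe: FieldType, nullable: Boolean)

object RowValidation {
  // A single value conforms if it is null in a nullable field, or if its
  // runtime type matches the declared field type.
  def valueMatches(value: Any, field: Field): Boolean = value match {
    case null      => field.nullable
    case _: Int    => field.tpe == IntField
    case _: String => field.tpe == StringField
    case _: Double => field.tpe == DoubleField
    case _         => false
  }

  // A row conforms if it has one value per field and every value matches.
  def conforms(row: Seq[Any], schema: Seq[Field]): Boolean =
    row.length == schema.length &&
      row.zip(schema).forall { case (v, f) => valueMatches(v, f) }
}
```

In the job itself this would presumably become something like
rowRDD.filter(r => RowValidation.conforms(r, fields)) ahead of the
applySchema call (rowRDD and fields being placeholder names), with the
stand-in types swapped for the real StructType fields.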

On Wed, Dec 10, 2014 at 6:27 PM, Michael Armbrust <michael@databricks.com>
wrote:

> As the scala doc for applySchema says, "It is important to make sure that
> the structure of every [[Row]] of the provided RDD matches the provided
> schema. Otherwise, there will be runtime exceptions."  We don't check as
> doing runtime reflection on all of the data would be very expensive.  You
> will only get errors if you try to manipulate the data, but otherwise it
> will pass it through.
>
> I have written some debugging code (developer API, not guaranteed to be
> stable) though that you can use.
>
> import org.apache.spark.sql.execution.debug._
> schemaRDD.typeCheck()
>
> On Wed, Dec 10, 2014 at 6:19 PM, Alessandro Baretta <alexbaretta@gmail.com>
> wrote:
>
>> Hello,
>>
>> I defined a SchemaRDD by applying a hand-crafted StructType to an RDD. Some
>> of the Rows in the RDD are malformed--that is, they do not conform to the
>> schema defined by the StructType. When running a select statement on this
>> SchemaRDD I would expect SparkSQL to either reject the malformed rows or
>> fail. Instead, it returns whatever data it finds, even if malformed. Is
>> this the desired behavior? Is there no method in SparkSQL to check for
>> validity with respect to the schema?
>>
>> Thanks.
>>
>> Alex
>>
>
>
