spark-user mailing list archives

From Eran Medan <ehrann.meh...@gmail.com>
Subject Re: [spark-sql] What is the right way to represent an “Any” type in Spark SQL?
Date Sun, 29 Mar 2015 22:27:05 GMT
Thanks Michael!
Can you please point me to the docs / source location for that automatic
casting? I'm just using it to extract the data and put it in a Map[String,
Any] (long story on the reason...), so I think the casting rules won't
"know" what to cast it to... right? I guess I can have the JSON / Parquet
data store it as a string and also keep metadata on the "real" type, but
that feels a little wrong. Is that the only way to handle it? Or perhaps
there is a way to support an "Any" after all? Is it just not implemented, or
is it a Hive limitation? (I've never used Hive other than here, so sorry for
the silly question.)
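
For concreteness, the "string plus metadata on the real type" idea could be
sketched in plain Scala as below. This is only an illustration under stated
assumptions: the names TypedValue, AnyAsString, encode and decode are
hypothetical and are not part of Spark, and the sketch covers only the three
value types mentioned in this thread (String, Int, Boolean). The point is that
a case class with two String fields avoids putting Any in the schema.

```scala
// Hypothetical helper types for illustration; nothing here is a Spark API.
// A value is stored as its string form plus a tag naming its real type.
final case class TypedValue(value: String, typeTag: String)

object AnyAsString {
  // Encode one of the supported types as a (string, tag) pair.
  def encode(v: Any): TypedValue = v match {
    case s: String  => TypedValue(s, "string")
    case i: Int     => TypedValue(i.toString, "int")
    case b: Boolean => TypedValue(b.toString, "boolean")
    case other      => throw new IllegalArgumentException(s"Unsupported value: $other")
  }

  // Decode back to the original type by consulting the tag.
  def decode(tv: TypedValue): Any = tv.typeTag match {
    case "string"  => tv.value
    case "int"     => tv.value.toInt
    case "boolean" => tv.value.toBoolean
    case other     => throw new IllegalArgumentException(s"Unknown type tag: $other")
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val props: Map[String, Any] = Map("name" -> "spark", "cores" -> 4, "local" -> true)

    // Store side: every value becomes a pair of plain strings.
    val stored: Map[String, TypedValue] =
      props.map { case (k, v) => k -> AnyAsString.encode(v) }

    // Read side: rebuild the Map[String, Any] from the stored pairs.
    val restored: Map[String, Any] =
      stored.map { case (k, tv) => k -> AnyAsString.decode(tv) }

    assert(restored == props)
  }
}
```

The tag is needed because the read side builds a Map[String, Any] and so has
no target type of its own to cast to.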

P.S. I fixed the PR based on the code review, but the tests failed due to
GitHub's ongoing DDoS attack. Is there a way to restart the tests? :) (Or
should I just do a new commit with a whitespace char to trigger it?)

Thanks again, you guys are great!

On Sat, Mar 28, 2015 at 11:29 PM, Michael Armbrust <michael@databricks.com>
wrote:

> In this case I'd probably just store it as a String.  Our casting rules
> (which come from Hive) are such that when you use a string as a number or
> boolean it will be cast to the desired type.
>
> Thanks for the PR btw :)
>
> On Fri, Mar 27, 2015 at 2:31 PM, Eran Medan <ehrann.mehdan@gmail.com>
> wrote:
>
>> Hi everyone,
>>
>> I had a lot of questions today, sorry if I'm spamming the list, but I
>> thought it's better than posting all questions in one thread. Let me know
>> if I should throttle my posts ;)
>>
>> Here is my question:
>>
>> I have a case class that has Any in it (e.g. I have a property map whose
>> values can be either String, Int or Boolean, and since we don't have
>> union types, Any is the closest thing).
>>
>> When I try to register such an RDD as a table in 1.2.1 (or convert to
>> DataFrame in 1.3 and then register as a table), I get this weird
>> exception:
>>
>> Exception in thread "main" scala.MatchError: Any (of class
>> scala.reflect.internal.Types$ClassNoArgsTypeRef) at
>> org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:112)
>>
>> By my interpretation, this simply means that Any is not a valid type
>> that Spark SQL can support in its schema.
>>
>> I already sent a pull request <https://github.com/apache/spark/pull/5235> to
>> solve the cryptic exception but my question is - *is there a way to
>> support an "Any" type in Spark SQL?*
>>
>> disclaimer - also posted at
>> http://stackoverflow.com/questions/29310405/what-is-the-right-way-to-represent-an-any-type-in-spark-sql
>>
>
>
