spark-dev mailing list archives

From Yin Huai <yh...@databricks.com>
Subject Re: [spark-sql] JsonRDD
Date Tue, 03 Feb 2015 20:29:36 GMT
We will probably extract general-purpose functions from JsonRDD and also do
the renaming as part of https://issues.apache.org/jira/browse/SPARK-5260.

On Tue, Feb 3, 2015 at 9:15 AM, Daniil Osipov <daniil.osipov@shazam.com>
wrote:

> Thanks Reynold,
>
> Case sensitivity issues are definitely orthogonal. I'll file a bug or submit a PR.
>
> Is there a way to rename the object to eliminate the confusion? I'm not sure
> how locked down the API is at this time, but it seems like a potential
> point of confusion for developers.
>
> On Mon, Feb 2, 2015 at 4:30 PM, Reynold Xin <rxin@databricks.com> wrote:
>
>> It's bad naming: JsonRDD is actually not an RDD. It is just a set of
>> utility methods.
>>
>> The case sensitivity issues seem orthogonal, and it would be great to be
>> able to control that with a flag.
>>
>>
>> On Mon, Feb 2, 2015 at 4:16 PM, Daniil Osipov <daniil.osipov@shazam.com>
>> wrote:
>>
>>> Hey Spark developers,
>>>
>>> Is there a good reason for JsonRDD being a Scala object as opposed to a
>>> class? Most other RDDs seem to be classes, and can be extended.
>>>
>>> The reason I'm asking is that there is a problem with Hive
>>> interoperability with JSON DataFrames: jsonFile generates a
>>> case-sensitive schema, while Hive expects case-insensitive column names
>>> and fails with an exception during saveAsTable if there are two columns
>>> whose names differ only in case.
>>>
>>> I'm trying to resolve the problem, but that requires me to extend
>>> JsonRDD, which I can't do. Other RDDs are subclass-friendly; why is
>>> JsonRDD different?
>>>
>>> Dan
>>>
>>
>>
>
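
The collision Dan describes (two columns whose names differ only in case) can be detected before calling saveAsTable. The following is a minimal sketch in plain Scala, assuming only that the top-level column names are available as strings; for a DataFrame they could come from `df.schema.fieldNames`. `CaseCollisions` is a hypothetical helper, not a Spark API, and it does not inspect nested struct fields.

```scala
import java.util.Locale

// Hypothetical helper (not part of Spark): reports column names that
// would clash in a case-insensitive catalog such as the Hive metastore.
object CaseCollisions {
  /** Groups distinct names by their lower-cased form and keeps only the
    * groups that contain more than one spelling. */
  def find(names: Seq[String]): Map[String, Seq[String]] =
    names.distinct
      .groupBy(_.toLowerCase(Locale.ROOT))
      .filter { case (_, spellings) => spellings.size > 1 }
}

// Usage: "userId"/"UserID" and "name"/"Name" collide; "ts" does not.
val fields = Seq("userId", "UserID", "name", "Name", "ts")
CaseCollisions.find(fields).foreach { case (key, spellings) =>
  println(s"$key -> ${spellings.mkString(", ")}")
}
```

Lower-casing with `Locale.ROOT` keeps the grouping independent of the JVM's default locale. Failing fast on a non-empty result gives a clearer error than the exception raised later during the write.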
