spark-dev mailing list archives

From Michael Armbrust <mich...@databricks.com>
Subject Re: Unsupported Catalyst types in Parquet
Date Tue, 30 Dec 2014 02:42:57 GMT
Yeah, I saw those.  The problem is that #3822 truncates timestamps that
include nanoseconds.
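
To make the truncation concern concrete, here is a minimal sketch in Python. It assumes a column that stores timestamps at millisecond precision, which is only an illustration and not a statement of what #3822 actually does. It also assumes the INT96 layout that Impala is understood to write: 8 bytes of nanoseconds within the day followed by a 4-byte Julian day number, little-endian. All function names are hypothetical.

```python
import struct

NANOS_PER_SECOND = 1_000_000_000
NANOS_PER_DAY = 86_400 * NANOS_PER_SECOND
JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588  # Julian day number of 1970-01-01


def truncate_to_millis(epoch_nanos: int) -> int:
    """Drop sub-millisecond digits (assumed millisecond-precision storage)."""
    return epoch_nanos // 1_000_000 * 1_000_000


def to_int96(epoch_nanos: int) -> bytes:
    """Pack as 8 bytes nanos-of-day + 4 bytes Julian day (assumed layout)."""
    days, nanos_of_day = divmod(epoch_nanos, NANOS_PER_DAY)
    return struct.pack("<qi", nanos_of_day, days + JULIAN_DAY_OF_UNIX_EPOCH)


def from_int96(raw: bytes) -> int:
    """Inverse of to_int96: recover nanoseconds since the Unix epoch."""
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    return (julian_day - JULIAN_DAY_OF_UNIX_EPOCH) * NANOS_PER_DAY + nanos_of_day


# An arbitrary instant whose fractional second uses all nine digits.
ts = 1_419_907_377 * NANOS_PER_SECOND + 123_456_789

lossy = truncate_to_millis(ts)           # the trailing 456_789 ns are gone
lossless = from_int96(to_int96(ts))      # the 12-byte encoding round-trips

print(ts - lossy)      # nanoseconds lost by truncation
print(lossless == ts)  # whether INT96 preserved the value
```

The point of the sketch is only that a 12-byte value has room for the full nanosecond field, while a millisecond-based encoding cannot round-trip it.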

On Mon, Dec 29, 2014 at 5:14 PM, Alessandro Baretta <alexbaretta@gmail.com>
wrote:

> Michael,
>
> Actually, Adrian Wang already created pull requests for these issues.
>
> https://github.com/apache/spark/pull/3820
> https://github.com/apache/spark/pull/3822
>
> What do you think?
>
> Alex
>
> On Mon, Dec 29, 2014 at 3:07 PM, Michael Armbrust <michael@databricks.com>
> wrote:
>
>> I'd love to get both of these in.  There is some trickiness that I talk
>> about on the JIRA for timestamps since the SQL timestamp class can support
>> nanoseconds and I don't think Parquet has a type for this.  Other systems
>> (Impala) seem to use INT96.  It would be great to maybe ask on the Parquet
>> mailing list what the plan is there to make sure that whatever we do is
>> going to be compatible long term.
>>
>> Michael
>>
>> On Mon, Dec 29, 2014 at 8:13 AM, Alessandro Baretta <
>> alexbaretta@gmail.com> wrote:
>>
>>> Daoyuan,
>>>
>>> Thanks for creating the jiras. I need these features by... last week, so
>>> I'd be happy to take care of this myself, if only you or someone more
>>> experienced than me in the SparkSQL codebase could provide some guidance.
>>>
>>> Alex
>>> On Dec 29, 2014 12:06 AM, "Wang, Daoyuan" <daoyuan.wang@intel.com>
>>> wrote:
>>>
>>>> Hi Alex,
>>>>
>>>> I'll create JIRA SPARK-4985 for date type support in parquet, and
>>>> SPARK-4987 for timestamp type support. For decimal type, I think we only
>>>> support decimals that fit in a long.
>>>>
>>>> Thanks,
>>>> Daoyuan
>>>>
>>>> -----Original Message-----
>>>> From: Alessandro Baretta [mailto:alexbaretta@gmail.com]
>>>> Sent: Saturday, December 27, 2014 2:47 PM
>>>> To: dev@spark.apache.org; Michael Armbrust
>>>> Subject: Unsupported Catalyst types in Parquet
>>>>
>>>> Michael,
>>>>
>>>> I'm having trouble storing my SchemaRDDs in Parquet format with
>>>> SparkSQL, due to my RDDs having DateType and DecimalType fields.
>>>> What would it take to add Parquet support for these Catalyst types? Are
>>>> there any other Catalyst types for which there is no Parquet support?
>>>>
>>>> Alex
>>>>
>>>
>>
>
