spark-dev mailing list archives

From Patrick Woody <>
Subject Re: Lazy casting with Catalyst
Date Sat, 28 Mar 2015 16:26:48 GMT
Hey Cheng,

I didn't mean that Catalyst casting was eager, just that my approaches
thus far seem to have been. Maybe I should give a concrete example?

I have columns A, B, and C, where B is saved as a String, but I'd like all
references to B to go through a Cast to Decimal regardless of the code used
on the SchemaRDD. So if someone does a min(B), it uses Decimal ordering
instead of String ordering.
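
For instance, a rough sketch of the behavior I want, written against the
1.3 DataFrame API (df here is a stand-in for my actual SchemaRDD):

  import org.apache.spark.sql.functions._
  import org.apache.spark.sql.types.DecimalType

  // Force B through a cast to unlimited-precision Decimal before
  // anything downstream can touch the raw String.
  val withDecimalB = df.withColumn("B", col("B").cast(DecimalType.Unlimited))

  // Now min("B") compares Decimals (9 < 10) rather than Strings,
  // where "10" sorts before "9" lexicographically.
  withDecimalB.agg(min("B"))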

One approach I had taken was to select everything with casts on certain
columns, but when I then did a count(literal(1)) on top of that RDD, it
seemed to bring in the whole row.
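
In code, that attempt looked roughly like this (same stand-in names as
above, with functions.lit playing the role of literal):

  import org.apache.spark.sql.functions._
  import org.apache.spark.sql.types.DecimalType

  // Select everything up front, casting only the columns that need it:
  val casted = df.select(
    col("A"),
    col("B").cast(DecimalType.Unlimited).as("B"),
    col("C"))

  // Even this aggregate, which shouldn't need any column at all,
  // appeared to scan whole rows out of Parquet rather than pruning:
  casted.agg(count(lit(1)))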


On Sat, Mar 28, 2015 at 11:35 AM, Cheng Lian <> wrote:

> Hi Pat,
> I don't understand what "lazy casting" means here. Why do you think current
> Catalyst casting is "eager"? Casting happens at runtime and doesn't
> disable column pruning.
> Cheng
> On 3/28/15 11:26 PM, Patrick Woody wrote:
>> Hi all,
>> In my application, we take input from Parquet files where BigDecimals are
>> written as Strings to maintain arbitrary precision.
>> I was hoping to convert these back over to Decimal with Unlimited
>> precision, but I'd still like to maintain the Parquet column pruning (all
>> my attempts thus far seem to bring in the whole Row). Is it possible to do
>> this lazily through Catalyst?
>> Basically I'd want to do Cast(col, DecimalType()) whenever col is actually
>> referenced. Any tips on how to approach this would be appreciated.
>> Thanks!
>> -Pat
