spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Russell Spitzer <>
Subject Re: [SparkSql] Casting of Predicate Literals
Date Tue, 04 Aug 2020 16:55:13 GMT
Thanks! That's exactly what I was hoping for! Thanks for finding the Jira
for me!

On Tue, Aug 4, 2020 at 11:46 AM Wenchen Fan <> wrote:

> I think this is not a problem in 3.0 anymore, see
> On Wed, Aug 5, 2020 at 12:08 AM Russell Spitzer <>
> wrote:
>> I've just run into this issue again with another user and I feel like
>> most folks here have seen some flavor of this at some point.
>> The user registers a Datasource with a column of type Date (or some non
>> string) then performs a query that looks like.
>> *SELECT * from Source WHERE date_col > '2020-08-03'*
>> Seeing that the predicate literal here is a String, Spark needs to make a
>> change so that the DataSource column will be of the same type (Date),
>> so it places a "Cast" on the Datasource column so our plan ends up
>> looking like.
>> Cast(date_col as String) > '2020-08-03'
>> Since the Datasource Strategies can't handle a push down of the "Cast"
>> function we lose the predicate pushdown we could
>> have had. This can change a Job from a single partition lookup into a
>> full scan leading to a very confusing situation for
>> the end user. I also wonder about the relative cost here since we could
>> be avoiding doing X casts and instead just do a single
>> one on the predicate, in addition we could be doing the cast at the
>> Analysis phase and cut the run short before any work even
>> starts rather than doing a perhaps meaningless comparison between a date
>> and a non-date string.
>> I think we should seriously consider whether in cases like this we should
>> attempt to cast the literal rather than casting the
>> source column.
>> Please let me know if anyone has thoughts on this, or has some previous
>> Jiras I could dig into if it's been discussed before,
>> Russ

View raw message