spark-issues mailing list archives

From "Nick Dimiduk (JIRA)" <>
Subject [jira] [Commented] (SPARK-12957) Derive and propagate data constrains in logical plan
Date Tue, 14 Feb 2017 17:52:42 GMT


Nick Dimiduk commented on SPARK-12957:

[~sameerag] thanks for the comment. From a naive scan of the tickets, I believe I am seeing
the benefits of SPARK-13871, in that an {{IsNotNull}} constraint is applied from the names of
the join columns. However, I don't see the benefit of SPARK-13789, specifically the {{a = 5,
a = b}} case mentioned in its description. My query is a join between a very small relation
(hundreds of rows) and a very large one (tens of billions of rows). I've hinted the planner to
broadcast the smaller table, which it honors. After SPARK-13789, I expected the join column
values to be pushed down as well, but this is not the case.

Any tips on debugging this further? I've set breakpoints in the {{RelationProvider}} implementation
and see that it's only receiving the {{IsNotNull}} filters, nothing further from the planner.
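For reference, the inference SPARK-13789 describes (given {{a = 5}} and {{a = b}}, derive {{b = 5}} so it can be pushed to the other side of the join) can be modeled as a simple fixed-point substitution. This is a toy sketch in plain Python, not Spark's actual constraint framework; the representation and function name are made up for illustration:

```python
def propagate_equalities(constraints):
    """Toy model of equality-constraint propagation.

    constraints: list of ("eq", column, column_or_literal) tuples,
    where columns are strings and literals are ints.
    Returns a dict mapping each column to the literal it must equal.
    """
    literals = {}  # column -> known literal value
    pairs = []     # column-to-column equality pairs

    for _, left, right in constraints:
        if isinstance(right, int):
            literals[left] = right
        else:
            pairs.append((left, right))

    # Fixed point: copy literal bindings across column equalities
    # until no new binding can be derived.
    changed = True
    while changed:
        changed = False
        for a, b in pairs:
            if a in literals and b not in literals:
                literals[b] = literals[a]
                changed = True
            elif b in literals and a not in literals:
                literals[a] = literals[b]
                changed = True
    return literals
```

Under this model, {{a = 5}} plus {{a = b}} yields a binding {{b = 5}}, which is the filter one would hope to see handed to the large relation's scan.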

Thanks a lot!

> Derive and propagate data constrains in logical plan 
> -----------------------------------------------------
>                 Key: SPARK-12957
>                 URL:
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>            Reporter: Yin Huai
>            Assignee: Sameer Agarwal
>         Attachments: ConstraintPropagationinSparkSQL.pdf
> Based on the semantics of a query plan, we can derive data constraints (e.g. if a filter
> defines {{a > 10}}, we know that the output data of this filter satisfy the constraints
> {{a > 10}} and {{a is not null}}). We should build a framework to derive and propagate
> constraints in the logical plan, which can help us build more advanced optimizations.
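The derivation the description mentions hinges on SQL null semantics: a comparison against NULL never evaluates to true, so any row surviving a comparison filter must have a non-null value in the compared column. A toy illustration of that rule (plain Python, hypothetical names, not Spark's Catalyst code):

```python
def derive_constraints(predicate):
    """Given a comparison predicate as an (op, column, literal) tuple,
    e.g. ("gt", "a", 10), return the set of constraints that every
    output row of the filter must satisfy."""
    op, column, value = predicate
    derived = {(op, column, value)}
    # Comparisons against NULL are never true in SQL, so the filter
    # also guarantees the column is non-null.
    derived.add(("is_not_null", column))
    return derived
```

The {{IsNotNull}} constraints observed reaching the {{RelationProvider}} are exactly this kind of derived constraint, applied to the join keys.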

This message was sent by Atlassian JIRA

