spark-issues mailing list archives

From "Eyal Farago (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-28304) FileFormatWriter introduces an unconditional sort, even when all attributes are constants
Date Mon, 08 Jul 2019 14:26:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-28304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16880402#comment-16880402 ]

Eyal Farago commented on SPARK-28304:
-------------------------------------

cc [~cloud_fan],[~hvanhovell]

> FileFormatWriter introduces an unconditional sort, even when all attributes are constants
> -----------------------------------------------------------------------------------------
>
>                 Key: SPARK-28304
>                 URL: https://issues.apache.org/jira/browse/SPARK-28304
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.2
>            Reporter: Eyal Farago
>            Priority: Major
>              Labels: performance
>
> FileFormatWriter derives a required sort order from the partition columns, bucketing
> columns and any explicitly required ordering. However, in some use cases some (or even all)
> of these fields are constant, and in those cases the sort can be skipped.
> For example, in my use case we add a GUID column identifying a specific (incremental) load;
> this can be thought of as a batch id. Since we run one batch at a time, this column is always
> constant, so there is no need to sort on it. Since we also don't use bucketing or require an
> explicit ordering, the entire sort can be skipped in our case.
>  
> I suggest:
>  # filtering constant columns out of the required ordering calculated by FileFormatWriter
>  # generalizing this to any Sort operator in a Spark plan
>  # introducing optimizer rules that remove constants from a sort ordering, potentially
> eliminating the sort operator altogether
>  # modifying EnsureRequirements to be aware of constant fields when deciding whether to
> introduce a sort
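
The idea behind the suggestions above can be illustrated with a small, self-contained Python model. This is only a sketch and not Spark code: `SortKey`, `prune_constant_keys` and `needs_sort` are hypothetical names invented for illustration, and the set of constant columns is assumed to be known up front (in Spark it would have to be derived from the plan).

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class SortKey:
    """Hypothetical stand-in for one entry of a sort ordering."""
    column: str
    ascending: bool = True


def prune_constant_keys(ordering: List[SortKey], constants: Set[str]) -> List[SortKey]:
    """Drop keys on constant columns; sorting on a constant never reorders rows."""
    return [key for key in ordering if key.column not in constants]


def needs_sort(required: List[SortKey],
               child_ordering: List[SortKey],
               constants: Set[str]) -> bool:
    """A sort is needed unless the pruned required ordering is a prefix
    of the (equally pruned) ordering the child already provides."""
    pruned_required = prune_constant_keys(required, constants)
    pruned_child = prune_constant_keys(child_ordering, constants)
    return pruned_required != pruned_child[:len(pruned_required)]


# Use case from the description: the only partition column is a constant batch id.
constants = {"batch_id"}
print(needs_sort([SortKey("batch_id")], [], constants))
# A second, non-constant partition column still requires a sort ...
print(needs_sort([SortKey("batch_id"), SortKey("day")], [], constants))
# ... unless the child is already ordered by it.
print(needs_sort([SortKey("batch_id"), SortKey("day")], [SortKey("day")], constants))
```

In this model the first call reports that no sort is needed, which corresponds to the case where the whole sort could be skipped; the other two show that pruning only removes the constant keys and leaves the rest of the requirement intact.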



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

