spark-issues mailing list archives

From "Dongjoon Hyun (Jira)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-30427) Add config item for limiting partition number when calculating statistics through File System
Date Mon, 16 Mar 2020 22:54:06 GMT

     [ https://issues.apache.org/jira/browse/SPARK-30427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-30427:
----------------------------------
    Affects Version/s:     (was: 3.0.0)
                       3.1.0

> Add config item for limiting partition number when calculating statistics through File System
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-30427
>                 URL: https://issues.apache.org/jira/browse/SPARK-30427
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Hu Fuwang
>            Priority: Major
>
> Currently, when Spark needs to calculate the statistics (e.g. sizeInBytes) of a table's partitions
> through the file system (e.g. HDFS), it does not consider the number of partitions. If the
> number of partitions is huge, calculating the statistics can take a long time, and the result
> may not be that useful.
> It would be reasonable to add a config item that limits the number of partitions for which
> statistics may be calculated through the file system.
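To illustrate the proposal, here is a minimal sketch of the idea, not Spark's actual code. It assumes the local file system stands in for HDFS, and `estimate_table_size`, `max_partitions`, and `default_size` are hypothetical names: `max_partitions` plays the role of the proposed config item, and `default_size` is the fallback returned when the partition count exceeds the limit, so the expensive per-partition listing is skipped.

```python
import os
from typing import Iterable


def partition_size_in_bytes(path: str) -> int:
    """Sum the sizes of all regular files under one partition directory."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def estimate_table_size(partition_paths: Iterable[str],
                        max_partitions: int,
                        default_size: int) -> int:
    """Return sizeInBytes computed from the file system, or default_size
    when there are more partitions than max_partitions.

    max_partitions stands in for the proposed (hypothetical) config item;
    a negative value disables the limit.
    """
    paths = list(partition_paths)
    if 0 <= max_partitions < len(paths):
        # Too many partitions: skip listing the file system entirely.
        return default_size
    # Within the limit: walk every partition directory and add up file sizes.
    return sum(partition_size_in_bytes(p) for p in paths)
```

The check happens before any directory is listed, so a table with a very large number of partitions costs only a length comparison rather than one file-system walk per partition.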



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

