spark-issues mailing list archives

From "Reynold Xin (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-17861) Push data source partitions into metastore for catalog tables
Date Tue, 11 Oct 2016 01:04:20 GMT

     [ https://issues.apache.org/jira/browse/SPARK-17861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin updated SPARK-17861:
--------------------------------
    Description: 
Spark SQL currently does not store any partition information in the catalog for data source
tables, because it was originally designed to work with arbitrary files. This, however, causes
a few issues for catalog tables:

1. Listing partitions for a large table (with millions of partitions) can be very slow during
cold start.
2. Does not support heterogeneous partition naming schemes.
3. Cannot push partition pruning down into the metastore.

This ticket tracks the work required to push the tracking of partitions into the metastore.
This change should be placed behind a feature flag.
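The three issues and the feature-flag requirement can be made concrete with a minimal Python sketch. It is not Spark code: `parse_partition_dir`, `discover_by_listing`, `HypotheticalCatalog`, `resolve_locations` and the `use_catalog` switch are all hypothetical names, the catalog is an in-memory stand-in for the metastore, and the directory paths are made up. It assumes partition directories are given relative to the table root.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical partition spec: partition column -> value, e.g. {"ds": "2016-10-10"}
PartitionSpec = Dict[str, str]
Predicate = Callable[[PartitionSpec], bool]


def parse_partition_dir(rel_path: str) -> Optional[PartitionSpec]:
    """Recover a spec from a directory path relative to the table root.
    Only the key=value naming scheme can be decoded (issue 2)."""
    spec: PartitionSpec = {}
    for segment in rel_path.strip("/").split("/"):
        key, sep, value = segment.partition("=")
        if not sep:
            return None  # e.g. "legacy/20161009": values cannot be recovered
        spec[key] = value
    return spec


def discover_by_listing(dir_paths: List[str]) -> List[Tuple[PartitionSpec, str]]:
    """File-based discovery: every directory is visited, even when a query
    needs only one partition (issue 1, costly with millions of partitions)."""
    found = []
    for path in dir_paths:
        spec = parse_partition_dir(path)
        if spec is not None:
            found.append((spec, path))
    return found


class HypotheticalCatalog:
    """In-memory stand-in for a metastore that records each partition's spec
    and location explicitly, so a location may follow any naming scheme."""

    def __init__(self) -> None:
        self._partitions: List[Tuple[PartitionSpec, str]] = []

    def add_partition(self, spec: PartitionSpec, location: str) -> None:
        self._partitions.append((dict(spec), location))

    def locations_matching(self, pred: Predicate) -> List[str]:
        # The catalog evaluates the pruning predicate and returns only
        # matching locations (issue 3); no directory listing is needed.
        return [loc for spec, loc in self._partitions if pred(spec)]


def resolve_locations(pred: Predicate, catalog: HypotheticalCatalog,
                      dir_paths: List[str], use_catalog: bool) -> List[str]:
    """use_catalog plays the role of the feature flag: when off, the old
    listing-based path is used; when on, the catalog is asked instead."""
    if use_catalog:
        return catalog.locations_matching(pred)
    return [loc for spec, loc in discover_by_listing(dir_paths) if pred(spec)]


if __name__ == "__main__":
    catalog = HypotheticalCatalog()
    catalog.add_partition({"ds": "2016-10-09"}, "legacy/20161009")
    catalog.add_partition({"ds": "2016-10-10"}, "ds=2016-10-10")
    dirs = ["legacy/20161009", "ds=2016-10-10"]
    want = lambda s: s["ds"] == "2016-10-09"
    print(resolve_locations(want, catalog, dirs, use_catalog=True))   # ['legacy/20161009']
    print(resolve_locations(want, catalog, dirs, use_catalog=False))  # [] (legacy dir is not key=value)
```

Keeping both code paths behind a single switch is one simple way to meet the feature-flag requirement: with the flag off, behavior stays as it was before the change.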



  was:
Spark SQL currently does not store any partition information in the catalog for data source
tables, because it was originally designed to work with arbitrary files. This, however, causes
a few issues for catalog tables:

1. Listing partitions for a large table (with millions of partitions) can be very slow during
cold start.
2. Does not support heterogeneous partition naming schemes.
3. Cannot push partition pruning down into the metastore.

This ticket tracks the work required to push the tracking of partitions into the metastore.




> Push data source partitions into metastore for catalog tables
> -------------------------------------------------------------
>
>                 Key: SPARK-17861
>                 URL: https://issues.apache.org/jira/browse/SPARK-17861
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Reynold Xin
>            Priority: Critical



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


