spark-issues mailing list archives

From "Bob Tiernay (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-5302) Add support for SQLContext "partition" columns
Date Sun, 18 Jan 2015 00:30:34 GMT

     [ https://issues.apache.org/jira/browse/SPARK-5302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bob Tiernay updated SPARK-5302:
-------------------------------
    Description: For {{SQLContext}} (not {{HiveContext}}), it would be very convenient to support
a virtual column that maps to part of the file path, similar to what Hive does for
partitions (e.g. {{/data/clicks/dt=2015-01-01/}}). The API could allow the user to type the
column using an appropriate {{DataType}} instance. This new field could be addressed in SQL
statements much the same as in Hive. As a consequence, partitions could be pruned when
executing a query, and there would be no need to materialize a column in each logical
partition when its value is already encoded in the path name. Furthermore, this would provide
a nice interop and migration strategy for Hive users who may one day use {{SQLContext}} directly.
 (was: For {{SQLContext}} (not {{HiveContext}}), it would be very convenient to support a virtual
column that maps to part of the file path, similar to what Hive does for partitions.
The API could allow the user to type the column using an appropriate {{DataType}} instance.
This new field could be addressed in SQL statements much the same as in Hive. As a
consequence, partitions could be pruned when executing a query, and there would be no need
to materialize a column in each logical partition when its value is already encoded in the
path name. Furthermore, this would provide a nice interop and migration strategy for Hive
users who may one day use {{SQLContext}} directly.)

> Add support for SQLContext "partition" columns
> ----------------------------------------------
>
>                 Key: SPARK-5302
>                 URL: https://issues.apache.org/jira/browse/SPARK-5302
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>            Reporter: Bob Tiernay
>
> For {{SQLContext}} (not {{HiveContext}}), it would be very convenient to support a virtual
column that maps to part of the file path, similar to what Hive does for partitions
(e.g. {{/data/clicks/dt=2015-01-01/}}). The API could allow the user to type the column using
an appropriate {{DataType}} instance. This new field could be addressed in SQL statements
much the same as in Hive. As a consequence, partitions could be pruned when executing a
query, and there would be no need to materialize a column in each logical partition when
its value is already encoded in the path name. Furthermore, this would provide a nice interop
and migration strategy for Hive users who may one day use {{SQLContext}} directly.
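
To make the request concrete, here is a rough, Spark-independent sketch of the idea in plain Python. It reads Hive-style {{key=value}} path segments as typed virtual columns and uses them to skip directories before any file is opened. All names here ({{parse_partition_values}}, {{prune_partitions}}, {{parse_date}}) are hypothetical and chosen only for illustration; they are not part of any existing {{SQLContext}} API, and the caller-supplied parser merely stands in for the {{DataType}} the description mentions.

```python
import re
from datetime import date, datetime

# Hypothetical illustration only: none of these names exist in Spark.
# A Hive-style partition segment looks like "dt=2015-01-01".
_PARTITION_SEGMENT = re.compile(r"^([^=/]+)=([^/]*)$")


def parse_date(raw):
    """Stand-in for a typed column: parse an ISO date string."""
    return datetime.strptime(raw, "%Y-%m-%d").date()


def parse_partition_values(path, column_types):
    """Extract key=value path segments and cast each value with its declared parser."""
    values = {}
    for segment in path.strip("/").split("/"):
        match = _PARTITION_SEGMENT.match(segment)
        if match and match.group(1) in column_types:
            name, raw = match.groups()
            values[name] = column_types[name](raw)
    return values


def prune_partitions(paths, column_types, predicate):
    """Keep only directories whose partition values satisfy the predicate."""
    return [p for p in paths if predicate(parse_partition_values(p, column_types))]


paths = [
    "/data/clicks/dt=2015-01-01/",
    "/data/clicks/dt=2015-01-02/",
    "/data/clicks/dt=2015-01-03/",
]
column_types = {"dt": parse_date}

# Equivalent in spirit to: SELECT ... WHERE dt >= '2015-01-02'
selected = prune_partitions(paths, column_types, lambda row: row["dt"] >= date(2015, 1, 2))
print(selected)
```

Because the {{dt}} value comes from the directory name, the files themselves would not need to store a {{dt}} column, and the first directory is never scanned for this predicate.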



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


