spark-issues mailing list archives

From "Cheng Lian (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-6432) Cannot load parquet data with partitions if not all partition columns match data columns
Date Fri, 20 Mar 2015 09:36:38 GMT

    [ https://issues.apache.org/jira/browse/SPARK-6432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371039#comment-14371039 ]

Cheng Lian commented on SPARK-6432:
-----------------------------------

If all of the partition columns that appear in the path also exist in the data files,
loading works fine. The problem is that, if only some of the partition columns exist in the
data files, the result ends up with duplicated columns. Your case belongs to the first category.

> Cannot load parquet data with partitions if not all partition columns match data columns
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-6432
>                 URL: https://issues.apache.org/jira/browse/SPARK-6432
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.3.0, 1.3.1
>            Reporter: Jianshi Huang
>            Assignee: Cheng Lian
>
> Suppose we have a dataset in the following folder structure:
> {noformat}
> parquet/source=live/date=2015-03-18/
> parquet/source=live/date=2015-03-19/
> ...
> {noformat}
> And the data schema has the following columns:
> - id
> - *event_date*
> - source
> - value
> Where partition key source matches data column source, but partition key date doesn't
> match any columns in data.
> Then we cannot load the dataset in Spark using parquetFile. It reports:
> {code}
> org.apache.spark.sql.AnalysisException: Ambiguous references to source: (source#2,List()),(source#5,List());
> ...
> {code}
> Currently, if partition columns overlap with data columns, the partition columns have
> to be a subset of the data columns.
> Jianshi
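
The distinction between "all partition columns exist in the data" and "only some do" can
be checked before loading. The following is a minimal sketch in plain Python, not Spark's
actual partition discovery code. The helper names parse_partition_path and
classify_overlap are hypothetical and exist only for this illustration; the path and the
data columns are the ones from the issue description.

```python
def parse_partition_path(path):
    """Extract (key, value) pairs from key=value directory segments of a path."""
    pairs = []
    for segment in path.strip("/").split("/"):
        if "=" in segment:
            key, value = segment.split("=", 1)
            pairs.append((key, value))
    return pairs


def classify_overlap(partition_cols, data_cols):
    """Classify how the partition columns relate to the data columns.

    "all"      -> every partition column also exists in the data files
    "partial"  -> only some partition columns exist in the data files
    "disjoint" -> no partition column exists in the data files
    """
    overlap = [c for c in partition_cols if c in data_cols]
    if not overlap:
        return "disjoint"
    if len(overlap) == len(partition_cols):
        return "all"
    return "partial"


# Layout and schema taken from the issue description.
partition_cols = [k for k, _ in parse_partition_path("parquet/source=live/date=2015-03-18/")]
data_cols = ["id", "event_date", "source", "value"]
overlap_kind = classify_overlap(partition_cols, data_cols)
```

For the layout in the description, source is found in the data files while date is not, so
the sketch classifies it as "partial", which is the combination the issue title describes.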



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


