spark-issues mailing list archives

From "Gaurav Shah (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-17527) mergeSchema with `_OPTIONAL_` metadata fails
Date Fri, 23 Sep 2016 09:32:20 GMT

    [ https://issues.apache.org/jira/browse/SPARK-17527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15515948#comment-15515948 ]

Gaurav Shah edited comment on SPARK-17527 at 9/23/16 9:31 AM:
--------------------------------------------------------------

I am unable to create a smaller script that reproduces this issue, although I have production
data that exhibits it.

Somehow, when the Parquet footer is read, the "org.apache.spark.sql.parquet.row.metadata" value
contains "_OPTIONAL_", which does not happen in my sample program. Let me try to get more details.

Would it help if I sent the Parquet files?


> mergeSchema with `_OPTIONAL_` metadata fails
> --------------------------------------------
>
>                 Key: SPARK-17527
>                 URL: https://issues.apache.org/jira/browse/SPARK-17527
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.0.0
>         Environment: mac osx 10.11.6, ubuntu 14, ubuntu 16.
> spark 2.0.0, spark-catalyst 2.0.0
>            Reporter: Gaurav Shah
>
> Spark added the `_OPTIONAL_` metadata key in 2.0.0 in the following commit: https://github.com/apache/spark/commit/4637fc08a3733ec313218fb7e4d05064d9a6262d
> However, merging metadata for data written by Spark 1.6.x and 2.0 fails with the following error:
> {code}
> Exception in thread "main" java.lang.RuntimeException: could not merge metadata: key org.apache.spark.sql.parquet.row.metadata has conflicting values:
> {code}
> The only difference between those values is that the field metadata now has an extra "_OPTIONAL_" entry.
> {code:javascript}
> {
>   "name": "catalog",
>   "type": {
>     "type": "struct",
>     "fields": [
>       {
>         "name": "category",
>         "type": "string",
>         "nullable": true,
>         "metadata": {}
>       },
>       {
>         "name": "department",
>         "type": "string",
>         "nullable": true,
>         "metadata": {}
>       }
>     ]
>   },
>   "nullable": true,
>   "metadata": {
>     "_OPTIONAL_": true
>   }
> }
> {code}
> vs
> {code:javascript}
> {
>   "name": "catalog",
>   "type": {
>     "type": "struct",
>     "fields": [
>       {
>         "name": "category",
>         "type": "string",
>         "nullable": true,
>         "metadata": {}
>       },
>       {
>         "name": "department",
>         "type": "string",
>         "nullable": true,
>         "metadata": {}
>       }
>     ]
>   },
>   "nullable": true,
>   "metadata": {}
> }
> {code}
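
To make the conflict described in the issue concrete, here is a minimal Python sketch. It is not Spark's or Parquet's code. It assumes the footer value is Spark's schema JSON wrapped in a top-level struct, and it imitates a strict key-value merge that rejects any key with more than one distinct value. The helper names (strip_optional, merge_strict, normalize, catalog_schema) are hypothetical and exist only for this illustration.

```python
import json

ROW_METADATA_KEY = "org.apache.spark.sql.parquet.row.metadata"
OPTIONAL_KEY = "_OPTIONAL_"  # metadata key named in this report


def strip_optional(node):
    """Copy a schema JSON tree, dropping `_OPTIONAL_` from every "metadata" object."""
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            if key == "metadata" and isinstance(value, dict):
                out[key] = {k: v for k, v in value.items() if k != OPTIONAL_KEY}
            else:
                out[key] = strip_optional(value)
        return out
    if isinstance(node, list):
        return [strip_optional(item) for item in node]
    return node


def merge_strict(footers):
    """Imitate a strict merge: every key must map to one distinct value."""
    merged = {}
    for footer in footers:
        for key, value in footer.items():
            if merged.setdefault(key, value) != value:
                raise RuntimeError(
                    f"could not merge metadata: key {key} has conflicting values"
                )
    return merged


def normalize(footer):
    """Return a footer copy whose schema JSON no longer carries `_OPTIONAL_`."""
    schema = json.loads(footer[ROW_METADATA_KEY])
    cleaned = dict(footer)
    cleaned[ROW_METADATA_KEY] = json.dumps(strip_optional(schema), sort_keys=True)
    return cleaned


def catalog_schema(metadata):
    """Build the `catalog` field from the report inside an assumed top-level struct."""
    leaf = lambda name: {"name": name, "type": "string", "nullable": True, "metadata": {}}
    field = {
        "name": "catalog",
        "type": {"type": "struct", "fields": [leaf("category"), leaf("department")]},
        "nullable": True,
        "metadata": metadata,
    }
    return json.dumps({"type": "struct", "fields": [field]}, sort_keys=True)


# Two in-memory stand-ins for file footers: one with the extra key, one without.
with_optional = {ROW_METADATA_KEY: catalog_schema({OPTIONAL_KEY: True})}
without_optional = {ROW_METADATA_KEY: catalog_schema({})}
```

Calling merge_strict([with_optional, without_optional]) raises a RuntimeError with the same wording as the exception above, while merging the two footers after normalize succeeds. The sketch only operates on in-memory strings; it does not rewrite Parquet files and is not a proposed fix for Spark.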



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

