sqoop-dev mailing list archives

From "Ruslan Dautkhanov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SQOOP-2907) Export parquet files to RDBMS: don't require .metadata for parquet files
Date Mon, 17 Oct 2016 22:41:58 GMT

    [ https://issues.apache.org/jira/browse/SQOOP-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15583729#comment-15583729 ]

Ruslan Dautkhanov commented on SQOOP-2907:
------------------------------------------

Are there any workarounds for this?
Is there any way we can generate, from the parquet schema, the .metadata that KiteSDK/Sqoop expects?

I tried the following:
# beeline 'create table avro_table stored as avro as select * from parquet_table where 1=0'
# hadoop fs -get /hivewarehouse/avro_table/000000_0 ./
# avro-tools getschema /hivewarehouse/avro_table/000000_0 > 000000_0.schema
# kite-dataset -v create amf_trans -s 000000_0.schema
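Since the process has to be repeatable, the four steps above could be sketched as one driver script. This is only a sketch of the workaround as I ran it: the table names, the /hivewarehouse path, and the assumption that beeline, hadoop, avro-tools, and kite-dataset are on PATH all come from my environment, not from anything Sqoop or Kite guarantees.

```python
#!/usr/bin/env python
# Sketch: the four-step workaround above as one repeatable script.
# Table names and the warehouse path are placeholders from the example;
# the actual commands are only executed on a real cluster.
import subprocess

WAREHOUSE = "/hivewarehouse"

def build_commands(parquet_table, avro_table, dataset):
    """Return the shell commands for each step of the workaround."""
    schema_file = "000000_0.schema"
    return [
        # 1. empty Avro copy of the Parquet table (schema only, no rows)
        ["beeline", "-e",
         "create table {0} stored as avro as "
         "select * from {1} where 1=0".format(avro_table, parquet_table)],
        # 2. pull one (empty) Avro data file out of HDFS
        ["hadoop", "fs", "-get",
         "{0}/{1}/000000_0".format(WAREHOUSE, avro_table), "./"],
        # 3. extract the Avro schema from that file (redirect stdout to schema_file)
        ["avro-tools", "getschema", "000000_0"],
        # 4. create a Kite dataset (and its .metadata) from the schema
        ["kite-dataset", "-v", "create", dataset, "-s", schema_file],
    ]

if __name__ == "__main__":
    for cmd in build_commands("parquet_table", "avro_table", "amf_trans"):
        print(" ".join(cmd))
        # subprocess.check_call(cmd)  # uncomment on a real cluster
```

Running the commands is left commented out, since step 3 additionally needs its stdout redirected into the schema file.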

The last command of this four-step process finally produced a .metadata directory, but when I
tried to run sqoop export, I got the following exception:

{noformat}
16/10/17 16:10:25 ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.DatasetIOException:
Unable to load descriptor file:hdfs://epsdatalake/hivewarehouse/disc_dv.db/amf_trans_dv_09142016/.metadata/descriptor.properties
for dataset:amf_trans_dv_09142016
org.kitesdk.data.DatasetIOException: Unable to load descriptor file:hdfs://epsdatalake/hivewarehouse/disc_dv.db/amf_trans_dv_09142016/.metadata/descriptor.properties
for dataset:amf_trans_dv_09142016
        at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.load(FileSystemMetadataProvider.java:127)
        at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.load(FileSystemDatasetRepository.java:197)
        at org.kitesdk.data.Datasets.load(Datasets.java:108)
        at org.kitesdk.data.Datasets.load(Datasets.java:140)
        at org.kitesdk.data.mapreduce.DatasetKeyInputFormat$ConfigBuilder.readFrom(DatasetKeyInputFormat.java:92)
        at org.kitesdk.data.mapreduce.DatasetKeyInputFormat$ConfigBuilder.readFrom(DatasetKeyInputFormat.java:139)
        at org.apache.sqoop.mapreduce.JdbcExportJob.configureInputFormat(JdbcExportJob.java:84)
        at org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java:424)
        at org.apache.sqoop.manager.oracle.OraOopConnManager.exportTable(OraOopConnManager.java:320)
        at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:81)
        at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
{noformat}
sqoop export's KiteSDK integration looks for a *.metadata/descriptor.properties* file,
but what the kite-dataset utility generated contains only *.metadata/schemas/1.avsc*
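Out of curiosity I sketched what generating the missing file by hand might look like. To be clear, the property keys used below (format, location, schema.uri) are pure assumptions inferred from the error message and the on-disk layout; this thread does not document what Kite actually requires in descriptor.properties, so this is an experiment, not a verified fix.

```python
# Experimental sketch: write a minimal .metadata/descriptor.properties next
# to the schemas/1.avsc that kite-dataset produced. The property keys here
# (format, location, schema.uri) are ASSUMPTIONS, not a documented Kite contract.
import os

def write_descriptor(dataset_dir, fmt="parquet"):
    meta = os.path.join(dataset_dir, ".metadata")
    os.makedirs(os.path.join(meta, "schemas"), exist_ok=True)
    props = {
        "format": fmt,                                       # assumed key
        "location": "file:" + os.path.abspath(dataset_dir),  # assumed key
        "schema.uri": "schemas/1.avsc",                      # assumed key
    }
    path = os.path.join(meta, "descriptor.properties")
    with open(path, "w") as f:
        for key, value in sorted(props.items()):
            f.write("{0}={1}\n".format(key, value))
    return path
```

Whether Kite accepts a hand-written descriptor like this would still need to be tested against an actual sqoop export run.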

The process has to be repeatable and scriptable, which is why we were looking at different options
to generate .metadata automatically,
including the kite-dataset commands above. It would be awesome if sqoop generated the .metadata
that KiteSDK expects whenever .metadata is not found.

> Export parquet files to RDBMS: don't require .metadata for parquet files
> ------------------------------------------------------------------------
>
>                 Key: SQOOP-2907
>                 URL: https://issues.apache.org/jira/browse/SQOOP-2907
>             Project: Sqoop
>          Issue Type: Improvement
>          Components: metastore
>    Affects Versions: 1.4.6
>         Environment: sqoop 1.4.6
> export parquet files to Oracle
>            Reporter: Ruslan Dautkhanov
>
> Kite currently requires .metadata.
> Parquet files have their own metadata stored along data files.
> It would be great for Export operation on parquet files to RDBMS not to require .metadata.
> We have most of our files created by Spark and Hive, and they don't create .metadata;
> only Kite does.
> That makes sqoop export of parquet files very limited in usability.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
