sqoop-dev mailing list archives

From "Markus Kemper (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (SQOOP-2874) Highlight Sqoop import with --as-parquetfile use cases (Dataset name <NAME> is not alphanumeric (plus '_'))
Date Wed, 03 Aug 2016 13:07:20 GMT

     [ https://issues.apache.org/jira/browse/SQOOP-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Kemper reassigned SQOOP-2874:
------------------------------------

    Assignee: Markus Kemper

> Highlight Sqoop import with --as-parquetfile use cases (Dataset name <NAME> is not alphanumeric (plus '_'))
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: SQOOP-2874
>                 URL: https://issues.apache.org/jira/browse/SQOOP-2874
>             Project: Sqoop
>          Issue Type: Improvement
>          Components: docs
>            Reporter: Markus Kemper
>            Assignee: Markus Kemper
>         Attachments: Jira_SQOOP-2874_TestCases.txt
>
>
> Hello Sqoop Community,
> Would it be possible to request some documentation enhancements?
> The ask here is to proactively raise awareness and improve the user experience for a few specific use cases [1] where some Sqoop options accept only a restricted set of characters when using import with --as-parquetfile.
> My understanding is that Sqoop1 currently relies on Kite Datasets to write Parquet files. From the Kite documentation [3] we see that, to ensure compatibility (with Hive, etc.), Kite imposes some restrictions on names and namespaces, which bubble up in Sqoop.
> The following Sqoop use cases, when using import with --as-parquetfile, result in the error [2] below. Full test cases for each scenario are attached. If enhancing the Sqoop documentation for these use cases is an option, I am happy to provide proposed changes; let me know.
> [1] Use Cases:
> 1. sqoop import --as-parquetfile + --target-dir /<path>/<rdbms_database>.<table>
> 1.1. The '.' is not allowed
> 2. sqoop import --as-parquetfile + --table <rdbms_database>.<table> + (no --target-dir)
> 2.1. The '.' is not allowed; this is essentially the same as (1)
> 3. sqoop import --as-parquetfile + --hive-import --table <hive_database>.<table>
> 3.1. The proper usage is --hive-database together with --hive-table; however, with --as-textfile, --hive-table works with <hive_database>.<table>
> [2] Kite Error:
> 16/03/06 08:45:56 ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.ValidationException: Dataset name DATABASE.TABLE is not alphanumeric (plus '_')
> org.kitesdk.data.ValidationException: Dataset name DATABASE.TABLE is not alphanumeric (plus '_')
> 	at org.kitesdk.data.ValidationException.check(ValidationException.java:55)
> 	at org.kitesdk.data.spi.Compatibility.checkDatasetName(Compatibility.java:105)
> 	at org.kitesdk.data.spi.Compatibility.check(Compatibility.java:68)
> 	at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.create(FileSystemMetadataProvider.java:209)
> 	at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.create(FileSystemDatasetRepository.java:137)
> 	at org.kitesdk.data.Datasets.create(Datasets.java:239)
> 	at org.kitesdk.data.Datasets.create(Datasets.java:307)
> 	at org.apache.sqoop.mapreduce.ParquetJob.createDataset(ParquetJob.java:141)
> 	at org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:119)
> 	at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:130)
> 	at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:260)
> 	at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
> 	at org.apache.sqoop.manager.OracleManager.importTable(OracleManager.java:444)
> 	at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
> 	at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
> 	at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> 	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
> 	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
> 	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
> 	at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
> [3] Kite Documentation:
> http://kitesdk.org/docs/1.0.0/introduction-to-datasets.html
> Names and Namespaces
> URIs also define a name and namespace for your dataset. Kite uses these values when the underlying system has the same concept (for example, Hive). The name and namespace are typically the last two values in a URI. For example, if you create a dataset using the URI dataset:hive:fact_tables/ratings, Kite stores a Hive table ratings in the fact_tables Hive database. If you create a dataset using the URI dataset:hdfs:/user/cloudera/fact_tables/ratings, Kite stores an HDFS dataset named ratings in the fact_tables namespace. To ensure compatibility with Hive and other underlying systems, names and namespaces in URIs must be made of alphanumeric or underscore (_) characters and cannot start with a number.
> Thanks, Markus
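
For illustration only, here is a minimal Python sketch of the naming rule quoted in [3] and of the workaround described in use case 3.1. It is not Kite's or Sqoop's actual code: the regular expression is an assumption derived from the documentation text (ASCII letters, digits, and '_', not starting with a digit), and the helper names is_kite_compatible and split_hive_table are hypothetical. Only the --hive-database and --hive-table options come from the issue itself.

```python
import re

# Assumption: approximates the rule quoted from the Kite docs in [3];
# this is not the pattern Kite itself uses.
_KITE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_kite_compatible(name):
    """Return True if the name uses only letters, digits, and '_' and does not start with a digit."""
    return _KITE_NAME.fullmatch(name) is not None


def split_hive_table(qualified):
    """Turn '<hive_database>.<table>' into separate Sqoop options (hypothetical helper).

    A name without a '.' is passed through as --hive-table only.
    """
    database, sep, table = qualified.partition(".")
    if not sep:
        return ["--hive-table", qualified]
    return ["--hive-database", database, "--hive-table", table]


if __name__ == "__main__":
    # The name from the error in [2] fails the check because of the '.'.
    print(is_kite_compatible("DATABASE.TABLE"))
    # Splitting the qualified name yields two parts that each pass the check.
    print(split_hive_table("DATABASE.TABLE"))
```

A wrapper script could run such a check on the value it is about to pass to --table or --target-dir before invoking sqoop import --as-parquetfile, so the problem surfaces before the job is configured.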



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
