sqoop-user mailing list archives

From Greg Lindholm <greg.lindh...@gmail.com>
Subject Sqoop with Hcat integration on AWS EMR with AWS Glue Data Catalog
Date Thu, 22 Feb 2018 21:24:20 GMT
Has anyone managed to get Sqoop with hcatalog integration working on AWS
EMR when Hive is configured to use AWS Glue Data Catalog?

I'm attempting to import from a MySQL db into Hive on an AWS EMR cluster.
Hive is configured to use AWS Glue Data Catalog as the metadata catalog.

sqoop import \
  -Dmapred.output.direct.NativeS3FileSystem=false \
  -Dmapred.output.direct.EmrFileSystem=false \
  --connect jdbc:mysql://ec2-18-221-214-250.us-east-2.compute.amazonaws.com:3306/test1 \
  --username XXX -P \
  -m 1 \
  --table sampledata1 \
  --hcatalog-database greg5 \
  --hcatalog-table sampledata1_orc1 \
  --create-hcatalog-table \
  --hcatalog-storage-stanza 'stored as orc'

It appears that the EMR setup wizard properly configures Hive to use the
Glue Data Catalog but not Sqoop.

I had to add the Glue client jar to Sqoop's classpath myself:

sudo ln -s \
  /usr/share/aws/hmclient/lib/aws-glue-datacatalog-hive2-client.jar \
  /usr/lib/sqoop/lib/aws-glue-datacatalog-hive2-client.jar
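In case it helps anyone reproduce, here is the same link step as a small sketch with the jar path and Sqoop lib dir as overridable variables; both defaults are just the stock EMR locations above, so adjust them if your layout differs:

```shell
# Link the Glue data-catalog client jar into Sqoop's lib dir and verify
# that the link actually resolves. Defaults are the stock EMR paths above.
GLUE_JAR="${GLUE_JAR:-/usr/share/aws/hmclient/lib/aws-glue-datacatalog-hive2-client.jar}"
SQOOP_LIB="${SQOOP_LIB:-/usr/lib/sqoop/lib}"

link_glue_jar() {
  ln -sfn "$GLUE_JAR" "$SQOOP_LIB/$(basename "$GLUE_JAR")" || return 1
  # [ -e ] follows the symlink, so a dangling link fails this check
  if [ -e "$SQOOP_LIB/$(basename "$GLUE_JAR")" ]; then
    echo "glue client jar linked"
  else
    echo "link is dangling: check the source path" >&2
    return 1
  fi
}
```

(Run `link_glue_jar` with sudo if your Sqoop lib dir is root-owned.)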

When I run the above Sqoop command, the table gets created, but the import
then fails with an exception saying it can't find the table.

I've checked in Glue (and Hive) and the table is created correctly.

Here is the exception:
18/02/21 20:17:41 INFO conf.HiveConf: Found configuration file file:/etc/hive/conf.dist/hive-site.xml
18/02/21 20:17:42 INFO common.HiveClientCache: Initializing cache: eviction-timeout=120 initial-capacity=50 maximum-capacity=50
18/02/21 20:17:42 INFO hive.metastore: Trying to connect to metastore with URI thrift://ip-172-31-27-114.us-east-2.compute.internal:9083
18/02/21 20:17:42 INFO hive.metastore: Opened a connection to metastore, current connections: 1
18/02/21 20:17:42 INFO hive.metastore: Connected to metastore.
18/02/21 20:17:43 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: NoSuchObjectException(message:greg5.sampledata1_orc1 table not found)
        at org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:97)
        at org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:51)
        at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.configureHCat(SqoopHCatUtilities.java:343)
        at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.configureImportOutputFormat(SqoopHCatUtilities.java:783)
        at org.apache.sqoop.mapreduce.ImportJobBase.configureOutputFormat(ImportJobBase.java:98)
        at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:259)
        at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
        at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Caused by: NoSuchObjectException(message:greg5.sampledata1_orc1 table not found)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_req_result$get_table_req_resultStandardScheme.read(ThriftHiveMetastore.java:55064)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_req_result$get_table_req_resultStandardScheme.read(ThriftHiveMetastore.java:55032)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_req_result.read(ThriftHiveMetastore.java:54963)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_table_req(ThriftHiveMetastore.java:1563)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_table_req(ThriftHiveMetastore.java:1550)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:1344)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:169)
        at com.sun.proxy.$Proxy5.getTable(Unknown Source)
        at org.apache.hive.hcatalog.common.HCatUtil.getTable(HCatUtil.java:180)
        at org.apache.hive.hcatalog.mapreduce.InitializeInput.getInputJobInfo(InitializeInput.java:105)
        at org.apache.hive.hcatalog.mapreduce.InitializeInput.setInput(InitializeInput.java:88)
        at org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:95)
        ... 15 more

The Hive config file (/etc/hive/conf.dist/hive-site.xml) has this property:

<property>
  <name>hive.metastore.client.factory.class</name>
  <value>com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory</value>
</property>
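One thing I notice in the log above is that the failing client still opens a Thrift connection to ip-172-31-27-114.us-east-2.compute.internal:9083 even though the Glue factory is configured. That Thrift URI would normally come from hive.metastore.uris; I'm assuming (I haven't pasted that part of the file) the stock EMR hive-site also contains something along the lines of:

```xml
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://ip-172-31-27-114.us-east-2.compute.internal:9083</value>
</property>
```

so it looks like the HCatalog client Sqoop uses may be falling back to that Thrift metastore rather than going through the Glue factory.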

Does anyone have any suggestions?

/Greg
