sqoop-user mailing list archives

From Greg Lindholm <greg.lindh...@gmail.com>
Subject Sqoop import with HCatalog on AWS EMR
Date Wed, 14 Feb 2018 21:49:43 GMT
Hi Sqoop Users,

I was attempting a Sqoop import with HCat on an AWS EMR cluster,
importing from a MySQL database and writing to an S3 location.

sudo sqoop import \
  --connect jdbc:mysql://xxx.us-east-2.compute.amazonaws.com:3306/test1 \
  --username xxx -P \
  --table sampledata1 \
  --hcatalog-database greg3 \
  --hcatalog-table sampledata1_orc1 \
  --create-hcatalog-table \
  --hcatalog-storage-stanza 'stored as orc'

The database (greg3) was created in hive with a location to an S3 bucket.

The sqoop job would run and succeed: the table was created correctly in
Hive HCatalog and the table folders were created on S3, but no data file
was ever written.

I found the solution buried in a page on HCatalog under EMR.

You have to set these mapred config values to disable direct write when
using the HCatalog storer:

  -Dmapred.output.direct.NativeS3FileSystem=false \
  -Dmapred.output.direct.EmrFileSystem=false \
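
For anyone else hitting this, here is a sketch of the full command with those flags in place. Note that generic Hadoop `-D` options must come immediately after the tool name (`import`), before any tool-specific arguments, or Sqoop will reject them. Connection details are the placeholders from my original command:

```shell
sudo sqoop import \
  -Dmapred.output.direct.NativeS3FileSystem=false \
  -Dmapred.output.direct.EmrFileSystem=false \
  --connect jdbc:mysql://xxx.us-east-2.compute.amazonaws.com:3306/test1 \
  --username xxx -P \
  --table sampledata1 \
  --hcatalog-database greg3 \
  --hcatalog-table sampledata1_orc1 \
  --create-hcatalog-table \
  --hcatalog-storage-stanza 'stored as orc'
```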

Here is the link:

Hopefully this will save someone else a lot of trouble.

