sqoop-user mailing list archives

From Venkat Ranganathan <vranganat...@hortonworks.com>
Subject Re: How does sqoop export detect Avro schema?
Date Mon, 02 Mar 2015 02:13:07 GMT
If you are exporting from an HDFS location, the default is text file format. I don't think
we support export of any format other than text using the --export-dir option (on the import
side you could use --as-avrodatafile for Avro, as with the other file formats).

But you can use the --hcatalog-table option to export a Hive table without worrying about
the storage location or format.
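
As a rough sketch of the --hcatalog-table route, the command from the original message might be adapted as below. Treat it as an illustration under assumptions, not a verified recipe: the Hive table is assumed to live in the "default" database, and --update-key/--update-mode are left out because whether they combine with HCatalog exports is not confirmed here. The snippet only prints the command so it can be reviewed before running.

```shell
# Dry-run sketch: assemble the export command and print it rather than executing it.
# Connection string and table names are taken from the original message.
CONNECT="jdbc:mysql://mgrover-haa-2.vpc.cloudera.com:3306/movie_dwh"
HCAT_DB="default"   # assumption: the Hive table was created in the default database

# --hcatalog-table replaces --export-dir; HCatalog supplies the location and storage format.
SQOOP_CMD="sqoop export --connect $CONNECT --username root --table avg_movie_rating --hcatalog-database $HCAT_DB --hcatalog-table avg_movie_rating2 -m 16"

echo "$SQOOP_CMD"
# To run it for real, replace the echo with: eval "$SQOOP_CMD"
```

Note that --export-dir is dropped entirely in this form, since the table's location comes from the Hive metastore.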

Venkat

From: Mark Grover
Reply-To: user@sqoop.apache.org
Date: Sunday, March 1, 2015 at 4:30 PM
To: user@sqoop.apache.org
Subject: Re: How does sqoop export detect Avro schema?

Forgot to mention, here's the error I am getting:
https://gist.github.com/markgrover/113196fecd1ec5bd0b38

And, please include me on cc. I am not on the list. Thanks again!

On Sun, Mar 1, 2015 at 4:29 PM, Mark Grover <mark@apache.org> wrote:
Hi Sqoop folks,
I am trying to better understand how sqoop export works.

In the sqoop export command, we don't put any information about the metadata of the HDFS data
being exported. So, how does sqoop figure out the avro schema of the data being exported?

Does it use Kite's .metadata directory for this? If so, that'd mean you can't export data
not populated by Kite. I don't think that's the case.
Does it parse the file header or look at file extensions? If so, that doesn't work: I
just populated a Hive table that stores data in Avro, and its file extension is not .avro.
Does it do something else that I am missing?

I created a Hive avro table using some new syntax supported in Hive 0.14+:

CREATE EXTERNAL TABLE avg_movie_rating2(movie_id INT, rating DOUBLE)
STORED AS AVRO
LOCATION '/data/movielens/aggregated_ratings'

And I just haven't been able to get Sqoop to export that data. Here's the sqoop
export command that I ran:

sqoop export --connect \
jdbc:mysql://mgrover-haa-2.vpc.cloudera.com:3306/movie_dwh \
--username root --table avg_movie_rating \
--export-dir /data/movielens/aggregated_ratings -m 16 \
--update-key movie_id --update-mode allowinsert

Any thoughts/insights would be much appreciated!

Thanks!
Mark
