sqoop-user mailing list archives

From Gwen Shapira <gshap...@cloudera.com>
Subject Re: How does sqoop export detect Avro schema?
Date Mon, 02 Mar 2015 21:26:42 GMT
I'm helping Mark troubleshoot here :)

Looks like Sqoop incorrectly tries to use the TextExportMapper instead
of AvroExportMapper even though the file starts with "Obj":

2015-03-02 13:14:16,034 ERROR [main]
org.apache.sqoop.mapreduce.TextExportMapper:
2015-03-02 13:14:16,034 ERROR [main]
org.apache.sqoop.mapreduce.TextExportMapper: Exception raised during
data export
2015-03-02 13:14:16,034 ERROR [main]
org.apache.sqoop.mapreduce.TextExportMapper:
2015-03-02 13:14:16,035 ERROR [main]
org.apache.sqoop.mapreduce.TextExportMapper: Exception:
java.lang.RuntimeException: Can't parse input data:
'Objavro.schema�{"type":"record"'
at avg_movie_rating.__loadFromFields(avg_movie_rating.java:249)
at avg_movie_rating.parse(avg_movie_rating.java:192)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.NumberFormatException: For input string:
"Objavro.schema�{"type":"record""
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:492)
at java.lang.Integer.valueOf(Integer.java:582)
at avg_movie_rating.__loadFromFields(avg_movie_rating.java:241)
... 12 more
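
For reference, the "Obj" at the start of the unparseable input is the Avro object container file magic: per the Avro specification, such a file begins with the four bytes 'O', 'b', 'j', 1. Below is a minimal sketch of header-based detection, independent of the file extension. It is not Sqoop's actual code; the names `looks_like_avro` and `sniff_local_file` are made up for illustration, and it reads a local copy of the file rather than going through HDFS.

```python
# Avro object container files start with these four bytes (Avro spec).
AVRO_MAGIC = b"Obj\x01"


def looks_like_avro(header: bytes) -> bool:
    """Return True if the given bytes begin with the Avro container magic."""
    return header.startswith(AVRO_MAGIC)


def sniff_local_file(path: str) -> str:
    """Classify a local file by its first bytes, ignoring its extension.

    Hypothetical helper: assumes the data file was first copied out of HDFS.
    """
    with open(path, "rb") as f:
        header = f.read(len(AVRO_MAGIC))
    return "avro" if looks_like_avro(header) else "text-or-other"
```

A check like this would classify the file in the trace above as Avro, which is why the choice of TextExportMapper looks wrong.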

On Mon, Mar 2, 2015 at 10:53 AM, Venkat Ranganathan
<vranganathan@hortonworks.com> wrote:
> You are right. Sqoop has improved support for Avro export files from the
> beginning.  Sorry, I overlooked that; it has been a while since I changed
> these for HCatalog support!
>
> Generally we should be able to support export of all formats (there is, of
> course, a bug that still needs fixing for Parquet in Hive, and maybe for
> some other formats) in a storage-location- and format-agnostic way with
> HCatalog.
>
>
> Venkat
>
> On 3/2/15, 9:30 AM, "Gwen Shapira" <gshapira@cloudera.com> wrote:
>
>>I'm pretty sure we support Avro exports, otherwise we wouldn't have
>>JIRAs like this:
>>
>>https://issues.apache.org/jira/browse/SQOOP-1283
>>
>>On Sun, Mar 1, 2015 at 6:13 PM, Venkat Ranganathan
>><vranganathan@hortonworks.com> wrote:
>>> If you are exporting from an HDFS location, the default is text file
>>> format.
>>> I don’t think we support export to any other format than text using the
>>> --export-dir option (on the import side you could use --as-avrodatafile
>>> for avro like other files).
>>>
>>> But you can use the --hcatalog-table option to export a Hive table
>>> without worrying about the storage location or format.
>>>
>>> Venkat
>>>
>>> From: Mark Grover
>>> Reply-To: "user@sqoop.apache.org"
>>> Date: Sunday, March 1, 2015 at 4:30 PM
>>> To: "user@sqoop.apache.org"
>>> Subject: Re: How does sqoop export detect Avro schema?
>>>
>>> Forgot to mention, here's the error I am getting:
>>> https://gist.github.com/markgrover/113196fecd1ec5bd0b38
>>>
>>> And, please include me on cc. I am not on the list. Thanks again!
>>>
>>> On Sun, Mar 1, 2015 at 4:29 PM, Mark Grover <mark@apache.org> wrote:
>>>>
>>>> Hi Sqoop folks,
>>>> I am trying to better understand how sqoop export works.
>>>>
>>>> In the sqoop export command, we don't put any information about the
>>>> metadata of the HDFS data being exported. So, how does sqoop figure out
>>>> the avro schema of the data being exported?
>>>>
>>>> Does it use Kite's .metadata directory for this? If so, that'd mean you
>>>> can't export data not populated by Kite. I don't think that's the case.
>>>> Does it parse out the file header or look at file extensions? If so, that
>>>> doesn't work: I just populated a Hive table which stores data in avro,
>>>> and its file extension is not avro.
>>>> Does it do something else that I am missing?
>>>>
>>>> I created a Hive avro table using some new syntax supported in Hive 0.14+:
>>>>
>>>> CREATE EXTERNAL TABLE avg_movie_rating2(movie_id INT, rating DOUBLE)
>>>> STORED AS AVRO
>>>> LOCATION '/data/movielens/aggregated_ratings'
>>>>
>>>> And, I just haven't been able to get Sqoop to export that data.
>>>> Here's the sqoop export command that I ran:
>>>>
>>>> sqoop export --connect \
>>>> jdbc:mysql://mgrover-haa-2.vpc.cloudera.com:3306/movie_dwh \
>>>> --username root --table avg_movie_rating --export-dir \
>>>> /data/movielens/aggregated_ratings -m 16 \
>>>> --update-key movie_id --update-mode allowinsert
>>>>
>>>> Any thoughts/insights would be much appreciated!
>>>>
>>>> Thanks!
>>>> Mark
>>>
>>>
