sqoop-dev mailing list archives

From Szabolcs Vasas <vasas.szabo...@gmail.com>
Subject Re: Review Request 67929: Remove Kite dependency from the Sqoop project
Date Wed, 18 Jul 2018 12:11:07 GMT


> On July 18, 2018, 9:52 a.m., daniel voros wrote:
> > Hi!
> > 
> > I was trying to run this on a minicluster but got the following error:
> > 
> > ```
> > 2018-07-18 09:20:41,799 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.NoSuchMethodError: org.apache.avro.Schema.getLogicalType()Lorg/apache/avro/LogicalType;
> >         at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:178)
> >         at org.apache.parquet.avro.AvroSchemaConverter.convertUnion(AvroSchemaConverter.java:214)
> >         at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:171)
> >         at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:130)
> >         at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:227)
> >         at org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:124)
> >         at org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:115)
> >         at org.apache.parquet.avro.AvroWriteSupport.init(AvroWriteSupport.java:117)
> >         at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:389)
> >         at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:350)
> >         at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:653)
> >         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:773)
> >         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> >         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:177)
> >         at java.security.AccessController.doPrivileged(Native Method)
> >         at javax.security.auth.Subject.doAs(Subject.java:422)
> >         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
> >         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:171)
> > ```
> > 
> > This happens when we have a newer version of Parquet (1.8.1 IIRC) with an older Avro (1.7.7 in this case).
> > 
> > Where is parquet coming from?
> >   - 1.9 is coming from Sqoop since this new patch
> >   - Hive's hive-exec jar also contains parquet classes shaded with the original packaging
> > 
> > Which one gets picked seems random to me (it even changes between re-executions of mappers!). Both are in the distributed cache.
> > 
> > Where is avro coming from?
> >   - There can be multiple versions under Sqoop/Hive, but it doesn't really matter. Hadoop is packaged with Avro under `share/hadoop/*/lib`. The jars there take precedence over the user classpath. This can be changed with `mapreduce.job.user.classpath.first=true`, but then we'd have to make sure not to override anything that Hadoop relies on.
> > 
> > I've come across this issue before and solved it by shading the Parquet classes. Note that this could be harder to do with Sqoop's ant build scripts.
> > 
> > Some other minor observations:
> >   - Hadoop 3.1.0 still has Avro 1.7.7
> >   - Hive has been using incompatible versions of Avro and Parquet for a long time, but they're not relying on parts of Parquet that require Avro.
> > 
> > Szabolcs, I've been struggling with this for too long, and a fresh pair of eyes might help spot some other options! Can you please take a look and validate what I've found?
> > 
> > Regards,
> > Daniel

Hi Dani,

Thanks for looking into this! 

What is the minicluster environment you are referring to, and how can I set it up on my side?

I have taken a quick look at the dependencies and I can see that Hive references Parquet 1.6,
so that might cause an issue.
We can change this patch to keep the parquet-avro 1.6.0 dependency (which was previously brought
in by Kite) so that we stay in line with the Hive dependencies. Later, with the Hadoop 3/Hive 3
upgrade, we could look at how to upgrade the Parquet dependency.

At this point we do not require Parquet 1.9. I just added it since it is a quite recent
version, but nothing in the patch relies on it.
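
For reference, one way to check on a given node which jar a class actually comes from, and
whether the Avro on the classpath has the method Parquet expects, might be a small reflection
probe like the one below. This is only a minimal sketch: the class ClasspathProbe and its
helper methods are made up for illustration and are not part of Sqoop or this patch. The two
class names and getLogicalType are taken from the stack trace above. It uses the probe's own
class loader, which is an assumption; inside a task the context class loader may be the
relevant one.

```java
import java.security.CodeSource;

public class ClasspathProbe {

    /** Returns the jar or directory a class was loaded from, or a note if it cannot be determined. */
    public static String locationOf(String className) {
        try {
            // Load without running static initializers; we only want the class origin.
            Class<?> c = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Classes loaded by the bootstrap loader typically report no code source.
            return src == null ? "(bootstrap/unknown)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    /** True if the class can be loaded and has a public no-argument method with the given name. */
    public static boolean hasNoArgMethod(String className, String methodName) {
        try {
            Class<?> c = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            c.getMethod(methodName);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Avro Schema loaded from: "
                + locationOf("org.apache.avro.Schema"));
        System.out.println("AvroSchemaConverter loaded from: "
                + locationOf("org.apache.parquet.avro.AvroSchemaConverter"));
        System.out.println("Schema.getLogicalType() present: "
                + hasNoArgMethod("org.apache.avro.Schema", "getLogicalType"));
    }
}
```

Running it with the same classpath as the failing task should show whether the Avro that gets
loaded lacks getLogicalType(), and which of the Parquet jars wins.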

I will upload the graphml dependency files for reference.


- Szabolcs


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67929/#review206195
-----------------------------------------------------------


On July 16, 2018, 3:56 p.m., Szabolcs Vasas wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67929/
> -----------------------------------------------------------
> 
> (Updated July 16, 2018, 3:56 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-3329
>     https://issues.apache.org/jira/browse/SQOOP-3329
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> - Removed kitesdk dependency from ivy.xml
> - Removed Kite Dataset API based Parquet import implementation
> - Since the Parquet library was a transitive dependency of the Kite SDK, I added org.apache.parquet:parquet-avro 1.9 as a direct dependency
> - In this dependency the Parquet package has changed to org.apache.parquet, so I needed to update several classes accordingly
> - Removed all the Parquet-related test cases from TestHiveImport. These scenarios are already covered in TestHiveServer2ParquetImport.
> - Modified the documentation to reflect these changes.
> 
> 
> Diffs
> -----
> 
>   ivy.xml 1f587f3eb 
>   ivy/libraries.properties 565a8bf50 
>   src/docs/user/hive-notes.txt af97d94b3 
>   src/docs/user/import.txt a2c16d956 
>   src/java/org/apache/sqoop/SqoopOptions.java cc1b75281 
>   src/java/org/apache/sqoop/avro/AvroUtil.java 1663b1d1a 
>   src/java/org/apache/sqoop/mapreduce/parquet/ParquetJobConfiguratorImplementation.java 050c85488
>   src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetExportJobConfigurator.java 2180cc20e
>   src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetImportJobConfigurator.java 90b910a34
>   src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetMergeJobConfigurator.java 66ebc5b80
>   src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteMergeParquetReducer.java 02816d77f
>   src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetExportJobConfigurator.java 6ebc5a31b
>   src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetExportMapper.java 122ff3fc9
>   src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetImportJobConfigurator.java 7e179a27d
>   src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetImportMapper.java 0a91e4a20
>   src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetJobConfiguratorFactory.java bd07c09f4
>   src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetMergeJobConfigurator.java ed045cd14
>   src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetUtils.java a4768c932 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java 87fc5e987 
>   src/test/org/apache/sqoop/TestMerge.java 2b3280a5a 
>   src/test/org/apache/sqoop/TestParquetExport.java 0fab1880c 
>   src/test/org/apache/sqoop/TestParquetImport.java b1488e8af 
>   src/test/org/apache/sqoop/TestParquetIncrementalImportMerge.java adad0cc11 
>   src/test/org/apache/sqoop/hive/TestHiveImport.java 436f0e512 
>   src/test/org/apache/sqoop/hive/TestHiveServer2ParquetImport.java b55179a4f 
>   src/test/org/apache/sqoop/tool/TestBaseSqoopTool.java dbda8b7f4 
>   src/test/org/apache/sqoop/util/ParquetReader.java f1c2fe10a 
> 
> 
> Diff: https://reviews.apache.org/r/67929/diff/1/
> 
> 
> Testing
> -------
> 
> Ran unit and third party tests.
> 
> 
> Thanks,
> 
> Szabolcs Vasas
> 
>

