spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Adam Gilmore <dragoncu...@gmail.com>
Subject Issue with Parquet on Spark 1.2 and Amazon EMR
Date Mon, 22 Dec 2014 01:37:47 GMT
Hi all,

I've just launched a new Amazon EMR cluster and used the script at:

s3://support.elasticmapreduce/spark/install-spark

to install Spark (this script was upgraded to support 1.2).

I know there are tools to launch a Spark cluster in EC2, but I want to use
EMR.

Everything installs fine; however, when I go to read from a Parquet file, I
end up with (the main exception):

Caused by: java.lang.NoSuchMethodError:
parquet.hadoop.ParquetInputSplit.<init>(Lorg/apache/hadoop/fs/Path;JJJ[Ljava/lang/String;[JLjava/lang/String;Ljava/util/Map;)V
        at
parquet.hadoop.TaskSideMetadataSplitStrategy.generateTaskSideMDSplits(ParquetInputFormat.java:578)
        ... 55 more

It seems to me like a version mismatch somewhere.  Where is the
parquet-hadoop jar coming from?  Is it built into a fat jar for Spark?

Any help would be appreciated.  Note that 1.1.1 worked fine with Parquet
files.

Mime
View raw message