spark-user mailing list archives

From Aniket Bhatnagar <aniket.bhatna...@gmail.com>
Subject Re: Issue with Parquet on Spark 1.2 and Amazon EMR
Date Mon, 05 Jan 2015 06:51:05 GMT
Can you confirm your EMR version? Could it be caused by the classpath
entries for EMRFS? You might run into issues using S3 without them.

Thanks,
Aniket

On Mon, Jan 5, 2015, 11:16 AM Adam Gilmore <dragoncurve@gmail.com> wrote:

> Just an update on this: I found that Amazon's install script was the
> culprit, though I'm not exactly sure why.  When I installed Spark manually
> on the EMR cluster (and did all the EMR configuration by hand), it worked fine.
>
> On Mon, Dec 22, 2014 at 11:37 AM, Adam Gilmore <dragoncurve@gmail.com>
> wrote:
>
>> Hi all,
>>
>> I've just launched a new Amazon EMR cluster and used the script at:
>>
>> s3://support.elasticmapreduce/spark/install-spark
>>
>> to install Spark (this script was upgraded to support 1.2).
>>
>> I know there are tools to launch a Spark cluster in EC2, but I want to
>> use EMR.
>>
>> Everything installs fine; however, when I go to read from a Parquet file,
>> I end up with (the main exception):
>>
>> Caused by: java.lang.NoSuchMethodError: parquet.hadoop.ParquetInputSplit.<init>(Lorg/apache/hadoop/fs/Path;JJJ[Ljava/lang/String;[JLjava/lang/String;Ljava/util/Map;)V
>>         at parquet.hadoop.TaskSideMetadataSplitStrategy.generateTaskSideMDSplits(ParquetInputFormat.java:578)
>>         ... 55 more
>>
>> It seems to me like a version mismatch somewhere.  Where is the
>> parquet-hadoop jar coming from?  Is it built into a fat jar for Spark?
>>
>> Any help would be appreciated.  Note that 1.1.1 worked fine with Parquet
>> files.
>>
>
>
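
One way to answer the question of where the parquet-hadoop classes come
from is to ask the JVM which location a class was loaded from. The
following is only a minimal sketch, assuming it is run with the same
classpath as the failing Spark job. The class name is taken from the stack
trace above; `WhichJar` and `locate` are hypothetical names used purely for
illustration.

```java
import java.net.URL;
import java.security.CodeSource;

public class WhichJar {
    /**
     * Returns the location (usually a jar URL) that the named class was
     * loaded from, or a short note when that cannot be determined.
     */
    static String locate(String className) {
        try {
            // Load without running static initializers; we only want the origin.
            Class<?> cls = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            URL location = (src == null) ? null : src.getLocation();
            // Classes loaded by the bootstrap loader typically report no code source.
            return (location == null) ? "(no code source reported)" : location.toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Default to the class named in the NoSuchMethodError above.
        String name = args.length > 0 ? args[0] : "parquet.hadoop.ParquetInputSplit";
        System.out.println(name + " -> " + locate(name));
    }
}
```

If the printed location turns out to be the Spark assembly jar, that would
suggest Parquet is bundled into the fat jar. If it is a separate jar on the
cluster, that jar may be taking precedence over the version Spark was built
against.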
