spark-user mailing list archives

From Penny Espinoza <pesp...@societyconsulting.com>
Subject Re: prepending jars to the driver class path for spark-submit on YARN
Date Tue, 09 Sep 2014 20:32:12 GMT
I finally seem to have gotten past this issue.  Here’s what I did:

  *   rather than using the binary distribution, I built Spark from scratch to eliminate the 4.1 version of org.apache.httpcomponents from the assembly
     *   git clone https://github.com/apache/spark.git
     *   cd spark
     *   git checkout v1.0.2
     *   edited pom.xml to remove the modules sql/hive and examples
     *   export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
     *   mvn -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 -DskipTests clean package
  *   rebuilt my own assembly, eliminating all exclusions I had previously included to force use of org.apache.httpcomponents 4.1
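
One way to confirm the result of a rebuild like this is to check which jars still bundle a given httpcomponents class. The following is a minimal sketch in Python, not part of the original steps; the function name `jars_containing` and any jar paths passed to it are illustrative assumptions.

```python
import zipfile
from typing import Iterable, List


def jars_containing(class_name: str, jar_paths: Iterable[str]) -> List[str]:
    """Return the jars, in the order given, that bundle class_name.

    class_name is a fully qualified name such as
    "org.apache.http.impl.conn.DefaultClientConnectionOperator".
    """
    # A jar is a zip archive; a class is stored as a path ending in ".class".
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for path in jar_paths:
        with zipfile.ZipFile(path) as jar:
            if entry in jar.namelist():
                hits.append(path)
    return hits
```

Calling it with the Spark assembly jar and the application jar (whatever those are named in a given build) shows whether the class is present in one, both, or neither. If it appears in more than one jar, the copy that wins depends on class path order, which is the situation this thread is about.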

On Sep 8, 2014, at 12:03 PM, Penny Espinoza <pespino@societyconsulting.com> wrote:

I have tried using the spark.files.userClassPathFirst option (which, incidentally, is documented
now, but marked as experimental), but it just causes different errors.  I am using spark-streaming-kafka.
 If I mark spark-core and spark-streaming as provided and also exclude them from the spark-streaming-kafka
dependency, I get this error:

14/09/08 18:34:23 WARN scheduler.TaskSetManager: Loss was due to java.lang.ClassCastException
java.lang.ClassCastException: cannot assign instance of com.oncue.rna.realtime.streaming.spark.BaseKafkaExtractorJob$$anonfun$getEventsStream$1 to field org.apache.spark.rdd.MappedRDD.f of type scala.Function1 in instance of org.apache.spark.rdd.MappedRDD
       at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2083)
       at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1261)
       at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1996)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
       at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
       at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       at java.lang.reflect.Method.invoke(Method.java:606)
       at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
       at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
       at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       at java.lang.reflect.Method.invoke(Method.java:606)
       at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
       at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
       at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       at java.lang.reflect.Method.invoke(Method.java:606)
       at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
       at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
       at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       at java.lang.reflect.Method.invoke(Method.java:606)
       at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
       at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
       at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       at java.lang.reflect.Method.invoke(Method.java:606)
       at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
       at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
       at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       at java.lang.reflect.Method.invoke(Method.java:606)
       at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
       at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
       at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       at java.lang.reflect.Method.invoke(Method.java:606)
       at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
       at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)
       at org.apache.spark.scheduler.ResultTask$.deserializeInfo(ResultTask.scala:61)
       at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:141)
       at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1837)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
       at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)
       at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:85)
       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:165)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
       at java.lang.Thread.run(Thread.java:744)


If I mark spark-core and spark-streaming as provided, but do not exclude those from the spark-streaming-kafka
dependency, I get this error:

14/09/08 18:10:26 WARN scheduler.TaskSetManager: Loss was due to java.lang.ClassCastException
java.lang.ClassCastException: cannot assign instance of org.apache.spark.storage.StorageLevel
to field org.apache.spark.streaming.receiver.Receiver.storageLevel of type org.apache.spark.storage.StorageLevel
in instance of org.apache.spark.streaming.kafka.KafkaReceiver
       at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2083)
       at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1261)
       at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1996)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1706)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1344)
       at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
       at java.io.ObjectInputStream.defaultReadObject(ObjectInputStream.java:500)
       at org.apache.spark.rdd.ParallelCollectionPartition.readObject(ParallelCollectionRDD.scala:74)
       at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       at java.lang.reflect.Method.invoke(Method.java:606)
       at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
       at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:147)
       at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1837)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
       at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)
       at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:85)
       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:165)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
       at java.lang.Thread.run(Thread.java:744)

If I do not mark spark-core and spark-streaming as provided, and also omit the exclusions,
I get the same error I get when they are marked as provided and excluded.





________________________________________
From: Xiangrui Meng <mengxr@gmail.com>
Sent: Sunday, September 07, 2014 11:40 PM
To: Victor Tso-Guillen
Cc: Penny Espinoza; Spark
Subject: Re: prepending jars to the driver class path for spark-submit on YARN

There is an undocumented configuration to put user jars in front of
the Spark jar. But I'm not certain that it works as expected (which
is why it is undocumented). Please try turning on
spark.yarn.user.classpath.first. -Xiangrui

On Sat, Sep 6, 2014 at 5:13 PM, Victor Tso-Guillen <vtso@paxata.com> wrote:
I ran into the same issue. What I did was use the Maven Shade plugin to shade
my version of the httpcomponents libraries into another package.


On Fri, Sep 5, 2014 at 4:33 PM, Penny Espinoza <pespino@societyconsulting.com> wrote:

Hey - I’m struggling with some dependency issues with
org.apache.httpcomponents httpcore and httpclient when using spark-submit
with YARN running Spark 1.0.2 on a Hadoop 2.2 cluster.  I’ve seen several
posts about this issue, but no resolution.

The error message is this:


Caused by: java.lang.NoSuchMethodError: org.apache.http.impl.conn.DefaultClientConnectionOperator.<init>(Lorg/apache/http/conn/scheme/SchemeRegistry;Lorg/apache/http/conn/DnsResolver;)V
       at org.apache.http.impl.conn.PoolingClientConnectionManager.createConnectionOperator(PoolingClientConnectionManager.java:140)
       at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:114)
       at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:99)
       at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:85)
       at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:93)
       at com.amazonaws.http.ConnectionManagerFactory.createPoolingClientConnManager(ConnectionManagerFactory.java:26)
       at com.amazonaws.http.HttpClientFactory.createHttpClient(HttpClientFactory.java:96)
       at com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:155)
       at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:118)
       at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:102)
       at com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:332)
       at com.oncue.rna.realtime.streaming.config.package$.transferManager(package.scala:76)
       at com.oncue.rna.realtime.streaming.models.S3SchemaRegistry.<init>(SchemaRegistry.scala:27)
       at com.oncue.rna.realtime.streaming.models.S3SchemaRegistry$.schemaRegistry$lzycompute(SchemaRegistry.scala:46)
       at com.oncue.rna.realtime.streaming.models.S3SchemaRegistry$.schemaRegistry(SchemaRegistry.scala:44)
       at com.oncue.rna.realtime.streaming.coders.KafkaAvroDecoder.<init>(KafkaAvroDecoder.scala:20)
       ... 17 more

The Apache httpcomponents libraries include the method above as of version
4.2.  The Spark 1.0.2 binaries seem to include version 4.1.
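
For readers unfamiliar with the notation in the NoSuchMethodError, the string after `<init>` is a JVM method descriptor. The sketch below, in Python, decodes such a descriptor into readable parameter and return types. It is an illustration only; `parse_descriptor` is a hypothetical helper name, not something from Spark or httpcomponents.

```python
from typing import List, Tuple

# Single-letter codes the JVM uses for primitive types and void.
_PRIMITIVES = {
    "B": "byte", "C": "char", "D": "double", "F": "float",
    "I": "int", "J": "long", "S": "short", "Z": "boolean", "V": "void",
}


def parse_descriptor(descriptor: str) -> Tuple[List[str], str]:
    """Split a JVM method descriptor into (parameter types, return type)."""

    def read_type(i: int) -> Tuple[str, int]:
        # Each leading "[" adds one array dimension.
        dims = 0
        while descriptor[i] == "[":
            dims += 1
            i += 1
        if descriptor[i] == "L":
            # Object type: "L<internal/class/name>;"
            end = descriptor.index(";", i)
            name = descriptor[i + 1:end].replace("/", ".")
            i = end + 1
        else:
            name = _PRIMITIVES[descriptor[i]]
            i += 1
        return name + "[]" * dims, i

    if not descriptor.startswith("("):
        raise ValueError("method descriptor must start with '('")
    i = 1
    params = []
    while descriptor[i] != ")":
        type_name, i = read_type(i)
        params.append(type_name)
    return_type, _ = read_type(i + 1)
    return params, return_type
```

Applied to the descriptor in the error above, this yields two parameters, `org.apache.http.conn.scheme.SchemeRegistry` and `org.apache.http.conn.DnsResolver`, with return type `void`. In other words, the missing method is the two-argument constructor of DefaultClientConnectionOperator.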

I can get this to work in my driver program by adding exclusions to force
use of 4.1, but then I get the error in tasks even when using the --jars
option of the spark-submit command.  How can I get both the driver program
and the individual tasks in my spark-streaming job to use the same version
of this library so my job will run all the way through?

thanks
p

