spark-user mailing list archives

From Luis Ángel Vicente Sánchez <langel.gro...@gmail.com>
Subject Re: spark streaming, kafka, SPARK_CLASSPATH
Date Mon, 16 Jun 2014 18:49:43 GMT
Did you manage to make it work? I'm facing similar problems and this is a
serious blocker. spark-submit seems kind of broken to me if you can't
use it for spark-streaming.

Regards,

Luis


2014-06-11 1:48 GMT+01:00 lannyripple <lanny.ripple@gmail.com>:

> I am using Spark 1.0.0 compiled with Hadoop 1.2.1.
>
> I have a toy spark-streaming-kafka program.  It reads from a Kafka queue
> and does
>
>     stream
>       .map {case (k, v) => (v, 1)}
>       .reduceByKey(_ + _)
>       .print()
>
> using a 1 second interval on the stream.
>
> The docs say to make Spark and Hadoop jars 'provided' but this breaks for
> spark-streaming.  Including spark-streaming (and spark-streaming-kafka) as
> 'compile' to sweep them into our assembly gives collisions on javax.*
> classes.  To work around this I modified
> $SPARK_HOME/bin/compute-classpath.sh to include spark-streaming,
> spark-streaming-kafka, and zkclient.  (Note that kafka is included as
> 'compile' in my project and picked up in the assembly.)
>
> I have set up conf/spark-env.sh as needed.  I have copied my assembly to
> /tmp/myjar.jar on all spark hosts and to my hdfs /tmp/jars directory.  I am
> running spark-submit from my spark master.  I am guided by the information
> here https://spark.apache.org/docs/latest/submitting-applications.html
>
> Well at this point I was going to detail all the ways spark-submit fails to
> follow its own documentation.  If I do not invoke sparkContext.setJars()
> then it just fails to find the driver class.  This is using various
> combinations of absolute path, file:, hdfs: (Warning: Skip remote jar)??,
> and local: prefixes on the application-jar and --jars arguments.
>
> If I invoke sparkContext.setJars() and include my assembly jar I get
> further.  At this point I get a failure from
> kafka.consumer.ConsumerConnector not being found.  I suspect this is because
> spark-streaming-kafka needs the Kafka dependency but my assembly jar is
> too late in the classpath.
>
> At this point I tried setting spark.files.userClassPathFirst to 'true' but
> this causes more things to blow up.
>
> I finally found something that works, namely setting the environment variable
> SPARK_CLASSPATH=/tmp/myjar.jar.  But silly me, this is deprecated and I'm
> helpfully informed to
>
>   Please instead use:
>    - ./spark-submit with --driver-class-path to augment the driver classpath
>    - spark.executor.extraClassPath to augment the executor classpath
>
> which when put into a file and introduced with --properties-file does not
> work.  (Also tried spark.files.userClassPathFirst here.)  These fail with
> the kafka.consumer.ConsumerConnector error.
>
> At a guess what's going on is that using SPARK_CLASSPATH I have my assembly
> jar in the classpath at SparkSubmit invocation
>
>   Spark Command: java -cp
>     /tmp/myjar.jar::/opt/spark/conf:/opt/spark/lib/spark-assembly-1.0.0-hadoop1.2.1.jar:/opt/spark/lib/spark-streaming_2.10-1.0.0.jar:/opt/spark/lib/spark-streaming-kafka_2.10-1.0.0.jar:/opt/spark/lib/zkclient-0.4.jar
>     -XX:MaxPermSize=128m -Djava.library.path= -Xms512m -Xmx512m
>     org.apache.spark.deploy.SparkSubmit --class me.KafkaStreamingWC
>     /tmp/myjar.jar
>
> but when using --properties-file the assembly is not available to
> SparkSubmit.
>
> I think the root cause is either spark-submit not handling the
> spark-streaming libraries so they can be 'provided' or the inclusion of
> org.eclipse.jetty.orbit in the streaming libraries, which causes
>
>   [error] (*:assembly) deduplicate: different file contents found in the following:
>   [error] /Users/lanny/.ivy2/cache/org.eclipse.jetty.orbit/javax.transaction/orbits/javax.transaction-1.1.1.v201105210645.jar:META-INF/ECLIPSEF.RSA
>   [error] /Users/lanny/.ivy2/cache/org.eclipse.jetty.orbit/javax.servlet/orbits/javax.servlet-3.0.0.v201112011016.jar:META-INF/ECLIPSEF.RSA
>   [error] /Users/lanny/.ivy2/cache/org.eclipse.jetty.orbit/javax.mail.glassfish/orbits/javax.mail.glassfish-1.4.1.v201005082020.jar:META-INF/ECLIPSEF.RSA
>   [error] /Users/lanny/.ivy2/cache/org.eclipse.jetty.orbit/javax.activation/orbits/javax.activation-1.1.0.v201105071233.jar:META-INF/ECLIPSEF.RSA
>
> I've tried applying mergeStrategy in assembly in my assembly.sbt but then I
> get
>
>   Invalid signature file digest for Manifest main attributes
>
> If anyone knows the magic to get this working a reply would be greatly
> appreciated.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/spark-streaming-kafka-SPARK-CLASSPATH-tp7356.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
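
On the two assembly errors quoted above, here is a sketch of one possible
workaround, not a confirmed fix for this setup. "Invalid signature file
digest" is what the JVM reports when signature files from a signed dependency
(META-INF/*.RSA, *.SF, *.DSA) end up in a merged jar whose contents no longer
match them. Discarding those files during assembly, rather than keeping one of
them, may avoid both the deduplicate error and the signature error. The
snippet assumes sbt 0.13 with an sbt-assembly 0.11.x plugin, which is where
the `mergeStrategy in assembly` key comes from; later plugin versions rename
the key to `assemblyMergeStrategy`.

```scala
// assembly.sbt -- a sketch, assuming sbt 0.13 and sbt-assembly 0.11.x
import AssemblyKeys._

assemblySettings

mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
  {
    // Drop jar signature files: they cannot be valid for a merged jar.
    case PathList("META-INF", xs @ _*)
        if xs.nonEmpty &&
          Seq(".rsa", ".sf", ".dsa").exists(ext => xs.last.toLowerCase.endsWith(ext)) =>
      MergeStrategy.discard
    // Everything else keeps the plugin's default behaviour.
    case x => old(x)
  }
}
```

The custom case is listed first so that it applies before the default
strategy, and `old(x)` delegates every other path to the defaults unchanged.
This only addresses building the assembly; it does not by itself change how
spark-submit orders the classpath.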
