spark-user mailing list archives

From Akhil Das <ak...@sigmoidanalytics.com>
Subject Re: ClassNotDefException when using spark-submit with multiple jars and files located on HDFS
Date Wed, 10 Jun 2015 07:07:08 GMT
Or you can call sc.addJar("/path/to/the/jar"). I haven't tested it with an
HDFS path, though it works fine with a local path.
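
If you stay with spark-submit, note that --jars and --files each take a
single comma-separated list, so a space after a comma makes the shell pass
the next path as a separate argument. Below is a minimal sketch that builds
those lists and prints the command as a dry run. The helper name
join_by_comma and the echo-based dry run are only for illustration; the
class name, master URL and paths are copied from the command quoted below.
I can't say whether this alone fixes the HDFS download problem.

```shell
#!/bin/sh
# join_by_comma is a hypothetical helper (not part of Spark): it joins its
# arguments with "," and no spaces. The subshell keeps the IFS change local.
join_by_comma() (
  IFS=,
  printf '%s\n' "$*"
)

JARS=$(join_by_comma hdfs://localhost/1.jar hdfs://localhost/2.jar)
FILES=$(join_by_comma hdfs://localhost/1.txt hdfs://localhost/2.txt)

# Dry run: print the command instead of executing it.
# Remove "echo" to actually submit the job.
echo spark-submit \
  --class myClass \
  --master spark://localhost:7077/ \
  --deploy-mode cluster \
  --jars "$JARS" \
  --files "$FILES" \
  hdfs://localhost/main.jar
```

Quoting "$JARS" and "$FILES" keeps each list as one argument even if a path
happens to contain a space.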

Thanks
Best Regards

On Wed, Jun 10, 2015 at 10:17 AM, Jörn Franke <jornfranke@gmail.com> wrote:

> I am not sure they work with HDFS paths. You may want to look at the
> source code. Alternatively, you can create a "fat" jar containing all jars
> (let your build tool set up META-INF correctly). This always works.
>
> On Wed, Jun 10, 2015 at 6:22 AM, Dong Lei <donglei@microsoft.com> wrote:
>
>> Thanks so much!
>>
>> I did put a sleep in my code to keep the UI available.
>>
>>
>>
>> Now from the UI, I can see:
>>
>> - In the “Spark Properties” section, spark.jars and spark.files are
>> set to what I want.
>>
>> - In the “Classpath Entries” section, my jar and file paths are there
>> (with HDFS paths).
>>
>>
>>
>> And I checked the HTTP file server directory; the structure is like:
>>
>>      D:\data\temp
>>        \-- spark-UUID
>>             \-- httpd-UUID
>>                  \-- jars [*empty*]
>>                  \-- files [*empty*]
>>
>>
>>
>> So I guess the files and jars are not properly downloaded from HDFS to
>> these folders?
>>
>>
>>
>> I’m using standalone mode.
>>
>>
>>
>> Any ideas?
>>
>>
>>
>> Thanks
>>
>> Dong Lei
>>
>>
>>
>> *From:* Akhil Das [mailto:akhil@sigmoidanalytics.com]
>> *Sent:* Tuesday, June 9, 2015 4:46 PM
>>
>>
>> *To:* Dong Lei
>> *Cc:* user@spark.apache.org
>> *Subject:* Re: ClassNotDefException when using spark-submit with
>> multiple jars and files located on HDFS
>>
>>
>>
>> You can put a Thread.sleep(100000) in the code to keep the UI available
>> for quite some time. (Put it just before starting any of your
>> transformations.) Or you can enable the Spark history server
>> <https://spark.apache.org/docs/latest/monitoring.html> too. I believe
>> --jars
>> <https://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management>
>> would download the dependency jars on all your worker machines (they can
>> be found in the Spark work dir of your application, along with the stderr
>> and stdout files).
>>
>>
>>   Thanks
>>
>> Best Regards
>>
>>
>>
>> On Tue, Jun 9, 2015 at 1:29 PM, Dong Lei <donglei@microsoft.com> wrote:
>>
>>  Thanks Akhil:
>>
>>
>>
>> The driver fails too fast for me to get a look at port 4040. Is there any
>> other way to see the download and ship process of the files?
>>
>>
>>
>> Is the driver supposed to download these jars from HDFS to some location,
>> then ship them to the executors?
>>
>> I can see from the log that the driver downloaded the application jar but
>> not the other jars specified by “--jars”.
>>
>> Or do I misunderstand the usage of “--jars”: should the jars already be
>> on every worker, so that the driver will not download them?
>>
>> Are there any useful docs?
>>
>>
>>
>> Thanks
>>
>> Dong Lei
>>
>>
>>
>>
>>
>> *From:* Akhil Das [mailto:akhil@sigmoidanalytics.com]
>> *Sent:* Tuesday, June 9, 2015 3:24 PM
>> *To:* Dong Lei
>> *Cc:* user@spark.apache.org
>> *Subject:* Re: ClassNotDefException when using spark-submit with
>> multiple jars and files located on HDFS
>>
>>
>>
>> Once you submit the application, you can check the Environment tab of the
>> driver UI (running on port 4040) to see whether the jars you added got
>> shipped or not. If they are shipped and you are still getting NoClassDef
>> exceptions, then you have a jar conflict, which you can resolve by
>> putting the jar with the class in it at the top of your classpath.
>>
>>
>>   Thanks
>>
>> Best Regards
>>
>>
>>
>> On Tue, Jun 9, 2015 at 9:05 AM, Dong Lei <donglei@microsoft.com> wrote:
>>
>>  Hi, spark-users:
>>
>>
>>
>> I’m using spark-submit to submit multiple jars and files (all in HDFS) to
>> run a job, with the following command:
>>
>>
>>
>> spark-submit \
>>   --class myClass \
>>   --master spark://localhost:7077/ \
>>   --deploy-mode cluster \
>>   --jars hdfs://localhost/1.jar,hdfs://localhost/2.jar \
>>   --files hdfs://localhost/1.txt,hdfs://localhost/2.txt \
>>   hdfs://localhost/main.jar
>>
>>
>>
>> The stderr in the driver showed java.lang.ClassNotDefException for a
>> class in 1.jar.
>>
>>
>>
>> I checked the log, and it shows that Spark added these jars:
>>
>>      INFO SparkContext: Added JAR hdfs:// …1.jar
>>
>>      INFO SparkContext: Added JAR hdfs:// …2.jar
>>
>>
>>
>> In the driver's folder, I only saw that main.jar was copied there,
>> *but the other jars and files were not there*.
>>
>>
>>
>> Could someone explain *how I should pass the jars and files* needed by
>> the main jar to Spark?
>>
>>
>>
>> If my class in main.jar refers to these files with a relative path, *will
>> Spark copy these files into one folder*?
>>
>>
>>
>> BTW, my class works in client mode with all jars and files stored locally.
>>
>>
>>
>> Thanks
>>
>> Dong Lei
>>
>>
>>
>>
>>
>
