spark-user mailing list archives

From SLiZn Liu <sliznmail...@gmail.com>
Subject Re: Can Dependencies Be Resolved on Spark Cluster?
Date Wed, 01 Jul 2015 07:37:30 GMT
Thanks for the enlightening solution!

On Wed, Jul 1, 2015 at 12:03 AM Burak Yavuz <brkyvz@gmail.com> wrote:

> Hi,
> Suppose your build.sbt file declares the following dependencies (hopefully
> not too many direct ones, even if they pull in a lot of transitive dependencies):
> ```
> libraryDependencies += "org.apache.hbase" % "hbase" % "1.1.1"
>
> libraryDependencies += "junit" % "junit" % "x"
>
> resolvers += "Some other repo" at "http://some.other.repo"
>
> resolvers += "Some other repo2" at "http://some.other.repo2"
> ```
>
> Call `sbt package`, and then run spark-submit with the same coordinates and
> repositories (comma-separated, with no spaces after the commas):
>
> $ bin/spark-submit --packages org.apache.hbase:hbase:1.1.1,junit:junit:x \
>   --repositories http://some.other.repo,http://some.other.repo2 $YOUR_JAR
>
> Best,
> Burak
>
>
>
>
>
> On Mon, Jun 29, 2015 at 11:33 PM, SLiZn Liu <sliznmailbox@gmail.com>
> wrote:
>
>> Hi Burak,
>>
>> Is the `--packages` flag only available for Maven, with no sbt support?
>>
>> On Tue, Jun 30, 2015 at 2:26 PM Burak Yavuz <brkyvz@gmail.com> wrote:
>>
>>> You can pass `--packages your:comma-separated:maven-dependencies` to
>>> spark-submit if you have Spark 1.3 or greater.
>>>
>>> Best regards,
>>> Burak
>>>
>>> On Mon, Jun 29, 2015 at 10:46 PM, SLiZn Liu <sliznmailbox@gmail.com>
>>> wrote:
>>>
>>>> Hey Spark Users,
>>>>
>>>> I'm writing a demo with Spark and HBase. So far I have been packaging a
>>>> **fat jar**: I declare the dependencies in `build.sbt` and use `sbt assembly` to
>>>> package **all dependencies** into one big jar. The remaining work is to copy the
>>>> fat jar to the Spark master node and launch it with `spark-submit`.
>>>>
>>>> The drawback of the "fat jar" approach is obvious: all dependencies are
>>>> packed, yielding a huge jar file. Even worse, in my case, a vast number of
>>>> conflicting package files in `~/.ivy/cache` fail to merge, so I had
>>>> to manually specify the `MergeStrategy` as `rename` for all conflicting files
>>>> to bypass this issue.
>>>>
>>>> Then I thought there should exist an easier way: submit a "thin
>>>> jar" together with a build.sbt-like file specifying the dependencies, and have
>>>> the dependencies resolved automatically across the cluster before the
>>>> actual job is launched. I googled, but found nothing related. Is
>>>> this feasible, or are there better ways to achieve the same goal?
>>>>
>>>> BEST REGARDS,
>>>> Todd Leo
>>>>
>>>
>>>
>
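
As an illustration of the approach described in the thread, here is a minimal sketch (not part of the original messages) that builds the spark-submit argument list from Maven coordinates and repository URLs. The helper name `build_spark_submit_args` and the jar name `app.jar` are hypothetical; only the `--packages` and `--repositories` flags and the example coordinates come from the thread. It assumes each coordinate has the `group:artifact:version` form and rejects any coordinate containing whitespace, since a stray space after a comma would split the `--packages` value into two shell arguments.

```python
from typing import List, Sequence


def build_spark_submit_args(
    jar: str,
    packages: Sequence[str],
    repositories: Sequence[str] = (),
) -> List[str]:
    """Build an argument list for spark-submit (hypothetical helper)."""
    for coord in packages:
        parts = coord.split(":")
        # Assumption: coordinates are group:artifact:version with no whitespace.
        if len(parts) != 3 or not all(parts) or any(ch.isspace() for ch in coord):
            raise ValueError(f"bad Maven coordinate: {coord!r}")

    # Both flags take a single comma-separated value with no spaces.
    args = ["bin/spark-submit", "--packages", ",".join(packages)]
    if repositories:
        args += ["--repositories", ",".join(repositories)]
    args.append(jar)
    return args


# Example values taken from the thread; "x" is the placeholder version used there.
args = build_spark_submit_args(
    "app.jar",
    ["org.apache.hbase:hbase:1.1.1", "junit:junit:x"],
    ["http://some.other.repo", "http://some.other.repo2"],
)
print(" ".join(args))
```

Returning a list rather than a single string means the result can be handed to `subprocess.run` without shell quoting concerns.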
