spark-user mailing list archives

From Wim Van Leuven <wim.vanleu...@highestpoint.biz>
Subject Re: Why spark-submit works with package not with jar
Date Wed, 21 Oct 2020 05:34:19 GMT
Sean,

The problem with --packages is that in enterprise settings security policy
might not allow the data environment to connect to the internet, or even to
an internal proxying artifact repository.
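
When the cluster cannot reach any repository, the jars have to be passed
from local disk with --jars, which takes a single comma-separated value with
no whitespace. A minimal sketch of building that value from a directory,
assuming bash; join_jars is a hypothetical helper name, not part of Spark,
and it picks up every jar in the directory, which may be more than a
hand-picked list:

```shell
# Hypothetical helper (not part of Spark): join every *.jar in a directory
# into one comma-separated string with no whitespace, as --jars expects.
join_jars() {
    local dir="$1" jar list=""
    for jar in "$dir"/*.jar; do
        [ -e "$jar" ] || continue        # skip the literal glob if no jars exist
        list="${list:+$list,}$jar"       # prepend a comma only after the first entry
    done
    printf '%s\n' "$list"
}

# Example usage (assumed directory; adjust to wherever the jars are staged):
# "${SPARK_HOME}/bin/spark-submit" --jars "$(join_jars /home/hduser/jars)" ...
```

Building the value in one place avoids stray spaces after the commas, which
the shell would otherwise treat as argument separators.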

Also, weren't uberjars an antipattern? For some reason I don't like them...

Kind regards
-wim



On Wed, 21 Oct 2020 at 01:06, Mich Talebzadeh <mich.talebzadeh@gmail.com>
wrote:

> Thanks again all.
>
> Anyway as Nicola suggested I used the trench war approach to sort this out
> by just using jars and working out their dependencies in ~/.ivy2/jars
> directory using grep -lRi <missing> :)
>
>
> This now works using just jars (newly added ones in grey) after
> resolving the dependencies:
>
>
> ${SPARK_HOME}/bin/spark-submit \
>                 --master yarn \
>                 --deploy-mode client \
>                 --conf spark.executor.memoryOverhead=3000 \
>                 --class org.apache.spark.repl.Main \
>                 --name "my own Spark shell on Yarn" "$@" \
>                 --driver-class-path /home/hduser/jars/ddhybrid.jar \
>                 --jars /home/hduser/jars/spark-bigquery-latest.jar,\
> /home/hduser/jars/ddhybrid.jar,\
> /home/hduser/jars/com.google.http-client_google-http-client-1.24.1.jar,\
> /home/hduser/jars/com.google.http-client_google-http-client-jackson2-1.24.1.jar,\
> /home/hduser/jars/com.google.cloud.bigdataoss_util-1.9.4.jar,\
> /home/hduser/jars/com.google.api-client_google-api-client-1.24.1.jar,\
> /home/hduser/jars/com.google.oauth-client_google-oauth-client-1.24.1.jar,\
> /home/hduser/jars/com.google.apis_google-api-services-bigquery-v2-rev398-1.24.1.jar,\
> /home/hduser/jars/com.google.cloud.bigdataoss_bigquery-connector-0.13.4-hadoop2.jar,\
> /home/hduser/jars/spark-bigquery_2.11-0.2.6.jar
>
>
> Compare this with using the package itself, as before:
>
>
> ${SPARK_HOME}/bin/spark-submit \
>                 --master yarn \
>                 --deploy-mode client \
>                 --conf spark.executor.memoryOverhead=3000 \
>                 --class org.apache.spark.repl.Main \
>                 --name "my own Spark shell on Yarn" "$@" \
>                 --driver-class-path /home/hduser/jars/ddhybrid.jar \
>                 --jars /home/hduser/jars/spark-bigquery-latest.jar,\
> /home/hduser/jars/ddhybrid.jar \
>                 --packages com.github.samelamin:spark-bigquery_2.11:0.2.6
>
>
>
> I think, as Sean suggested, this manual approach may or may not work, and
> if the jars change the whole thing has to be re-evaluated, which adds to
> the complexity.
>
>
> Cheers
>
>
> On Tue, 20 Oct 2020 at 23:01, Sean Owen <srowen@gmail.com> wrote:
>
>> Rather, let --packages (via Ivy) worry about them, because they tell Ivy
>> what they need.
>> There's no 100% guarantee that conflicting dependencies are resolved in a
>> way that works in every single case, which you run into sometimes when
>> using incompatible libraries, but yes this is the point of --packages and
>> Ivy.
>>
>> On Tue, Oct 20, 2020 at 4:43 PM Mich Talebzadeh <
>> mich.talebzadeh@gmail.com> wrote:
>>
>>> Thanks again all.
>>>
>>> Hi Sean,
>>>
>>> As I understood from your statement, you are suggesting just use
>>> --packages without worrying about individual jar dependencies?
>>>
