spark-user mailing list archives

From Aureliano Buendia <buendia...@gmail.com>
Subject Re: Spark context jar confusions
Date Sun, 05 Jan 2014 14:25:57 GMT
Eugen, I noticed that you are including Hadoop in your fat jar:

<include>org.apache.hadoop:*</include>

That would take up a big chunk of the fat jar. Isn't Hadoop already included
in Spark?
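
If so, could that include simply be dropped from the artifactSet, e.g.
something like:

<artifactSet>
    <includes>
        <include>org.apache.hbase:*</include>
        <include>com.typesafe:config</include>
        <include>org.apache.avro:*</include>
        <include>joda-time:*</include>
        <include>org.joda:*</include>
    </includes>
</artifactSet>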


On Thu, Jan 2, 2014 at 11:38 AM, Eugen Cepoi <cepoi.eugen@gmail.com> wrote:

> It depends on how you deploy; I don't find it so complicated...
>
> 1) To build the fat jar I am using maven (as I am not familiar with sbt).
>
> Inside I have something like this, saying which libs should be included
> in the fat jar (the others won't be present in the final artifact).
>
> <plugin>
>     <groupId>org.apache.maven.plugins</groupId>
>     <artifactId>maven-shade-plugin</artifactId>
>     <version>2.1</version>
>     <executions>
>         <execution>
>             <phase>package</phase>
>             <goals>
>                 <goal>shade</goal>
>             </goals>
>             <configuration>
>                 <minimizeJar>true</minimizeJar>
>                 <createDependencyReducedPom>false</createDependencyReducedPom>
>                 <artifactSet>
>                     <includes>
>                         <include>org.apache.hbase:*</include>
>                         <include>org.apache.hadoop:*</include>
>                         <include>com.typesafe:config</include>
>                         <include>org.apache.avro:*</include>
>                         <include>joda-time:*</include>
>                         <include>org.joda:*</include>
>                     </includes>
>                 </artifactSet>
>                 <filters>
>                     <filter>
>                         <artifact>*:*</artifact>
>                         <excludes>
>                             <exclude>META-INF/*.SF</exclude>
>                             <exclude>META-INF/*.DSA</exclude>
>                             <exclude>META-INF/*.RSA</exclude>
>                         </excludes>
>                     </filter>
>                 </filters>
>             </configuration>
>         </execution>
>     </executions>
> </plugin>
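>
> The plugin is bound to the package phase, so building the fat jar should
> just be a matter of running:
>
> mvn package
>
> (by default the shade plugin replaces the project's regular jar under
> target/ with the shaded one)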
>
>
> 2) The app is the jar you have built, so you ship it to the driver node
> (how you do that depends a lot on how you are planning to use it: Debian
> packaging, a plain old scp, etc.). To run it you can do something like:
>
> SPARK_CLASSPATH=PathToYour.jar $SPARK_HOME/spark-class com.myproject.MyJob
>
> where MyJob is the entry point of your job; it defines a main method.
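>
> For example, a minimal sketch of such an entry point (the names are just
> placeholders) could look like:
>
> import org.apache.spark.SparkContext
>
> object MyJob {
>   def main(args: Array[String]) {
>     // args(0) is the master URL, e.g. mesos://host:5050 or local[2]
>     val sc = new SparkContext(args(0), "My Job",
>       System.getenv("SPARK_HOME"), SparkContext.jarOfClass(this.getClass))
>     // ... build RDDs and run the actual job here ...
>     sc.stop()
>   }
> }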
>
> 3) I don't know what the "common way" is, but I am doing things this way:
> build the fat jar, provide some launch scripts, do the Debian packaging,
> ship it to a node that plays the role of the driver, and run it over Mesos
> using the launch scripts + some conf.
>
>
> 2014/1/2 Aureliano Buendia <buendia360@gmail.com>
>
>> I wasn't aware of jarOfClass. I wish there were only one good way of
>> deploying in Spark, instead of many ambiguous methods. (It seems like
>> Spark has followed Scala in that there is more than one way of
>> accomplishing a job, which makes Scala an overcomplicated language.)
>>
>> 1. Should sbt assembly be used to make the fat jar? If so, which sbt
>> should be used? My local sbt or the one at $SPARK_HOME/sbt/sbt? Why is it
>> that Spark is shipped with a separate sbt?
>>
>> 2. Let's say we have the dependencies fat jar which is supposed to be
>> shipped to the workers. Now how do we deploy the main app which is
>> supposed to be executed on the driver? Make another jar out of it? Does
>> sbt assembly also create that jar?
>>
>> 3. Is calling sc.jarOfClass() the most common way of doing this? I cannot
>> find any examples by googling. What approach do people most commonly use?
>>
>>
>>
>> On Thu, Jan 2, 2014 at 10:58 AM, Eugen Cepoi <cepoi.eugen@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> This is the list of the jars you use in your job; the driver will send
>>> all those jars to each worker (otherwise the workers won't have the
>>> classes your job needs). The easy way to go is to build a fat jar with
>>> your code and all the libs you depend on, and then use this utility to
>>> get the path: SparkContext.jarOfClass(YourJob.getClass)
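>>>
>>> For example, something like this (jarOfClass should give you the jar(s)
>>> containing that class, which you can pass straight to the constructor):
>>>
>>> val jars = SparkContext.jarOfClass(YourJob.getClass)
>>> val sc = new SparkContext(args(0), "Your Job",
>>>   System.getenv("SPARK_HOME"), jars)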
>>>
>>>
>>> 2014/1/2 Aureliano Buendia <buendia360@gmail.com>
>>>
>>>> Hi,
>>>>
>>>> I do not understand why spark context has an option for loading jars at
>>>> runtime.
>>>>
>>>> As an example, consider this:
>>>> <https://github.com/apache/incubator-spark/blob/50fd8d98c00f7db6aa34183705c9269098c62486/examples/src/main/scala/org/apache/spark/examples/BroadcastTest.scala#L36>
>>>>
>>>> object BroadcastTest {
>>>>   def main(args: Array[String]) {
>>>>     val sc = new SparkContext(args(0), "Broadcast Test",
>>>>       System.getenv("SPARK_HOME"), Seq(System.getenv("SPARK_EXAMPLES_JAR")))
>>>>   }
>>>> }
>>>>
>>>>
>>>> This is *the* example, or *the* application that we want to run, so what
>>>> is SPARK_EXAMPLES_JAR supposed to be?
>>>> In this particular case, the BroadcastTest example is self-contained, so
>>>> why would it want to load other unrelated example jars?
>>>>
>>>> Finally, how does this help a real-world Spark application?
>>>>
>>>>
>>>
>>
>
