spark-user mailing list archives

From Chandeep Singh ...@chandeep.com>
Subject Re: Building Spark packages with SBT or Maven
Date Tue, 15 Mar 2016 13:12:51 GMT
You can build using Maven from the command line as well.
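As a rough sketch of what that command-line build looks like (the commands below are only assembled and printed so the sketch is self-contained; "ImportCSV" is an illustrative project name, and mvn is assumed to be on the PATH):

```shell
# Sketch: typical Maven command-line build steps for a Scala project.
# Assumes mvn is on the PATH and pom.xml sits at the project root.
# The commands are printed rather than executed so this stands alone.
maven_build_cmds() {
  app="$1"
  echo "cd ${app}                 # project root containing pom.xml"
  echo "mvn clean package         # compile, test, and build target/<artifactId>-<version>.jar"
  echo "mvn -DskipTests package   # same build, skipping the test phase"
}
maven_build_cmds ImportCSV
```

On a real project you would run the printed commands directly from the project root.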

This layout should give you an idea, and here is a resource: http://www.scala-lang.org/old/node/345

project/
   pom.xml - Defines the project
   src/
      main/
         java/ - Contains all Java code that will go in your final artifact.
                 See maven-compiler-plugin <http://maven.apache.org/plugins/maven-compiler-plugin/> for details.
         scala/ - Contains all Scala code that will go in your final artifact.
                  See maven-scala-plugin <http://scala-tools.org/mvnsites/maven-scala-plugin/> for details.
         resources/ - Contains all static files that should be available on the classpath in the final artifact.
                      See maven-resources-plugin <http://maven.apache.org/plugins/maven-resources-plugin/> for details.
         webapp/ - Contains all content for a web application (JSPs, CSS, images, etc.).
                   See maven-war-plugin <http://maven.apache.org/plugins/maven-war-plugin/> for details.
      site/ - Contains all apt or xdoc files used to create a project website.
              See maven-site-plugin <http://maven.apache.org/plugins/maven-site-plugin/> for details.
      test/
         java/ - Contains all Java code used for testing.
                 See maven-compiler-plugin <http://maven.apache.org/plugins/maven-compiler-plugin/> for details.
         scala/ - Contains all Scala code used for testing.
                  See maven-scala-plugin <http://scala-tools.org/mvnsites/maven-scala-plugin/> for details.
         resources/ - Contains all static content that should be available on the classpath during testing.
                      See maven-resources-plugin <http://maven.apache.org/plugins/maven-resources-plugin/> for details.
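To reproduce that layout from a shell, a minimal sketch ("ImportCSV" is an illustrative project name; the key point is that pom.xml lives at the project root):

```shell
# Create the standard Maven directory layout for a Scala project.
# "ImportCSV" is an illustrative project name.
mkdir -p ImportCSV/src/main/scala \
         ImportCSV/src/main/resources \
         ImportCSV/src/test/scala \
         ImportCSV/src/test/resources
touch ImportCSV/pom.xml   # defines the project; lives at the root, not under src/
ls ImportCSV              # lists pom.xml and src
```

Maven then finds pom.xml automatically when you run `mvn` from the project root.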


> On Mar 15, 2016, at 12:38 PM, Chandeep Singh <cs@chandeep.com> wrote:
> 
> Do you have the Eclipse Maven plugin set up? http://www.eclipse.org/m2e/
> 
> Once you have it set up, File -> New -> Other -> Maven Project -> Next / Finish.
> You’ll see a default pom.xml which you can modify / replace.
> 
> <PastedGraphic-1.png>
> 
> Here is some documentation that should help: http://scala-ide.org/docs/tutorials/m2eclipse/
> 
> I’m using the same Eclipse build as you on my Mac. I mostly build a shaded JAR and
> SCP it to the cluster.
> 
>> On Mar 15, 2016, at 12:22 PM, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
>> 
>> Great Chandeep. I also have Eclipse Scala IDE below
>> 
>> scala IDE build of Eclipse SDK
>> Build id: 4.3.0-vfinal-2015-12-01T15:55:22Z-Typesafe
>> 
>> I am no expert on Eclipse, so if I create a project called ImportCSV, where do I need
>> to put the pom file, and how do I reference it, please? My Eclipse runs on a Linux host, so
>> it can access all the directories that the sbt project accesses. I also believe there will
>> be no need for external jar files in the build path?
>> 
>> Thanks
>> 
>> Dr Mich Talebzadeh
>>  
>> LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>  
>> http://talebzadehmich.wordpress.com
>>  
>> 
>> On 15 March 2016 at 12:15, Chandeep Singh <cs@chandeep.com> wrote:
>> Btw, just to add to the confusion ;) I use Maven as well since I moved from Java
>> to Scala, but everyone I talk to has been recommending SBT for Scala.
>> 
>> I use the Eclipse Scala IDE to build. http://scala-ide.org/
>> 
>> Here is my sample POM. You can add dependencies based on your requirements.
>> 
>> <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>> 	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
>> 	<modelVersion>4.0.0</modelVersion>
>> 	<groupId>spark</groupId>
>> 	<version>1.0</version>
>> 	<name>${project.artifactId}</name>
>> 
>> 	<properties>
>> 		<maven.compiler.source>1.7</maven.compiler.source>
>> 		<maven.compiler.target>1.7</maven.compiler.target>
>> 		<encoding>UTF-8</encoding>
>> 		<scala.version>2.10.4</scala.version>
>> 		<maven-scala-plugin.version>2.15.2</maven-scala-plugin.version>
>> 	</properties>
>> 
>> 	<repositories>
>> 		<repository>
>> 			<id>cloudera-repo-releases</id>
>> 			<url>https://repository.cloudera.com/artifactory/repo/</url>
>> 		</repository>
>> 	</repositories>
>> 
>> 	<dependencies>
>> 		<dependency>
>> 			<groupId>org.scala-lang</groupId>
>> 			<artifactId>scala-library</artifactId>
>> 			<version>${scala.version}</version>
>> 		</dependency>
>> 		<dependency>
>> 			<groupId>org.apache.spark</groupId>
>> 			<artifactId>spark-core_2.10</artifactId>
>> 			<version>1.5.0-cdh5.5.1</version>
>> 		</dependency>
>> 		<dependency>
>> 			<groupId>org.apache.spark</groupId>
>> 			<artifactId>spark-mllib_2.10</artifactId>
>> 			<version>1.5.0-cdh5.5.1</version>
>> 		</dependency>
>> 		<dependency>
>> 			<groupId>org.apache.spark</groupId>
>> 			<artifactId>spark-hive_2.10</artifactId>
>> 			<version>1.5.0</version>
>> 		</dependency>
>> 
>> 	</dependencies>
>> 	<build>
>> 		<sourceDirectory>src/main/scala</sourceDirectory>
>> 		<testSourceDirectory>src/test/scala</testSourceDirectory>
>> 		<plugins>
>> 			<plugin>
>> 				<groupId>org.scala-tools</groupId>
>> 				<artifactId>maven-scala-plugin</artifactId>
>> 				<version>${maven-scala-plugin.version}</version>
>> 				<executions>
>> 					<execution>
>> 						<goals>
>> 							<goal>compile</goal>
>> 							<goal>testCompile</goal>
>> 						</goals>
>> 					</execution>
>> 				</executions>
>> 				<configuration>
>> 					<jvmArgs>
>> 						<jvmArg>-Xms64m</jvmArg>
>> 						<jvmArg>-Xmx1024m</jvmArg>
>> 					</jvmArgs>
>> 				</configuration>
>> 			</plugin>
>> 			<plugin>
>> 				<groupId>org.apache.maven.plugins</groupId>
>> 				<artifactId>maven-shade-plugin</artifactId>
>> 				<version>1.6</version>
>> 				<executions>
>> 					<execution>
>> 						<phase>package</phase>
>> 						<goals>
>> 							<goal>shade</goal>
>> 						</goals>
>> 						<configuration>
>> 							<filters>
>> 								<filter>
>> 									<artifact>*:*</artifact>
>> 									<excludes>
>> 										<exclude>META-INF/*.SF</exclude>
>> 										<exclude>META-INF/*.DSA</exclude>
>> 										<exclude>META-INF/*.RSA</exclude>
>> 									</excludes>
>> 								</filter>
>> 							</filters>
>> 							<transformers>
>> 								<transformer
>> 									implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
>> 									<mainClass>com.group.id.Launcher1</mainClass>
>> 								</transformer>
>> 							</transformers>
>> 						</configuration>
>> 					</execution>
>> 				</executions>
>> 			</plugin>
>> 		</plugins>
>> 	</build>
>> 
>> 	<artifactId>scala</artifactId>
>> </project>
>> 
>> 
>>> On Mar 15, 2016, at 12:09 PM, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
>>> 
>>> Ok.
>>> 
>>> Sounds like opinion is divided :)
>>> 
>>> I will try to build a scala app with Maven.
>>> 
>>> When I build with SBT I follow this directory structure
>>> 
>>> High level directory the package name like
>>> 
>>> ImportCSV
>>> 
>>> under ImportCSV I have a directory src and the sbt file ImportCSV.sbt
>>> 
>>> in directory src I have main and scala subdirectories. My scala file is in
>>> 
>>> ImportCSV/src/main/scala
>>> 
>>> called ImportCSV.scala
>>> 
>>> I then have a shell script that runs everything under ImportCSV directory
>>> 
>>> cat generic.ksh
>>> #!/bin/ksh
>>> #--------------------------------------------------------------------------------
>>> #
>>> # Procedure:    generic.ksh
>>> #
>>> # Description:  Compiles and runs a Scala app using sbt and spark-submit
>>> #
>>> # Parameters:   none
>>> #
>>> #--------------------------------------------------------------------------------
>>> # Vers|  Date  | Who | DA | Description
>>> #-----+--------+-----+----+-----------------------------------------------------
>>> # 1.0 |04/03/15|  MT |    | Initial Version
>>> #--------------------------------------------------------------------------------
>>> #
>>> function F_USAGE
>>> {
>>>    echo "USAGE: ${1##*/} -A '<Application>'"
>>>    echo "USAGE: ${1##*/} -H '<HELP>' -h '<HELP>'"
>>>    exit 10
>>> }
>>> #
>>> # Main Section
>>> #
>>> if [[ "${1}" = "-h" || "${1}" = "-H" ]]; then
>>>    F_USAGE $0
>>> fi
>>> ## MAP INPUT TO VARIABLES
>>> while getopts A: opt
>>> do
>>>    case $opt in
>>>    (A) APPLICATION="$OPTARG" ;;
>>>    (*) F_USAGE $0 ;;
>>>    esac
>>> done
>>> [[ -z ${APPLICATION} ]] && print "You must specify an application value" && F_USAGE $0
>>> ENVFILE=/home/hduser/dba/bin/environment.ksh
>>> if [[ -f $ENVFILE ]]
>>> then
>>>         . $ENVFILE
>>>         . ~/spark_1.5.2_bin-hadoop2.6.kshrc
>>> else
>>>         echo "Abort: $0 failed. No environment file ( $ENVFILE ) found"
>>>         exit 1
>>> fi
>>> ##FILE_NAME=`basename $0 .ksh`
>>> FILE_NAME=${APPLICATION}
>>> CLASS=`echo ${FILE_NAME}|tr "[:upper:]" "[:lower:]"`
>>> NOW="`date +%Y%m%d_%H%M`"
>>> LOG_FILE=${LOGDIR}/${FILE_NAME}.log
>>> [ -f ${LOG_FILE} ] && rm -f ${LOG_FILE}
>>> print "\n" `date` ", Started $0" | tee -a ${LOG_FILE}
>>> cd ../${FILE_NAME}
>>> print "Compiling ${FILE_NAME}" | tee -a ${LOG_FILE}
>>> sbt package
>>> print "Submiiting the job" | tee -a ${LOG_FILE}
>>> 
>>> ${SPARK_HOME}/bin/spark-submit \
>>>                 --packages com.databricks:spark-csv_2.11:1.3.0 \
>>>                 --class "${FILE_NAME}" \
>>>                 --master spark://50.140.197.217:7077 \
>>>                 --executor-memory=12G \
>>>                 --executor-cores=12 \
>>>                 --num-executors=2 \
>>>                 target/scala-2.10/${CLASS}_2.10-1.0.jar
>>> print `date` ", Finished $0" | tee -a ${LOG_FILE}
>>> exit
>>> 
>>> 
>>> So to run it for ImportCSV all I need is to do
>>> 
>>> ./generic.ksh -A ImportCSV
>>> 
>>> Now can anyone kindly give me a rough guideline on the directory structure and the
>>> location of pom.xml to make this work using Maven?
>>> 
>>> Thanks
>>> 
>>> 
>>> Dr Mich Talebzadeh
>>>  
>>> LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>  
>>> http://talebzadehmich.wordpress.com
>>>  
>>> 
>>> On 15 March 2016 at 10:50, Sean Owen <sowen@cloudera.com> wrote:
>>> FWIW, I strongly prefer Maven over SBT even for Scala projects. The
>>> Spark build of reference is Maven.
>>> 
>>> On Tue, Mar 15, 2016 at 10:45 AM, Chandeep Singh <cs@chandeep.com> wrote:
>>> > For Scala, SBT is recommended.
>>> >
>>> > On Mar 15, 2016, at 10:42 AM, Mich Talebzadeh <mich.talebzadeh@gmail.com>
>>> > wrote:
>>> >
>>> > Hi,
>>> >
>>> > I build my Spark/Scala packages using SBT that works fine. I have created
>>> > generic shell scripts to build and submit it.
>>> >
>>> > Yesterday I noticed that some use Maven and Pom for this purpose.
>>> >
>>> > Which approach is recommended?
>>> >
>>> > Thanks,
>>> >
>>> >
>>> > Dr Mich Talebzadeh
>>> >
>>> >
>>> >
>>> > LinkedIn
>>> > https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>> >
>>> >
>>> >
>>> > http://talebzadehmich.wordpress.com
>>> >
>>> >
>>> >
>>> >
>>> 
>> 
>> 
> 

