spark-user mailing list archives

From Chandeep Singh ...@chandeep.com>
Subject Re: Building Spark packages with SBT or Maven
Date Tue, 15 Mar 2016 13:56:08 GMT
Yes, sbt uses the same source-file structure as Maven.

> On Mar 15, 2016, at 1:53 PM, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
> 
> Thanks. The Maven structure is identical to sbt's; I will just have to replace the sbt file with pom.xml.
> 
> I will start with your pom.xml.
> 
> Cheers
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> http://talebzadehmich.wordpress.com
>  
> 
> On 15 March 2016 at 13:12, Chandeep Singh <cs@chandeep.com> wrote:
> You can build using maven from the command line as well.
> 
> This layout should give you an idea, and here is a useful resource: http://www.scala-lang.org/old/node/345
> 
> project/
>    pom.xml   - Defines the project
>    src/
>       main/
>           java/ - Contains all Java code that will go in your final artifact.
>                   See maven-compiler-plugin <http://maven.apache.org/plugins/maven-compiler-plugin/> for details
>           scala/ - Contains all Scala code that will go in your final artifact.
>                    See maven-scala-plugin <http://scala-tools.org/mvnsites/maven-scala-plugin/> for details
>           resources/ - Contains all static files that should be available on the classpath
>                        in the final artifact.  See maven-resources-plugin <http://maven.apache.org/plugins/maven-resources-plugin/> for details
>           webapp/ - Contains all content for a web application (JSPs, CSS, images, etc.)
>                     See maven-war-plugin <http://maven.apache.org/plugins/maven-war-plugin/> for details
>      site/ - Contains all apt or xdoc files used to create a project website.
>              See maven-site-plugin <http://maven.apache.org/plugins/maven-site-plugin/> for details
>      test/
>          java/ - Contains all Java code used for testing.
>                  See maven-compiler-plugin for details
>          scala/ - Contains all Scala code used for testing.
>                   See maven-scala-plugin for details
>          resources/ - Contains all static content that should be available on the
>                       classpath during testing.  See maven-resources-plugin for details
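The layout above can be created in one shot from the shell. A minimal sketch — the project name `myproject` is just a placeholder, not something from the thread:

```shell
# Create the standard Maven source layout described above.
# "myproject" is a placeholder project name.
mkdir -p myproject/src/main/java \
         myproject/src/main/scala \
         myproject/src/main/resources \
         myproject/src/test/scala \
         myproject/src/test/resources
# The POM sits at the project root, next to src/.
touch myproject/pom.xml
```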
> 
> 
>> On Mar 15, 2016, at 12:38 PM, Chandeep Singh <cs@chandeep.com> wrote:
>> 
>> Do you have the Eclipse Maven plugin set up? http://www.eclipse.org/m2e/
>> 
>> Once you have it set up: File -> New -> Other -> Maven Project -> Next / Finish. You’ll see a default pom.xml which you can modify or replace.
>> 
>> 
>> Here is some documentation that should help: http://scala-ide.org/docs/tutorials/m2eclipse/
>> 
>> I’m using the same Eclipse build as you on my Mac. I mostly build a shaded JAR
and SCP it to the cluster.
>> 
>>> On Mar 15, 2016, at 12:22 PM, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
>>> 
>>> Great, Chandeep. I also have the Eclipse Scala IDE, as below:
>>> 
>>> Scala IDE build of Eclipse SDK
>>> Build id: 4.3.0-vfinal-2015-12-01T15:55:22Z-Typesafe
>>> 
>>> I am no expert on Eclipse, so if I create a project called ImportCSV, where do I need to put the pom file, and how do I reference it please? My Eclipse runs on a Linux host, so can it access all the directories that the sbt project accesses? I also believe there will not be any need for external jar files in the build path?
>>> 
>>> Thanks
>>> 
>>> Dr Mich Talebzadeh
>>>  
>>> LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>  
>>> http://talebzadehmich.wordpress.com
>>>  
>>> 
>>> On 15 March 2016 at 12:15, Chandeep Singh <cs@chandeep.com> wrote:
>>> Btw, just to add to the confusion ;) I use Maven as well since I moved from Java
to Scala but everyone I talk to has been recommending SBT for Scala. 
>>> 
>>> I use the Eclipse Scala IDE to build. http://scala-ide.org/
>>> 
>>> Here is my sample POM. You can add dependencies based on your requirements.
>>> 
>>> <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>> 	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
>>> 	<modelVersion>4.0.0</modelVersion>
>>> 	<groupId>spark</groupId>
>>> 	<version>1.0</version>
>>> 	<name>${project.artifactId}</name>
>>> 
>>> 	<properties>
>>> 		<maven.compiler.source>1.7</maven.compiler.source>
>>> 		<maven.compiler.target>1.7</maven.compiler.target>
>>> 		<encoding>UTF-8</encoding>
>>> 		<scala.version>2.10.4</scala.version>
>>> 		<maven-scala-plugin.version>2.15.2</maven-scala-plugin.version>
>>> 	</properties>
>>> 
>>> 	<repositories>
>>> 		<repository>
>>> 			<id>cloudera-repo-releases</id>
>>> 			<url>https://repository.cloudera.com/artifactory/repo/</url>
>>> 		</repository>
>>> 	</repositories>
>>> 
>>> 	<dependencies>
>>> 		<dependency>
>>> 			<groupId>org.scala-lang</groupId>
>>> 			<artifactId>scala-library</artifactId>
>>> 			<version>${scala.version}</version>
>>> 		</dependency>
>>> 		<dependency>
>>> 			<groupId>org.apache.spark</groupId>
>>> 			<artifactId>spark-core_2.10</artifactId>
>>> 			<version>1.5.0-cdh5.5.1</version>
>>> 		</dependency>
>>> 		<dependency>
>>> 			<groupId>org.apache.spark</groupId>
>>> 			<artifactId>spark-mllib_2.10</artifactId>
>>> 			<version>1.5.0-cdh5.5.1</version>
>>> 		</dependency>
>>> 		<dependency>
>>> 			<groupId>org.apache.spark</groupId>
>>> 			<artifactId>spark-hive_2.10</artifactId>
>>> 			<version>1.5.0</version>
>>> 		</dependency>
>>> 
>>> 	</dependencies>
>>> 	<build>
>>> 		<sourceDirectory>src/main/scala</sourceDirectory>
>>> 		<testSourceDirectory>src/test/scala</testSourceDirectory>
>>> 		<plugins>
>>> 			<plugin>
>>> 				<groupId>org.scala-tools</groupId>
>>> 				<artifactId>maven-scala-plugin</artifactId>
>>> 				<version>${maven-scala-plugin.version}</version>
>>> 				<executions>
>>> 					<execution>
>>> 						<goals>
>>> 							<goal>compile</goal>
>>> 							<goal>testCompile</goal>
>>> 						</goals>
>>> 					</execution>
>>> 				</executions>
>>> 				<configuration>
>>> 					<jvmArgs>
>>> 						<jvmArg>-Xms64m</jvmArg>
>>> 						<jvmArg>-Xmx1024m</jvmArg>
>>> 					</jvmArgs>
>>> 				</configuration>
>>> 			</plugin>
>>> 			<plugin>
>>> 				<groupId>org.apache.maven.plugins</groupId>
>>> 				<artifactId>maven-shade-plugin</artifactId>
>>> 				<version>1.6</version>
>>> 				<executions>
>>> 					<execution>
>>> 						<phase>package</phase>
>>> 						<goals>
>>> 							<goal>shade</goal>
>>> 						</goals>
>>> 						<configuration>
>>> 							<filters>
>>> 								<filter>
>>> 									<artifact>*:*</artifact>
>>> 									<excludes>
>>> 										<exclude>META-INF/*.SF</exclude>
>>> 										<exclude>META-INF/*.DSA</exclude>
>>> 										<exclude>META-INF/*.RSA</exclude>
>>> 									</excludes>
>>> 								</filter>
>>> 							</filters>
>>> 							<transformers>
>>> 								<transformer
>>> 									implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
>>> 									<mainClass>com.group.id.Launcher1</mainClass>
>>> 								</transformer>
>>> 							</transformers>
>>> 						</configuration>
>>> 					</execution>
>>> 				</executions>
>>> 			</plugin>
>>> 		</plugins>
>>> 	</build>
>>> 
>>> 	<artifactId>scala</artifactId>
>>> </project>
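With a POM like the one above saved as pom.xml in the project root, the build is a single command. A sketch only — the resulting JAR name follows the sample POM's artifactId `scala` and version `1.0`:

```shell
# Run from the directory containing pom.xml. "mvn package" compiles the
# main and test sources and, because the shade plugin is bound to the
# package phase, also produces a shaded (fat) JAR with dependencies
# merged in and signature files (*.SF, *.DSA, *.RSA) stripped out.
mvn clean package

# With the sample POM, the shaded artifact lands at:
#   target/scala-1.0.jar
```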
>>> 
>>> 
>>>> On Mar 15, 2016, at 12:09 PM, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
>>>> 
>>>> Ok.
>>>> 
>>>> Sounds like opinion is divided :)
>>>> 
>>>> I will try to build a scala app with Maven.
>>>> 
>>>> When I build with SBT I follow this directory structure
>>>> 
>>>> The top-level directory is named after the package, e.g.
>>>> 
>>>> ImportCSV
>>>> 
>>>> Under ImportCSV I have a src directory and the sbt file ImportCSV.sbt.
>>>> 
>>>> Inside src I have a main directory with a scala subdirectory. My Scala file is in
>>>> 
>>>> ImportCSV/src/main/scala
>>>> 
>>>> called ImportCSV.scala
>>>> 
>>>> I then have a shell script that runs everything under the ImportCSV directory:
>>>> 
>>>> cat generic.ksh
>>>> #!/bin/ksh
>>>> #--------------------------------------------------------------------------------
>>>> #
>>>> # Procedure:    generic.ksh
>>>> #
>>>> # Description:  Compiles and runs a Scala app using sbt and spark-submit
>>>> #
>>>> # Parameters:   none
>>>> #
>>>> #--------------------------------------------------------------------------------
>>>> # Vers|  Date  | Who | DA | Description
>>>> #-----+--------+-----+----+-----------------------------------------------------
>>>> # 1.0 |04/03/15|  MT |    | Initial Version
>>>> #--------------------------------------------------------------------------------
>>>> #
>>>> function F_USAGE
>>>> {
>>>>    echo "USAGE: ${1##*/} -A '<Application>'"
>>>>    echo "USAGE: ${1##*/} -H '<HELP>' -h '<HELP>'"
>>>>    exit 10
>>>> }
>>>> #
>>>> # Main Section
>>>> #
>>>> if [[ "${1}" = "-h" || "${1}" = "-H" ]]; then
>>>>    F_USAGE $0
>>>> fi
>>>> ## MAP INPUT TO VARIABLES
>>>> while getopts A: opt
>>>> do
>>>>    case $opt in
>>>>    (A) APPLICATION="$OPTARG" ;;
>>>>    (*) F_USAGE $0 ;;
>>>>    esac
>>>> done
>>>> [[ -z ${APPLICATION} ]] && print "You must specify an application value " && F_USAGE $0
>>>> ENVFILE=/home/hduser/dba/bin/environment.ksh
>>>> if [[ -f $ENVFILE ]]
>>>> then
>>>>         . $ENVFILE
>>>>         . ~/spark_1.5.2_bin-hadoop2.6.kshrc
>>>> else
>>>>         echo "Abort: $0 failed. No environment file ( $ENVFILE ) found"
>>>>         exit 1
>>>> fi
>>>> ##FILE_NAME=`basename $0 .ksh`
>>>> FILE_NAME=${APPLICATION}
>>>> CLASS=`echo ${FILE_NAME}|tr "[:upper:]" "[:lower:]"`
>>>> NOW="`date +%Y%m%d_%H%M`"
>>>> LOG_FILE=${LOGDIR}/${FILE_NAME}.log
>>>> [ -f ${LOG_FILE} ] && rm -f ${LOG_FILE}
>>>> print "\n" `date` ", Started $0" | tee -a ${LOG_FILE}
>>>> cd ../${FILE_NAME}
>>>> print "Compiling ${FILE_NAME}" | tee -a ${LOG_FILE}
>>>> sbt package
>>>> print "Submitting the job" | tee -a ${LOG_FILE}
>>>> 
>>>> ${SPARK_HOME}/bin/spark-submit \
>>>>                 --packages com.databricks:spark-csv_2.11:1.3.0 \
>>>>                 --class "${FILE_NAME}" \
>>>>                 --master spark://50.140.197.217:7077 \
>>>>                 --executor-memory=12G \
>>>>                 --executor-cores=12 \
>>>>                 --num-executors=2 \
>>>>                 target/scala-2.10/${CLASS}_2.10-1.0.jar
>>>> print `date` ", Finished $0" | tee -a ${LOG_FILE}
>>>> exit
>>>> 
>>>> 
>>>> So to run it for ImportCSV all I need is to do
>>>> 
>>>> ./generic.ksh -A ImportCSV
>>>> 
>>>> Now can anyone kindly give me a rough guideline on directory and location
of pom.xml to make this work using maven?
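One rough way to adapt the generic.ksh steps above for Maven — a hypothetical sketch, not an answer from the thread: keep pom.xml at the top of ImportCSV/ next to src/, swap the sbt step for a mvn invocation, and point spark-submit at the JAR Maven writes under target/. It assumes the POM declares artifactId ImportCSV and version 1.0 (both assumptions):

```shell
# Hypothetical Maven variant of the build-and-submit steps in generic.ksh.
# Assumes ImportCSV/pom.xml exists with artifactId ImportCSV, version 1.0.
cd ../${FILE_NAME}
print "Compiling ${FILE_NAME} with Maven" | tee -a ${LOG_FILE}
mvn -q clean package                      # replaces "sbt package"

${SPARK_HOME}/bin/spark-submit \
                --class "${FILE_NAME}" \
                --master spark://50.140.197.217:7077 \
                target/${FILE_NAME}-1.0.jar   # Maven puts the JAR under target/
```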
>>>> 
>>>> Thanks
>>>> 
>>>> 
>>>> Dr Mich Talebzadeh
>>>>  
>>>> LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>  
>>>> http://talebzadehmich.wordpress.com
>>>>  
>>>> 
>>>> On 15 March 2016 at 10:50, Sean Owen <sowen@cloudera.com> wrote:
>>>> FWIW, I strongly prefer Maven over SBT even for Scala projects. The
>>>> Spark build of reference is Maven.
>>>> 
>>>> On Tue, Mar 15, 2016 at 10:45 AM, Chandeep Singh <cs@chandeep.com> wrote:
>>>> > For Scala, SBT is recommended.
>>>> >
>>>> > On Mar 15, 2016, at 10:42 AM, Mich Talebzadeh <mich.talebzadeh@gmail.com>
>>>> > wrote:
>>>> >
>>>> > Hi,
>>>> >
>>>> > I build my Spark/Scala packages using SBT, which works fine. I have created
>>>> > generic shell scripts to build and submit them.
>>>> >
>>>> > Yesterday I noticed that some people use Maven and a pom.xml for this purpose.
>>>> >
>>>> > Which approach is recommended?
>>>> >
>>>> > Thanks,
>>>> >
>>>> >
>>>> > Dr Mich Talebzadeh
>>>> >
>>>> >
>>>> >
>>>> > LinkedIn
>>>> > https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>> >
>>>> >
>>>> >
>>>> > http://talebzadehmich.wordpress.com
>>>> >
>>>> >
>>>> >
>>>> >
>>>> 
>>> 
>>> 
>> 
> 
> 

