spark-dev mailing list archives

From Iulian Dragoș <iulian.dra...@typesafe.com>
Subject Re: Speeding up Spark build during development
Date Tue, 05 May 2015 09:36:05 GMT
I'm probably the only Eclipse user here, but it seems I have the best
workflow :) At least for me things work as they should: once I have imported
the projects into the workspace, I can build and run/debug tests from the IDE.
I only go to sbt when I need to re-create projects or I want to run the full
test suite.


iulian



On Tue, May 5, 2015 at 7:35 AM, Tathagata Das <tdas@databricks.com> wrote:

> In addition to Michael's suggestion, in my SBT workflow I also use "~" to
> automatically kick off the build and unit tests. For example,
>
> sbt/sbt "~streaming/test-only *BasicOperationsSuite*"
>
> It will automatically detect any file changes in the project and start
> the compilation and testing.
> So my full workflow involves changing code in IntelliJ and then
> continuously running unit tests in the background on the command line using
> this "~".
>
> TD
>
>
> On Mon, May 4, 2015 at 2:49 PM, Michael Armbrust <michael@databricks.com>
> wrote:
>
> > FWIW... My Spark SQL development workflow is usually to run "build/sbt
> > sparkShell" or "build/sbt 'sql/test-only <testSuiteName>'". These
> > commands start in as little as 30s on my laptop, automatically figure out
> > which subprojects need to be rebuilt, and don't require the expensive
> > assembly creation.
> >
> > On Mon, May 4, 2015 at 5:48 AM, Meethu Mathew <meethu.mathew@flytxt.com>
> > wrote:
> >
> > > *
> > > *
> > > ** ** ** ** ** **** ** **** Hi,
> > >
> > >  Is it really necessary to run **mvn --projects assembly/ -DskipTests
> > > install ? Could you please explain why this is needed?
> > > I got the changes after running "mvn --projects streaming/ -DskipTests
> > > package".
> > >
> > > Regards,
> > > Meethu
> > >
> > >
> > > On Monday 04 May 2015 02:20 PM, Emre Sevinc wrote:
> > >
> > >> Just to give you an example:
> > >>
> > > >> When I was trying to make a small change only to the Streaming
> > > >> component of Spark, first I built and installed the whole Spark
> > > >> project (this took about 15 minutes on my 4-core, 4 GB RAM laptop).
> > > >> Then, after having changed files only in Streaming, I ran something
> > > >> like (in the top-level directory):
> > >>
> > >>     mvn --projects streaming/ -DskipTests package
> > >>
> > >> and then
> > >>
> > >>     mvn --projects assembly/ -DskipTests install
> > >>
> > >>
> > > >> This was much faster than trying to build the whole Spark from
> > > >> scratch, because Maven was only building one component, in my case
> > > >> the Streaming component, of Spark. I think you can use a very similar
> > > >> approach.
> > >>
> > >> --
> > >> Emre Sevinç
> > >>
> > >>
> > >>
> > >> On Mon, May 4, 2015 at 10:44 AM, Pramod Biligiri <
> > >> pramodbiligiri@gmail.com>
> > >> wrote:
> > >>
> > > >>> No, I just need to build one project at a time. Right now SparkSql.
> > >>>
> > >>> Pramod
> > >>>
> > >>> On Mon, May 4, 2015 at 12:09 AM, Emre Sevinc <emre.sevinc@gmail.com>
> > >>> wrote:
> > >>>
> > > >>>> Hello Pramod,
> > > >>>>
> > > >>>> Do you need to build the whole project every time? Generally you
> > > >>>> don't. For example, when I was changing some files that belong only
> > > >>>> to Spark Streaming, I was building only the streaming module (of
> > > >>>> course after having built and installed the whole project, but that
> > > >>>> was done only once), and then the assembly. This was much faster
> > > >>>> than trying to build the whole Spark every time.
> > > >>>>
> > >>>> --
> > >>>> Emre Sevinç
> > >>>>
> > > >>>> On Mon, May 4, 2015 at 9:01 AM, Pramod Biligiri <
> > > >>>> pramodbiligiri@gmail.com>
> > > >>>> wrote:
> > > >>>>
> > > >>>>> Using the inbuilt maven and zinc it takes around 10 minutes for
> > > >>>>> each build.
> > > >>>>> Is that reasonable?
> > > >>>>> My maven opts look like this:
> > >>>>> $ echo $MAVEN_OPTS
> > >>>>> -Xmx12000m -XX:MaxPermSize=2048m
> > >>>>>
> > >>>>> I'm running it as build/mvn -DskipTests package
> > >>>>>
> > >>>>> Should I be tweaking my Zinc/Nailgun config?
> > >>>>>
> > >>>>> Pramod
> > >>>>>
> > > >>>>> On Sun, May 3, 2015 at 3:40 PM, Mark Hamstra <
> > > >>>>> mark@clearstorydata.com>
> > > >>>>> wrote:
> > > >>>>>
> > > >>>>>> https://spark.apache.org/docs/latest/building-spark.html#building-with-buildmvn
> > > >>>>>>
> > > >>>>>> On Sun, May 3, 2015 at 2:54 PM, Pramod Biligiri <
> > > >>>>>> pramodbiligiri@gmail.com>
> > > >>>>>> wrote:
> > > >>>>>>
> > > >>>>>>> This is great. I didn't know about the mvn script in the build
> > > >>>>>>> directory.
> > > >>>>>>>
> > > >>>>>>> Pramod
> > > >>>>>>>
> > >>>>>>> On Fri, May 1, 2015 at 9:51 AM, York, Brennon <
> > >>>>>>> Brennon.York@capitalone.com>
> > >>>>>>> wrote:
> > >>>>>>>
> > > >>>>>>>> Following what Ted said, if you leverage the `mvn` from within
> > > >>>>>>>> the `build/` directory of Spark you'll get zinc for free which
> > > >>>>>>>> should help speed up build times.
> > > >>>>>>>>
> > > >>>>>>>> On 5/1/15, 9:45 AM, "Ted Yu" <yuzhihong@gmail.com> wrote:
> > > >>>>>>>>
> > > >>>>>>>>> Pramod:
> > > >>>>>>>>> Please remember to run Zinc so that the build is faster.
> > > >>>>>>>>>
> > > >>>>>>>>> Cheers
> > > >>>>>>>>>
> > > >>>>>>>>> On Fri, May 1, 2015 at 9:36 AM, Ulanov, Alexander
> > > >>>>>>>>> <alexander.ulanov@hp.com>
> > > >>>>>>>>> wrote:
> > > >>>>>>>>>
> > > >>>>>>>>>> Hi Pramod,
> > > >>>>>>>>>>
> > > >>>>>>>>>> For cluster-like tests you might want to use the same code as
> > > >>>>>>>>>> in mllib's LocalClusterSparkContext. You can rebuild only the
> > > >>>>>>>>>> package that you change and then run this main class.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Best regards, Alexander
> > > >>>>>>>>>>
> > > >>>>>>>>>> -----Original Message-----
> > > >>>>>>>>>> From: Pramod Biligiri [mailto:pramodbiligiri@gmail.com]
> > > >>>>>>>>>> Sent: Friday, May 01, 2015 1:46 AM
> > > >>>>>>>>>> To: dev@spark.apache.org
> > > >>>>>>>>>> Subject: Speeding up Spark build during development
> > > >>>>>>>>>>
> > > >>>>>>>>>> Hi,
> > > >>>>>>>>>> I'm making some small changes to the Spark codebase and trying
> > > >>>>>>>>>> it out on a cluster. I was wondering if there's a faster way
> > > >>>>>>>>>> to build than running the package target each time.
> > > >>>>>>>>>> Currently I'm using: mvn -DskipTests  package
> > > >>>>>>>>>>
> > > >>>>>>>>>> All the nodes have the same filesystem mounted at the same
> > > >>>>>>>>>> mount point.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Pramod
> > > >>>>>>>>>>
> > >>>>>>
> > >>>>
> > >>>> --
> > >>>> Emre Sevinc
> > >>>>
> > >>>>
> > >>>
> > >>
> > >
> >
>
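
A minimal sketch of the per-module Maven workflow Emre describes in the quoted thread (one full build first, then only the changed module followed by the assembly). The function name `rebuild_module` and the `DRY_RUN` variable are hypothetical, not part of Spark's build scripts; the two Maven command lines are the ones quoted in the thread. It assumes you run it from the top-level Spark directory. Whether the assembly step is needed in your setup is exactly what Meethu asks about, so treat it as optional.

```shell
#!/bin/sh
# Hypothetical helper (not part of Spark): rebuild one module, then the assembly.
# Assumes the whole project has already been built and installed once.
# Set DRY_RUN=1 to print the commands instead of running them.
rebuild_module() {
  module="$1"   # module directory without trailing slash, e.g. streaming
  if [ -z "$module" ]; then
    echo "usage: rebuild_module <module-dir>" >&2
    return 1
  fi
  for cmd in \
    "mvn --projects ${module}/ -DskipTests package" \
    "mvn --projects assembly/ -DskipTests install"
  do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "$cmd"        # dry run: only show what would be executed
    else
      $cmd || return 1   # stop at the first failing Maven invocation
    fi
  done
}
```

For example, `DRY_RUN=1 rebuild_module streaming` prints the two commands from Emre's message without invoking Maven.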



-- 

--
Iulian Dragos

------
Reactive Apps on the JVM
www.typesafe.com
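
As a rough illustration of the sbt commands quoted from Michael and TD, the sketch below only composes and prints the command line for a single test suite, optionally with the "~" prefix that re-runs the task on file changes. The function name `sbt_test_cmd` and the `SBT_LAUNCHER` variable are hypothetical; the default launcher `build/sbt` and the alternative `sbt/sbt` are the two paths mentioned in the thread, and which one exists depends on your checkout.

```shell
#!/bin/sh
# Hypothetical helper (not part of Spark): print the sbt invocation for one suite.
# SBT_LAUNCHER defaults to build/sbt; the thread also mentions sbt/sbt.
sbt_test_cmd() {
  project="$1"     # sbt subproject, e.g. streaming or sql
  suite="$2"       # suite name or glob, e.g. *BasicOperationsSuite*
  watch="${3:-}"   # pass "watch" to prefix the task with "~"
  prefix=""
  if [ "$watch" = "watch" ]; then
    prefix="~"
  fi
  # Quote the sbt task so the shell does not expand the suite glob.
  printf '%s "%s%s/test-only %s"\n' \
    "${SBT_LAUNCHER:-build/sbt}" "$prefix" "$project" "$suite"
}
```

Calling `sbt_test_cmd streaming '*BasicOperationsSuite*' watch` prints the command TD quotes, with `build/sbt` in place of `sbt/sbt`; the printed line can then be pasted into a terminal.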
