spark-dev mailing list archives

From Josh Rosen <rosenvi...@gmail.com>
Subject Re: Unit tests in < 5 minutes
Date Fri, 08 Aug 2014 19:00:20 GMT
One simple optimization might be to disable the application web UI in tests that don’t need
it.  When running tests on my local machine while also running another Spark shell, I’ve
noticed that the test logs fill up with errors when the web UI attempts to bind to the default
port, fails, and tries a higher one.

- Josh
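The fallback Josh describes (try the default port, and move up one on failure) can be sketched with plain `java.net.ServerSocket`. The sketch also shows the alternative of binding to port 0, which lets the operating system pick a free ephemeral port. It is a minimal sketch under assumptions: `ServerSocket` stands in for the web UI's HTTP server, and `PortBinding`, `bindWithRetry` and `bindEphemeral` are hypothetical names, not Spark code.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.ServerSocket;

final class PortBinding {
    private static final int MAX_PORT = 65535;

    // Hypothetical helper mimicking "try the default port, then the next one up".
    // Every failed attempt writes a log line, which is the noise described above.
    static ServerSocket bindWithRetry(int startPort, int maxRetries) throws IOException {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            int port = startPort + attempt;
            if (port > MAX_PORT) {
                break;
            }
            try {
                return new ServerSocket(port, 50, InetAddress.getLoopbackAddress());
            } catch (BindException e) {
                System.err.println("Port " + port + " is in use; trying the next one");
            }
        }
        throw new BindException("No free port found starting at " + startPort);
    }

    // Port 0 asks the OS for any free ephemeral port, so concurrent runs never
    // contend for the same fixed number and nothing is logged.
    static ServerSocket bindEphemeral() throws IOException {
        return new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
    }

    public static void main(String[] args) throws IOException {
        // Occupy a port first, then show that a retrying bind has to skip past it.
        try (ServerSocket busy = bindEphemeral();
             ServerSocket retried = bindWithRetry(busy.getLocalPort(), 16)) {
            System.out.println("busy=" + busy.getLocalPort() + " retried=" + retried.getLocalPort());
        }
    }
}
```

Port 0 removes both the collisions and the retry log lines. The cost is that the port is no longer predictable, so a test that needs it has to read it back with `getLocalPort()`.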
On August 8, 2014 at 11:54:24 AM, Patrick Wendell (pwendell@gmail.com) wrote:

I dug around this a bit a while ago. I think if someone sat down and  
profiled the tests, it's likely we could find some things to optimize.  
In particular, there may be overheads in starting up a local spark  
context that could be minimized and speed up all the tests. Also,  
there are some tests (especially in Streaming) that take really long,  
like 60 seconds for a single test (see some of the new flume tests).  
These could almost certainly be optimized.  

I think 5 minutes might be out of reach, but something like a 2X  
improvement might be possible and would be very valuable if  
accomplished.  

- Patrick  
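One way to cut the per-test startup overhead Patrick mentions is to start an expensive context once per suite and share it across that suite's tests. This approach is an assumption, not something proposed in the thread. The sketch uses a hypothetical `ExpensiveContext` class as a stand-in for a local Spark context and counts how often its startup cost is paid. `SharedFixture` is likewise an illustrative name.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical stand-in for an expensive-to-start resource such as a local context.
final class ExpensiveContext {
    static final AtomicInteger STARTS = new AtomicInteger();

    ExpensiveContext() {
        STARTS.incrementAndGet(); // count how many times the startup cost is paid
    }
}

// Lazily creates one instance on first use and hands the same instance to every caller.
final class SharedFixture<T> {
    private final Supplier<T> factory;
    private T instance;

    SharedFixture(Supplier<T> factory) {
        this.factory = factory;
    }

    synchronized T get() {
        if (instance == null) {
            instance = factory.get();
        }
        return instance;
    }
}

final class SharedFixtureDemo {
    public static void main(String[] args) {
        SharedFixture<ExpensiveContext> shared = new SharedFixture<>(ExpensiveContext::new);
        for (int test = 0; test < 5; test++) {
            shared.get(); // each "test" reuses the same context
        }
        System.out.println("startups: " + ExpensiveContext.STARTS.get()); // prints "startups: 1"
    }
}
```

The trade-off is isolation: tests that share a context can leak state into each other. The pattern suits suites whose tests do not change shared configuration.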

On Fri, Aug 8, 2014 at 11:24 AM, Matei Zaharia <matei.zaharia@gmail.com> wrote:  
> Just as a note, when you're developing stuff, you can use "test-only" in sbt, or the
> equivalent feature in Maven, to run just some of the tests. This is what I do, I don't wait
> for Jenkins to run things. 90% of the time if it passes the tests that I know could break
> stuff, it will pass all of Jenkins.  
>  
> Jenkins should always be doing all the integration tests, so I don't think it will become
> *that* much shorter in the long run, though it can certainly be improved.  
>  
> Matei  
>  
> On August 8, 2014 at 10:20:35 AM, Nicolas Liochon (nkeywal@gmail.com) wrote:  
>  
> fwiw, when we did this work in HBase, we categorized the tests. Then some  
> tests can share a single jvm, while some others need to be isolated in  
> their own jvm. Nevertheless surefire can still run them in parallel by  
> starting/stopping several jvm.  
>  
> Nicolas  
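The categorization Nicolas describes could be turned into a fork plan roughly as follows. Each suite that must be isolated gets a dedicated JVM, and the shareable suites are spread across a few shared JVMs. A runner such as surefire could then start those JVMs in parallel. This is a sketch that assumes just two categories. `ForkPlanner`, its enum and the suite names are hypothetical, and the code only plans forks rather than launching them.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ForkPlanner {
    enum Category { SHARED_JVM, ISOLATED_JVM }

    // Builds a list of JVM forks: every isolated suite gets its own fork, and the
    // shareable suites are dealt round-robin across `sharedForks` forks.
    static List<List<String>> plan(Map<String, Category> suites, int sharedForks) {
        if (sharedForks < 1) throw new IllegalArgumentException("sharedForks must be >= 1");
        List<List<String>> shared = new ArrayList<>();
        for (int i = 0; i < sharedForks; i++) shared.add(new ArrayList<>());
        List<List<String>> forks = new ArrayList<>();
        int next = 0;
        for (Map.Entry<String, Category> e : suites.entrySet()) {
            if (e.getValue() == Category.ISOLATED_JVM) {
                forks.add(List.of(e.getKey()));
            } else {
                shared.get(next++ % sharedForks).add(e.getKey());
            }
        }
        for (List<String> batch : shared) {
            if (!batch.isEmpty()) forks.add(batch); // skip empty shared forks
        }
        return forks;
    }

    public static void main(String[] args) {
        Map<String, Category> suites = new LinkedHashMap<>();
        suites.put("SuiteA", Category.SHARED_JVM);
        suites.put("SuiteB", Category.ISOLATED_JVM);
        suites.put("SuiteC", Category.SHARED_JVM);
        suites.put("SuiteD", Category.SHARED_JVM);
        System.out.println(plan(suites, 2)); // prints [[SuiteB], [SuiteA, SuiteD], [SuiteC]]
    }
}
```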
>  
>  
> On Fri, Aug 8, 2014 at 7:10 PM, Reynold Xin <rxin@databricks.com> wrote:  
>  
>> ScalaTest actually has support for parallelization built-in. We can use  
>> that.  
>>  
>> The main challenge is to make sure all the test suites can work in parallel  
>> when running alongside each other.  
>>  
>>  
>> On Fri, Aug 8, 2014 at 9:47 AM, Ted Yu <yuzhihong@gmail.com> wrote:  
>>  
>> > How about using parallel execution feature of maven-surefire-plugin  
>> > (assuming all the tests were made parallel friendly) ?  
>> >  
>> > http://maven.apache.org/surefire/maven-surefire-plugin/examples/fork-options-and-parallel-execution.html  
>> >  
>> > Cheers  
>> >  
>> >  
>> > On Fri, Aug 8, 2014 at 9:14 AM, Sean Owen <sowen@cloudera.com> wrote:  
>> >  
>> > > A common approach is to separate unit tests from integration tests.  
>> > > Maven has support for this distinction. I'm not sure it helps a lot  
>> > > though, since it only helps you to not run integration tests all the  
>> > > time. But lots of Spark tests are integration-test-like and are  
>> > > important to run to know a change works.  
>> > >  
>> > > I haven't heard of a plugin to run different test suites remotely on  
>> > > many machines, but I would not be surprised if it exists.  
>> > >  
>> > > The Jenkins servers aren't CPU-bound as far as I can tell. It's that  
>> > > the tests spend a lot of time waiting for bits to start up or  
>> > > complete. That implies the existing tests could be sped up by just  
>> > > running in parallel locally. I recall someone recently proposed this?  
>> > >  
>> > > And I think the problem with that is simply that some of the tests  
>> > > collide with each other, by opening up the same port at the same time  
>> > > for example. I know that kind of problem is being attacked even right  
>> > > now. But if all the tests were made parallel friendly, I imagine  
>> > > parallelism could be enabled and speed up builds greatly without any  
>> > > remote machines.  
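Sean's observation that the tests mostly wait rather than compute is what makes local parallelism pay off. When suites are wait-bound, running N of them on N threads takes roughly as long as the slowest one instead of the sum. The following is a minimal sketch with `java.util.concurrent`. `Thread.sleep` is an assumed stand-in for waiting on services to start, and `ParallelSuites` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class ParallelSuites {
    // Stand-in for a suite that mostly waits (for a service to start, a timeout, etc.).
    static void waitBoundSuite(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Runs the suites on a fixed-size pool and returns the wall-clock time in milliseconds.
    static long runInParallel(List<Runnable> suites, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        try {
            List<Future<?>> results = new ArrayList<>();
            for (Runnable suite : suites) results.add(pool.submit(suite));
            // get() surfaces any suite failure as an ExecutionException
            for (Future<?> f : results) f.get();
        } finally {
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) throws Exception {
        List<Runnable> suites = new ArrayList<>();
        for (int i = 0; i < 4; i++) suites.add(() -> waitBoundSuite(300));
        // Sequentially these would take about 1200 ms; in parallel, roughly the longest one.
        System.out.println("wall time: " + runInParallel(suites, 4) + " ms");
    }
}
```

The speedup only holds for suites that do not contend for shared resources such as fixed ports. That is the collision problem Sean raises.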
>> > >  
>> > >  
>> > > On Fri, Aug 8, 2014 at 5:01 PM, Nicholas Chammas  
>> > > <nicholas.chammas@gmail.com> wrote:  
>> > > > Howdy,  
>> > > >  
>> > > > Do we think it's both feasible and worthwhile to invest in getting our  
>> > > > unit tests to finish in under 5 minutes (or something similarly brief)  
>> > > > when run by Jenkins?  
>> > > >  
>> > > > Unit tests currently seem to take anywhere from 30 min to 2 hours. As  
>> > > > people add more tests, I imagine this time will only grow. I think it  
>> > > > would be better for both contributors and reviewers if they didn't have  
>> > > > to wait so long for test results; PR reviews would be shorter, if  
>> > > > nothing else.  
>> > > >  
>> > > > I don't know how this is normally done, but maybe it wouldn't be too  
>> > > > much work to get a test cycle to feel lighter.  
>> > > >  
>> > > > Most unit tests are independent and can be run concurrently, right?  
>> > > > Would it make sense to build a given patch on many servers at once and  
>> > > > send disjoint sets of unit tests to each?  
>> > > >  
>> > > > I'd be interested in working on something like that if possible (and  
>> > > > sensible).  
>> > > >  
>> > > > Nick  
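Sending disjoint sets of tests to several machines, as Nick suggests, needs a rule for splitting them. One common choice is the greedy longest-processing-time heuristic: sort suites by estimated duration and always hand the next one to the least-loaded machine. This heuristic is an assumption and was not proposed in the thread. `TestSharder` and the durations in `main` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

final class TestSharder {
    // One machine's share of the suites plus its estimated total running time.
    static final class Shard {
        final List<String> suites = new ArrayList<>();
        long totalSeconds;
    }

    // Greedy "longest processing time first": sort suites by estimated duration,
    // descending, and always give the next one to the currently lightest shard.
    static List<Shard> shard(Map<String, Long> estimatedSeconds, int machines) {
        if (machines < 1) throw new IllegalArgumentException("machines must be >= 1");
        PriorityQueue<Shard> lightestFirst =
            new PriorityQueue<>(Comparator.comparingLong((Shard s) -> s.totalSeconds));
        List<Shard> shards = new ArrayList<>();
        for (int i = 0; i < machines; i++) {
            Shard s = new Shard();
            shards.add(s);
            lightestFirst.add(s);
        }
        List<Map.Entry<String, Long>> bySize = new ArrayList<>(estimatedSeconds.entrySet());
        bySize.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        for (Map.Entry<String, Long> suite : bySize) {
            Shard s = lightestFirst.poll(); // remove before mutating so the heap stays valid
            s.suites.add(suite.getKey());
            s.totalSeconds += suite.getValue();
            lightestFirst.add(s); // re-insert with the updated total
        }
        return shards;
    }

    public static void main(String[] args) {
        // Made-up durations in seconds, purely for illustration.
        Map<String, Long> est = new LinkedHashMap<>();
        est.put("SuiteA", 60L);
        est.put("SuiteB", 40L);
        est.put("SuiteC", 30L);
        est.put("SuiteD", 20L);
        est.put("SuiteE", 10L);
        for (Shard s : shard(est, 2)) {
            System.out.println(s.totalSeconds + "s: " + s.suites);
        }
    }
}
```

In practice, the estimates could come from timings of earlier Jenkins runs. The greedy split is not always optimal. Its slowest shard is known to stay within a factor of 4/3 of the best possible split.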
>> > >  
>> > > ---------------------------------------------------------------------  
>> > > To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org  
>> > > For additional commands, e-mail: dev-help@spark.apache.org  
>> > >  
>> > >  
>> >  
>>  


