spark-dev mailing list archives

From Christopher Nguyen <...@adatao.com>
Subject Re: Test coverage of Spark
Date Sat, 12 Oct 2013 21:44:20 GMT
Perfect. This is a great start on what I'm looking for.

--
Christopher T. Nguyen
Co-founder & CEO, Adatao <http://adatao.com>
linkedin.com/in/ctnguyen



On Sat, Oct 12, 2013 at 2:31 PM, Mark Hamstra <mark@clearstorydata.com> wrote:

> There is also spark-perf <https://github.com/amplab/spark-perf>.
>
>
> On Sat, Oct 12, 2013 at 2:22 PM, Christopher Nguyen <ctn@adatao.com> wrote:
>
> > Roman, an area I think would (a) have high impact, and (b) is relatively
> > poorly covered is performance analysis. I'm sure most teams are doing
> > this internally at their respective companies, but there is no shared code
> > base and no shared wisdom about what we're finding/improving.
> >
> > For example, consider the task of loading a table from disk into memory by
> > Shark. We're getting conflicting data about how much of this is CPU-bound
> > vs. I/O-bound. Our effort to track this down should be sharable somehow, and
> > would benefit from others' findings. Of course this is dependent on the
> > particular configuration, but there is a lot of test harness code/scripts
> > that can be shared. And individual findings, even if/especially if they are
> > conflicting, are very valuable if well documented.
> >
> > There is a Benchmark effort covered here
> > https://amplab.cs.berkeley.edu/benchmark/, but it addresses a slightly
> > different goal. You could consider this Perf-Analysis as part of that, or
> > as its own effort.
> >
> > This may be more than you were looking to own, but given your stated
> > enthusiasm :) I want to throw the idea out there.
> >
> > --
> > Christopher T. Nguyen
> > Co-founder & CEO, Adatao <http://adatao.com>
> > linkedin.com/in/ctnguyen
> >
> >
> >
> > On Sat, Oct 12, 2013 at 1:48 PM, Роман Ткаленко <tkalenkoroman@gmail.com> wrote:
> >
> > > Hello.
> > > I'm trying to dive into Spark's sources at a deeper-than-mere-glance level,
> > > and I find that writing unit tests is a good way to begin. So, basically,
> > > I'm wondering whether there are areas to which I could specifically apply
> > > my enthusiasm, i.e. are there parts with no or insufficient test coverage
> > > for which I could write some tests?
> > > I'm also wondering about the state of the Apache-hosted JIRA for Spark: I
> > > currently can't see any entries there. Should I look for them in the GitHub
> > > mirror, or still in the antecedent JIRA instance at
> > > http://spark-project.atlassian.net/?
> > > Regards,
> > > Roman.
> > >
> >
>
