spark-dev mailing list archives

From Nicholas Chammas <nicholas.cham...@gmail.com>
Subject Re: Some praise and comments on Spark
Date Wed, 25 Feb 2015 22:57:57 GMT
Thanks for sharing the feedback about what works well for you!

It's nice to get that; as we all probably know, people generally reach out
only when they have problems.

On Wed, Feb 25, 2015 at 5:38 PM Reynold Xin <rxin@databricks.com> wrote:

> Thanks for the email and encouragement, Devl. Responses to the 3 requests:
>
> -tonnes of configuration properties and "go faster" type flags. For example
> Hadoop and Hbase users will know that there are a whole catalogue of
> properties for regions, caches, network properties, block sizes, etc etc.
> Please don't end up here for example:
> https://hadoop.apache.org/docs/r1.0.4/mapred-default.html, it is painful
> having to configure all of this and then create a set of properties for
> each environment and then tie this into CI and deployment tools.
>
> As the project grows, it is unavoidable to introduce more config options,
> in particular, we often use config options to test new modules that are
> still experimental before making them the default (e.g. sort-based
> shuffle).
>
> The philosophy here is to make it a very high bar to introduce new config
> options, and make the default values sensible for most deployments, and
> then whenever possible, figure out automatically what is the right setting.
> Note that this in general is hard, but we expect for 99% of the users they
> only need to know a very small number of options (e.g. setting the
> serializer).
>
>
> -no more daemons and processes to have to monitor and manipulate and
> restart and crash.
>
> At the very least you'd need the cluster manager itself to be a daemon
> process because we can't defy the law of physics. But I don't think we want
> to introduce anything beyond that.
>
>
> -a project that penalises developers (that will ultimately help promote
> Spark to their managers and budget holders) with expensive training,
> certification, books and accreditation. Ideally this open source should be
> free, free training= more users = more commercial uptake.
>
> I definitely agree with you on making it easier to learn Spark. We are
> making a lot of materials freely available, including two freely available
> MOOCs on edX:
> https://databricks.com/blog/2014/12/02/announcing-two-spark-based-moocs.html
>
>
>
> On Wed, Feb 25, 2015 at 2:13 PM, Devl Devel <devl.development@gmail.com>
> wrote:
>
> > Hi Spark Developers,
> >
> > First, apologies if this doesn't belong on this list but the
> > comments/praise are relevant to all developers. This is just a small note
> > about what we really like about Spark, I/we don't mean to start a whole
> > long discussion thread in this forum but just share our positive
> > experiences with Spark thus far.
> >
> > To start, as you can tell, we think that the Spark project is amazing and
> > we love it! Having put in nearly half a decade worth of sweat and tears
> > into production Hadoop, MapReduce clusters and application development it's
> > so refreshing to see something arguably simpler and more elegant to
> > supersede it.
> >
> > These are the things we love about Spark and hope these principles
> > continue:
> >
> > -the one command build; make-distribution.sh. Simple, clean and ideal for
> > deployment and devops and rebuilding on different environments and nodes.
> > -not having too much runtime and deploy config; as admins and developers we
> > are sick of setting props like io.sort and mapred.job.shuffle.merge.percent
> > and dfs file locations and temp directories and so on and on again and
> > again every time we deploy a job, new cluster, environment or even change
> > company.
> > -a fully built-in stack, one global project for SQL, dataframes, MLlib etc,
> > so there is no need to add on projects to it on as per Hive, Hue, Hbase
> > etc. This helps life and keeps everything in one place.
> > -single (global) user based operation - no creation of a hdfs mapred unix
> > user, makes life much simpler
> > -single quick-start daemons; master and slaves. Not having to worry about
> > JT, NN, DN, TT, RM, Hbase master ... and doing netstat and jps on hundreds
> > of clusters makes life much easier.
> > -proper code versioning, feature releases and release management.
> > - good & well organised documentation with good examples.
> >
> > In addition to the comments above this is where we hope Spark never ends
> > up:
> >
> > -tonnes of configuration properties and "go faster" type flags. For example
> > Hadoop and Hbase users will know that there are a whole catalogue of
> > properties for regions, caches, network properties, block sizes, etc etc.
> > Please don't end up here for example:
> > https://hadoop.apache.org/docs/r1.0.4/mapred-default.html, it is painful
> > having to configure all of this and then create a set of properties for
> > each environment and then tie this into CI and deployment tools.
> > -no more daemons and processes to have to monitor and manipulate and
> > restart and crash.
> > -a project that penalises developers (that will ultimately help promote
> > Spark to their managers and budget holders) with expensive training,
> > certification, books and accreditation. Ideally this open source should be
> > free, free training= more users = more commercial uptake.
> >
> > Anyway, those are our thoughts for what they are worth, keep up the good
> > work, we just had to mention it. Again sorry if this is not the right place
> > or if there is another forum for this stuff.
> >
> > Cheers
> >
>
