spark-dev mailing list archives

From Reynold Xin <r...@databricks.com>
Subject Re: Spark Improvement Proposals
Date Thu, 16 Feb 2017 16:22:12 GMT
Updated. Any feedback from other community members?


On Wed, Feb 15, 2017 at 2:53 AM, Cody Koeninger <cody@koeninger.org> wrote:

> Thanks for doing that.
>
> Given that there are at least 4 different Apache voting processes,
> "typical Apache vote process" isn't meaningful to me.
>
> I think the intention is that in order to pass, it needs at least 3 +1
> votes from PMC members *and no -1 votes from PMC members*.  But the
> document doesn't explicitly say that second part.
>
> There's also no mention of the duration a vote should remain open.
> There's a mention of a month for finding a shepherd, but that's different.
>
> Other than that, LGTM.
>
> On Mon, Feb 13, 2017 at 9:02 AM, Reynold Xin <rxin@databricks.com> wrote:
>
>> Here's a new draft that incorporated most of the feedback:
>> https://docs.google.com/document/d/1-Zdi_W-wtuxS9hTK0P9qb2x-nRanvXmnZ7SUi4qMljg/edit#
>>
>> I added a specific role for SPIP Author and another one for SPIP Shepherd.
>>
>> On Sat, Feb 11, 2017 at 6:13 PM, Xiao Li <gatorsmile@gmail.com> wrote:
>>
>>> During the summit, I also had a lot of discussions over similar topics
>>> with multiple Committers and active users. I heard many fantastic ideas. I
>>> believe Spark improvement proposals are good channels to collect the
>>> requirements/designs.
>>>
>>>
>>> IMO, we also need to consider the priority when working on these items.
>>> Even if the proposal is accepted, it does not mean it will be implemented
>>> and merged immediately. It is not a FIFO queue.
>>>
>>>
>>> Even when PRs are merged, we sometimes still have to revert them
>>> if the design and implementation were not reviewed carefully. We have
>>> to ensure quality. Spark is not application software. It is
>>> infrastructure software that is used by many companies. We have
>>> to be very careful in design and implementation, especially when
>>> adding or changing external APIs.
>>>
>>>
>>> When I developed mainframe infrastructure/middleware software over the
>>> past 6 years, I was involved in discussions with external and internal
>>> customers. The to-do feature list was always above 100 items. Sometimes
>>> customers felt frustrated when we were unable to deliver features on time
>>> because of resource limits and other constraints. Even if they paid us
>>> billions, we still had to deliver phase by phase, or they had to accept
>>> workarounds. That is the reality everyone has to face, I think.
>>>
>>>
>>> Thanks,
>>>
>>>
>>> Xiao Li
>>>
>>> 2017-02-11 7:57 GMT-08:00 Cody Koeninger <cody@koeninger.org>:
>>>
>>>> At the spark summit this week, everyone from PMC members to users I had
>>>> never met before were asking me about the Spark improvement proposals
>>>> idea.  It's clear that it's a real community need.
>>>>
>>>> But it's been almost half a year, and nothing visible has been done.
>>>>
>>>> Reynold, are you going to do this?
>>>>
>>>> If so, when?
>>>>
>>>> If not, why?
>>>>
>>>> You already did the right thing by including long-deserved committers.
>>>> Please keep doing the right thing for the community.
>>>>
>>>> On Wed, Jan 11, 2017 at 4:13 AM, Reynold Xin <rxin@databricks.com>
>>>> wrote:
>>>>
>>>>> +1 on all counts (consensus, time bound, define roles)
>>>>>
>>>>> I can update the doc in the next few days and share back. Then maybe
>>>>> we can just officially vote on this. As Tim suggested, we might not get it
>>>>> 100% right the first time and would need to re-iterate. But that's fine.
>>>>>
>>>>>
>>>>> On Thu, Jan 5, 2017 at 3:29 PM, Tim Hunter <timhunter@databricks.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Cody,
>>>>>> thank you for bringing up this topic, I agree it is very important to
>>>>>> keep a cohesive community around some common, fluid goals. Here are a few
>>>>>> comments about the current document:
>>>>>>
>>>>>> 1. name: it should not overlap with an existing one such as SIP. Can
>>>>>> you imagine someone trying to discuss a scala spore proposal for spark?
>>>>>> "[Spark] SIP-3 is intended to evolve in tandem with [Scala] SIP-21". SPIP
>>>>>> sounds great.
>>>>>>
>>>>>> 2. roles: at a high level, SPIPs are meant to reach consensus for
>>>>>> technical decisions with a lasting impact. As such, the template should
>>>>>> emphasize the role of the various parties during this process:
>>>>>>
>>>>>>  - the SPIP author is responsible for building consensus. She is the
>>>>>> champion driving the process forward and is responsible for ensuring that
>>>>>> the SPIP follows the general guidelines. The author should be identified in
>>>>>> the SPIP. The authorship of a SPIP can be transferred if the current author
>>>>>> is not interested and someone else wants to move the SPIP forward. There
>>>>>> should probably be 2-3 authors at most for each SPIP.
>>>>>>
>>>>>>  - someone with voting power should probably shepherd the SPIP (and
>>>>>> be recorded as such): ensuring that the final decision over the SPIP is
>>>>>> recorded (rejected, accepted, etc.), and advising about the technical
>>>>>> quality of the SPIP: this person need not be a champion for the SPIP or
>>>>>> contribute to it, but rather makes sure it stands a chance of being
>>>>>> approved when the vote happens. Also, if the author cannot find anyone who
>>>>>> would want to take this role, this proposal is likely to be rejected anyway.
>>>>>>
>>>>>>  - users, committers, contributors have the roles already outlined in
>>>>>> the document
>>>>>>
>>>>>> 3. timeline: ideally, once a SPIP has been offered for voting, it
>>>>>> should move swiftly into either being accepted or rejected, so that we do
>>>>>> not end up with a distracting long tail of half-hearted proposals.
>>>>>>
>>>>>> These rules are meant to be flexible, but the current document should
>>>>>> be clear about who is in charge of a SPIP, and the state it is currently in.
>>>>>>
>>>>>> We have had long discussions over some very important questions such
>>>>>> as approval. I do not have an opinion on these, but why not make a pick and
>>>>>> reevaluate this decision later? This is not a binding process at this point.
>>>>>>
>>>>>> Tim
>>>>>>
>>>>>>
>>>>>> On Tue, Jan 3, 2017 at 3:16 PM, Cody Koeninger <cody@koeninger.org>
>>>>>> wrote:
>>>>>>
>>>>>>> I don't have a concern about voting vs consensus.
>>>>>>>
>>>>>>> I have a concern that whatever the decision making process is, it is
>>>>>>> explicitly announced on the ticket for the given proposal, with an explicit
>>>>>>> deadline, and an explicit outcome.
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jan 3, 2017 at 4:08 PM, Imran Rashid <irashid@cloudera.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I'm also in favor of this.  Thanks for your persistence Cody.
>>>>>>>>
>>>>>>>> My take on the specific issues Joseph mentioned:
>>>>>>>>
>>>>>>>> 1) voting vs. consensus -- I agree with the argument Ryan Blue made
>>>>>>>> earlier for consensus:
>>>>>>>>
>>>>>>>> > Majority vs consensus: My rationale is that I don't think we want
>>>>>>>> > to consider a proposal approved if it had objections serious enough that
>>>>>>>> > committers down-voted (or PMC depending on who gets a vote). If these
>>>>>>>> > proposals are like PEPs, then they represent a significant amount of
>>>>>>>> > community effort and I wouldn't want to move forward if up to half of the
>>>>>>>> > community thinks it's an untenable idea.
>>>>>>>>
>>>>>>>> 2) Design doc template -- agree this would be useful, but also
>>>>>>>> seems totally orthogonal to moving forward on the SIP proposal.
>>>>>>>>
>>>>>>>> 3) agree w/ Joseph's proposal for updating the template.
>>>>>>>>
>>>>>>>> One small addition:
>>>>>>>>
>>>>>>>> 4) Deciding on a name -- minor, but I think it's worth
>>>>>>>> disambiguating from Scala's SIPs, and the best proposal I've heard is
>>>>>>>> "SPIP".   At least, no one has objected.  (don't care enough that I'd
>>>>>>>> object to anything else, though.)
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jan 3, 2017 at 3:30 PM, Joseph Bradley <joseph@databricks.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Cody,
>>>>>>>>>
>>>>>>>>> Thanks for being persistent about this.  I too would like to see
>>>>>>>>> this happen.  Reviewing the thread, it sounds like the main things
>>>>>>>>> remaining are:
>>>>>>>>> * Decide about a few issues
>>>>>>>>> * Finalize the doc(s)
>>>>>>>>> * Vote on this proposal
>>>>>>>>>
>>>>>>>>> Issues & TODOs:
>>>>>>>>>
>>>>>>>>> (1) The main issue I see above is voting vs. consensus.  I have
>>>>>>>>> little preference here.  It sounds like something which could be tailored
>>>>>>>>> based on whether we see too many or too few SIPs being approved.
>>>>>>>>>
>>>>>>>>> (2) Design doc template  (This would be great to have for Spark
>>>>>>>>> regardless of this SIP discussion.)
>>>>>>>>> * Reynold, are you still putting this together?
>>>>>>>>>
>>>>>>>>> (3) Template cleanups.  Listing some items mentioned above + a new
>>>>>>>>> one w.r.t. Reynold's draft
>>>>>>>>> <https://docs.google.com/document/d/1-Zdi_W-wtuxS9hTK0P9qb2x-nRanvXmnZ7SUi4qMljg/edit#>
>>>>>>>>> :
>>>>>>>>> * Reinstate the "Where" section with links to current and past SIPs
>>>>>>>>> * Add field for stating explicit deadlines for approval
>>>>>>>>> * Add field for stating Author & Committer shepherd
>>>>>>>>>
>>>>>>>>> Thanks all!
>>>>>>>>> Joseph
>>>>>>>>>
>>>>>>>>> On Mon, Jan 2, 2017 at 7:45 AM, Cody Koeninger <cody@koeninger.org> wrote:
>>>>>>>>>
>>>>>>>>>> I'm bumping this one more time for the new year, and then I'm
>>>>>>>>>> giving up.
>>>>>>>>>>
>>>>>>>>>> Please, fix your process, even if it isn't exactly the way I
>>>>>>>>>> suggested.
>>>>>>>>>>
>>>>>>>>>> On Tue, Nov 8, 2016 at 11:14 AM, Ryan Blue <rblue@netflix.com>
>>>>>>>>>> wrote:
>>>>>>>>>> > On lazy consensus as opposed to voting:
>>>>>>>>>> >
>>>>>>>>>> > First, why lazy consensus? The proposal was for consensus, which is at
>>>>>>>>>> > least three +1 votes and no vetos. Consensus has no losing side, it
>>>>>>>>>> > requires getting to a point where there is agreement. Isn't that
>>>>>>>>>> > agreement what we want to achieve with these proposals?
>>>>>>>>>> >
>>>>>>>>>> > Second, lazy consensus only removes the requirement for three +1 votes.
>>>>>>>>>> > Why would we not want at least three committers to think something is a
>>>>>>>>>> > good idea before adopting the proposal?
>>>>>>>>>> >
>>>>>>>>>> > rb
>>>>>>>>>> >
>>>>>>>>>> > On Tue, Nov 8, 2016 at 8:13 AM, Cody Koeninger <cody@koeninger.org> wrote:
>>>>>>>>>> >>
>>>>>>>>>> >> So there are some minor things (the Where section heading appears to
>>>>>>>>>> >> be dropped; wherever this document is posted it needs to actually link
>>>>>>>>>> >> to a jira filter showing current / past SIPs) but it doesn't look like
>>>>>>>>>> >> I can comment on the google doc.
>>>>>>>>>> >>
>>>>>>>>>> >> The major substantive issue that I have is that this version is
>>>>>>>>>> >> significantly less clear as to the outcome of an SIP.
>>>>>>>>>> >>
>>>>>>>>>> >> The apache example of lazy consensus at
>>>>>>>>>> >> http://apache.org/foundation/voting.html#LazyConsensus involves an
>>>>>>>>>> >> explicit announcement of an explicit deadline, which I think are
>>>>>>>>>> >> necessary for clarity.
>>>>>>>>>> >>
>>>>>>>>>> >>
>>>>>>>>>> >>
>>>>>>>>>> >> On Mon, Nov 7, 2016 at 1:55 PM, Reynold Xin <rxin@databricks.com> wrote:
>>>>>>>>>> >> > It turned out suggested edits (trackable) don't show up for non-owners,
>>>>>>>>>> >> > so I've just merged all the edits in place. It should be visible now.
>>>>>>>>>> >> >
>>>>>>>>>> >> > On Mon, Nov 7, 2016 at 10:10 AM, Reynold Xin <rxin@databricks.com> wrote:
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> Oops. Let me try to figure that out.
>>>>>>>>>> >> >>
>>>>>>>>>> >> >>
>>>>>>>>>> >> >> On Monday, November 7, 2016, Cody Koeninger <cody@koeninger.org> wrote:
>>>>>>>>>> >> >>>
>>>>>>>>>> >> >>> Thanks for picking up on this.
>>>>>>>>>> >> >>>
>>>>>>>>>> >> >>> Maybe I fail at google docs, but I can't see any edits on the document
>>>>>>>>>> >> >>> you linked.
>>>>>>>>>> >> >>>
>>>>>>>>>> >> >>> Regarding lazy consensus, if the board in general has less of an issue
>>>>>>>>>> >> >>> with that, sure.  As long as it is clearly announced, lasts at least
>>>>>>>>>> >> >>> 72 hours, and has a clear outcome.
>>>>>>>>>> >> >>>
>>>>>>>>>> >> >>> The other points are hard to comment on without being able to see the
>>>>>>>>>> >> >>> text in question.
>>>>>>>>>> >> >>>
>>>>>>>>>> >> >>>
>>>>>>>>>> >> >>> On Mon, Nov 7, 2016 at 3:11 AM, Reynold Xin <rxin@databricks.com> wrote:
>>>>>>>>>> >> >>> > I just looked through the entire thread again tonight - there are a
>>>>>>>>>> >> >>> > lot of great ideas being discussed. Thanks Cody for taking the first
>>>>>>>>>> >> >>> > crack at the proposal.
>>>>>>>>>> >> >>> >
>>>>>>>>>> >> >>> > I want to first comment on the context. Spark is one of the most
>>>>>>>>>> >> >>> > innovative and important projects in (big) data -- overall technical
>>>>>>>>>> >> >>> > decisions made in Apache Spark are sound. But of course, a project as
>>>>>>>>>> >> >>> > large and active as Spark always has room for improvement, and we as a
>>>>>>>>>> >> >>> > community should strive to take it to the next level.
>>>>>>>>>> >> >>> >
>>>>>>>>>> >> >>> > To that end, the two biggest areas for improvement, in my opinion, are:
>>>>>>>>>> >> >>> >
>>>>>>>>>> >> >>> > 1. Visibility: There is so much happening that it is difficult to know
>>>>>>>>>> >> >>> > what is really going on. For people who don't follow closely, it is
>>>>>>>>>> >> >>> > difficult to know what the important initiatives are. Even for people
>>>>>>>>>> >> >>> > who do follow, it is difficult to know what specific things require
>>>>>>>>>> >> >>> > their attention, since the number of pull requests and JIRA tickets is
>>>>>>>>>> >> >>> > high and it's difficult to extract signal from noise.
>>>>>>>>>> >> >>> >
>>>>>>>>>> >> >>> > 2. Solicit user (broadly defined, including developers themselves) input
>>>>>>>>>> >> >>> > more proactively: At the end of the day the project provides value
>>>>>>>>>> >> >>> > because users use it. Users can't tell us exactly what to build, but it
>>>>>>>>>> >> >>> > is important to get their input.
>>>>>>>>>> >> >>> >
>>>>>>>>>> >> >>> >
>>>>>>>>>> >> >>> > I've taken Cody's doc and edited it:
>>>>>>>>>> >> >>> >
>>>>>>>>>> >> >>> >
>>>>>>>>>> >> >>> > https://docs.google.com/document/d/1-Zdi_W-wtuxS9hTK0P9qb2x-nRanvXmnZ7SUi4qMljg/edit#heading=h.36ut37zh7w2b
>>>>>>>>>> >> >>> > (I've made all my modifications trackable)
>>>>>>>>>> >> >>> >
>>>>>>>>>> >> >>> > There are a couple of high-level changes I made:
>>>>>>>>>> >> >>> >
>>>>>>>>>> >> >>> > 1. I've consulted a board member and he recommended lazy consensus as
>>>>>>>>>> >> >>> > opposed to voting. The reason is that in voting there can easily be a
>>>>>>>>>> >> >>> > "loser" that gets outvoted.
>>>>>>>>>> >> >>> >
>>>>>>>>>> >> >>> > 2. I made it lighter weight, and renamed "strategy" to "optional design
>>>>>>>>>> >> >>> > sketch". Echoing one of the earlier emails: "IMHO so far aside from
>>>>>>>>>> >> >>> > tagging things and linking them elsewhere simply having design docs and
>>>>>>>>>> >> >>> > prototypes implementations in PRs is not something that has not worked
>>>>>>>>>> >> >>> > so far".
>>>>>>>>>> >> >>> >
>>>>>>>>>> >> >>> > 3. I made some language tweaks to focus more on visibility. For example,
>>>>>>>>>> >> >>> > "The purpose of an SIP is to inform and involve", rather than just
>>>>>>>>>> >> >>> > "involve". SIPs should also have at least two emails that go to dev@.
>>>>>>>>>> >> >>> >
>>>>>>>>>> >> >>> >
>>>>>>>>>> >> >>> > While I was editing this, I thought we really needed a suggested
>>>>>>>>>> >> >>> > template for design docs too. I will get to that too ...
>>>>>>>>>> >> >>> >
>>>>>>>>>> >> >>> >
>>>>>>>>>> >> >>> > On Tue, Nov 1, 2016 at 12:09 AM, Reynold Xin <rxin@databricks.com> wrote:
>>>>>>>>>> >> >>> >>
>>>>>>>>>> >> >>> >> Most things looked OK to me too, although I do plan to take a closer
>>>>>>>>>> >> >>> >> look after Nov 1st when we cut the release branch for 2.1.
>>>>>>>>>> >> >>> >>
>>>>>>>>>> >> >>> >>
>>>>>>>>>> >> >>> >> On Mon, Oct 31, 2016 at 3:12 PM, Marcelo Vanzin <vanzin@cloudera.com> wrote:
>>>>>>>>>> >> >>> >>>
>>>>>>>>>> >> >>> >>> The proposal looks OK to me. I assume, even though it's not explicitly
>>>>>>>>>> >> >>> >>> called out, that voting would happen by e-mail? A template for the
>>>>>>>>>> >> >>> >>> proposal document (instead of just a bullet list) would also be nice,
>>>>>>>>>> >> >>> >>> but that can be done at any time.
>>>>>>>>>> >> >>> >>>
>>>>>>>>>> >> >>> >>> BTW, shameless plug: I filed SPARK-18085 which I consider a candidate
>>>>>>>>>> >> >>> >>> for a SIP, given the scope of the work. The document attached even
>>>>>>>>>> >> >>> >>> somewhat matches the proposed format. So if anyone wants to try out
>>>>>>>>>> >> >>> >>> the process...
>>>>>>>>>> >> >>> >>>
>>>>>>>>>> >> >>> >>> On Mon, Oct 31, 2016 at 10:34 AM, Cody Koeninger <cody@koeninger.org> wrote:
>>>>>>>>>> >> >>> >>> > Now that spark summit europe is over, are any committers interested
>>>>>>>>>> >> >>> >>> > in moving forward with this?
>>>>>>>>>> >> >>> >>> >
>>>>>>>>>> >> >>> >>> > https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md
>>>>>>>>>> >> >>> >>> >
>>>>>>>>>> >> >>> >>> > Or are we going to let this discussion die on the vine?
>>>>>>>>>> >> >>> >>> >
>>>>>>>>>> >> >>> >>> > On Mon, Oct 17, 2016 at 10:05 AM, Tomasz Gawęda
>>>>>>>>>> >> >>> >>> > <tomasz.gaweda@outlook.com> wrote:
>>>>>>>>>> >> >>> >>> >> Maybe my mail was not clear enough.
>>>>>>>>>> >> >>> >>> >>
>>>>>>>>>> >> >>> >>> >>
>>>>>>>>>> >> >>> >>> >> I didn't want to write "let's focus on Flink" or any other framework.
>>>>>>>>>> >> >>> >>> >> The idea with benchmarks was to show two things:
>>>>>>>>>> >> >>> >>> >>
>>>>>>>>>> >> >>> >>> >> - why some people are doing bad PR for Spark
>>>>>>>>>> >> >>> >>> >>
>>>>>>>>>> >> >>> >>> >> - how, in an easy way, we can change that and show that Spark is
>>>>>>>>>> >> >>> >>> >> still on top
>>>>>>>>>> >> >>> >>> >>
>>>>>>>>>> >> >>> >>> >>
>>>>>>>>>> >> >>> >>> >> No more, no less. Benchmarks will be helpful, but I don't think
>>>>>>>>>> >> >>> >>> >> they're the most important thing in Spark :) On the Spark main page
>>>>>>>>>> >> >>> >>> >> there is still a "Spark vs Hadoop" chart. It is important to show that
>>>>>>>>>> >> >>> >>> >> the framework is not the same Spark with a different API, but much
>>>>>>>>>> >> >>> >>> >> faster and more optimized, comparable to or even faster than other
>>>>>>>>>> >> >>> >>> >> frameworks.
>>>>>>>>>> >> >>> >>> >>
>>>>>>>>>> >> >>> >>> >>
>>>>>>>>>> >> >>> >>> >> About real-time streaming: I think it would simply be good to see it
>>>>>>>>>> >> >>> >>> >> in Spark. I really like the current Spark model, but there are many
>>>>>>>>>> >> >>> >>> >> voices saying "we need more". The community should listen to them too
>>>>>>>>>> >> >>> >>> >> and try to help them. With SIPs that would be easier; I just posted
>>>>>>>>>> >> >>> >>> >> this example as a "thing that may be changed with a SIP".
>>>>>>>>>> >> >>> >>> >>
>>>>>>>>>> >> >>> >>> >>
>>>>>>>>>> >> >>> >>> >> I really like the unification via Datasets, but there are a lot of
>>>>>>>>>> >> >>> >>> >> algorithms inside. Let's make an easy API, but with a strong background
>>>>>>>>>> >> >>> >>> >> (articles, benchmarks, descriptions, etc.) that shows that Spark is
>>>>>>>>>> >> >>> >>> >> still a modern framework.
>>>>>>>>>> >> >>> >>> >>
>>>>>>>>>> >> >>> >>> >>
>>>>>>>>>> >> >>> >>> >> Maybe now my intention will be clearer :) As I said, the organizational
>>>>>>>>>> >> >>> >>> >> ideas were already mentioned and I agree with them. My mail was just
>>>>>>>>>> >> >>> >>> >> meant to show some aspects from my side, that is, from the side of a
>>>>>>>>>> >> >>> >>> >> developer and a person who is trying to help others with Spark (via
>>>>>>>>>> >> >>> >>> >> StackOverflow or other ways).
>>>>>>>>>> >> >>> >>> >>
>>>>>>>>>> >> >>> >>> >>
>>>>>>>>>> >> >>> >>> >> Pozdrawiam / Best regards,
>>>>>>>>>> >> >>> >>> >>
>>>>>>>>>> >> >>> >>> >> Tomasz
>>>>>>>>>> >> >>> >>> >>
>>>>>>>>>> >> >>> >>> >>
>>>>>>>>>> >> >>> >>> >> ________________________________
>>>>>>>>>> >> >>> >>> >> From: Cody Koeninger <cody@koeninger.org>
>>>>>>>>>> >> >>> >>> >> Sent: 17 October 2016 16:46
>>>>>>>>>> >> >>> >>> >> To: Debasish Das
>>>>>>>>>> >> >>> >>> >> Cc: Tomasz Gawęda; dev@spark.apache.org
>>>>>>>>>> >> >>> >>> >> Subject: Re: Spark Improvement Proposals
>>>>>>>>>> >> >>> >>> >>
>>>>>>>>>> >> >>> >>> >> I think narrowly focusing on Flink or benchmarks is missing my point.
>>>>>>>>>> >> >>> >>> >>
>>>>>>>>>> >> >>> >>> >> My point is evolve or die.  Spark's governance and organization is
>>>>>>>>>> >> >>> >>> >> hampering its ability to evolve technologically, and it needs to
>>>>>>>>>> >> >>> >>> >> change.
>>>>>>>>>> >> >>> >>> >>
>>>>>>>>>> >> >>> >>> >> On Sun, Oct 16, 2016 at 9:21 PM, Debasish Das <debasish.das83@gmail.com> wrote:
>>>>>>>>>> >> >>> >>> >>> Thanks Cody for bringing up a valid point... I picked up Spark in 2014
>>>>>>>>>> >> >>> >>> >>> as soon as I looked into it, since compared to writing Java map-reduce
>>>>>>>>>> >> >>> >>> >>> and Cascading code, Spark made writing distributed code fun... But now
>>>>>>>>>> >> >>> >>> >>> that we have gone deeper with Spark and the real-time streaming use
>>>>>>>>>> >> >>> >>> >>> case is becoming more prominent, I think it is time to bring in a
>>>>>>>>>> >> >>> >>> >>> messaging model in conjunction with the batch/micro-batch API that
>>>>>>>>>> >> >>> >>> >>> Spark is good at... Close integration of akka-streams with the Spark
>>>>>>>>>> >> >>> >>> >>> micro-batching APIs looks like a great direction to stay in the game
>>>>>>>>>> >> >>> >>> >>> with Apache Flink... Spark 2.0 integrated streaming with batch on the
>>>>>>>>>> >> >>> >>> >>> assumption that micro-batching is sufficient to run SQL commands on a
>>>>>>>>>> >> >>> >>> >>> stream, but do we really have time to do SQL processing on streaming
>>>>>>>>>> >> >>> >>> >>> data within 1-2 seconds?
>>>>>>>>>> >> >>> >>> >>>
>>>>>>>>>> >> >>> >>> >>> After reading the email chain, I started to look into the Flink
>>>>>>>>>> >> >>> >>> >>> documentation, and if you compare it with the Spark documentation, I
>>>>>>>>>> >> >>> >>> >>> think we have major work to do detailing Spark internals, so that more
>>>>>>>>>> >> >>> >>> >>> people from the community start to take an active role in fixing the
>>>>>>>>>> >> >>> >>> >>> issues and Spark stays strong compared to Flink.
>>>>>>>>>> >> >>> >>> >>>
>>>>>>>>>> >> >>> >>> >>>
>>>>>>>>>> >> >>> >>> >>> https://cwiki.apache.org/confluence/display/SPARK/Spark+Internals
>>>>>>>>>> >> >>> >>> >>>
>>>>>>>>>> >> >>> >>> >>> https://cwiki.apache.org/confluence/display/FLINK/Flink+Internals
>>>>>>>>>> >> >>> >>> >>>
>>>>>>>>>> >> >>> >>> >>> Spark is no longer an engine that works for micro-batch and batch... We
>>>>>>>>>> >> >>> >>> >>> (and I am sure many others) are pushing spark as an engine for stream
>>>>>>>>>> >> >>> >>> >>> and query processing... we need to make it a state-of-the-art engine
>>>>>>>>>> >> >>> >>> >>> for high speed streaming data and user queries as well!
>>>>>>>>>> >> >>> >>> >>>
>>>>>>>>>> >> >>> >>> >>> On Sun, Oct 16, 2016 at 1:30 PM, Tomasz Gawęda <tomasz.gaweda@outlook.com> wrote:
>>>>>>>>>> >> >>> >>> >>>>
>>>>>>>>>> >> >>> >>> >>>> Hi everyone,
>>>>>>>>>> >> >>> >>> >>>>
>>>>>>>>>> >> >>> >>> >>>> I'm quite late with my answer, but I think my suggestions may help a
>>>>>>>>>> >> >>> >>> >>>> little bit. :) Many technical and organizational topics were mentioned,
>>>>>>>>>> >> >>> >>> >>>> but I want to focus on these negative posts about Spark and about
>>>>>>>>>> >> >>> >>> >>>> "haters"
>>>>>>>>>> >> >>> >>> >>>>
>>>>>>>>>> >> >>> >>> >>>> I really like Spark. Ease of use, speed, a very good community - it's
>>>>>>>>>> >> >>> >>> >>>> all here. But every project has to "fight" on the "framework market"
>>>>>>>>>> >> >>> >>> >>>> to stay no. 1. I'm following many Spark and Big Data communities;
>>>>>>>>>> >> >>> >>> >>>> maybe my mail will inspire someone :)
>>>>>>>>>> >> >>> >>> >>>>
>>>>>>>>>> >> >>> >>> >>>> You (every Spark developer; so far I haven't had enough time to start
>>>>>>>>>> >> >>> >>> >>>> contributing to Spark) have done an excellent job. So why are some
>>>>>>>>>> >> >>> >>> >>>> people saying that Flink (or another framework) is better, as was
>>>>>>>>>> >> >>> >>> >>>> posted on this mailing list? No, not because that framework is better
>>>>>>>>>> >> >>> >>> >>>> in all cases. In my opinion, many of these discussions were started
>>>>>>>>>> >> >>> >>> >>>> after marketing-like posts about Flink. Please look at the
>>>>>>>>>> >> >>> >>> >>>> StackOverflow "Flink vs ..." posts: almost every one is "won" by Flink.
>>>>>>>>>> >> >>> >>> >>>> The answers sometimes say nothing about other frameworks; Flink's users
>>>>>>>>>> >> >>> >>> >>>> (often PMC members) just post the same information about real-time
>>>>>>>>>> >> >>> >>> >>>> streaming, about delta iterations, etc. It looks smart and very often
>>>>>>>>>> >> >>> >>> >>>> it is marked as the answer, even if, in my opinion, the whole truth
>>>>>>>>>> >> >>> >>> >>>> wasn't told.
>>>>>>>>>> >> >>> >>> >>>>
>>>>>>>>>> >> >>> >>> >>>>
>>>>>>>>>> >> >>> >>> >>>> My suggestion: I don't have enough money and knowledge to perform a
>>>>>>>>>> >> >>> >>> >>>> huge performance test. Maybe some company that supports Spark
>>>>>>>>>> >> >>> >>> >>>> (Databricks, Cloudera? - just saying you're the most visible in the
>>>>>>>>>> >> >>> >>> >>>> community :) ) could perform performance tests of:
>>>>>>>>>> >> >>> >>> >>>>
>>>>>>>>>> >> >>> >>> >>>> - streaming engine - Spark will probably lose because of the
>>>>>>>>>> >> >>> >>> >>>> mini-batch model; however, the difference should now be much lower
>>>>>>>>>> >> >>> >>> >>>> than in previous versions
>>>>>>>>>> >> >>> >>> >>>>
>>>>>>>>>> >> >>> >>> >>>> - Machine Learning models
>>>>>>>>>> >> >>> >>> >>>>
>>>>>>>>>> >> >>> >>> >>>> - batch jobs
>>>>>>>>>> >> >>> >>> >>>>
>>>>>>>>>> >> >>> >>> >>>> - Graph jobs
>>>>>>>>>> >> >>> >>> >>>>
>>>>>>>>>> >> >>> >>> >>>> - SQL queries
>>>>>>>>>> >> >>> >>> >>>>
>>>>>>>>>> >> >>> >>> >>>> People will see that Spark is evolving and is also a modern framework,
>>>>>>>>>> >> >>> >>> >>>> because after reading the posts mentioned above people may think "it
>>>>>>>>>> >> >>> >>> >>>> is outdated, the future is in framework X".
>>>>>>>>>> >> >>> >>> >>>>
>>>>>>>>>> >> >>> >>> >>>> Matei Zaharia posted excellent blog post about
>>>>>>>>>> how Spark
>>>>>>>>>> >> >>> >>> >>>> Structured
>>>>>>>>>> >> >>> >>> >>>> Streaming beats every other framework in terms of
>>>>>>>>>> ease-of-use
>>>>>>>>>> >> >>> >>> >>>> and
>>>>>>>>>> >> >>> >>> >>>> reliability. Performance tests, done in various
>>>>>>>>>> environments
>>>>>>>>>> >> >>> >>> >>>> (for
>>>>>>>>>> >> >>> >>> >>>> example: laptop, small 2 node cluster, 10-node
>>>>>>>>>> cluster,
>>>>>>>>>> >> >>> >>> >>>> 20-node
>>>>>>>>>> >> >>> >>> >>>> cluster), could be also very good marketing stuff
>>>>>>>>>> to say
>>>>>>>>>> >> >>> >>> >>>> "hey,
>>>>>>>>>> >> >>> >>> >>>> you're
>>>>>>>>>> >> >>> >>> >>>> saying that you're better, but Spark is still
>>>>>>>>>> faster and is
>>>>>>>>>> >> >>> >>> >>>> still
>>>>>>>>>> >> >>> >>> >>>> getting even faster!". This would be based on
>>>>>>>>>> facts (just
>>>>>>>>>> >> >>> >>> >>>> numbers),
>>>>>>>>>> >> >>> >>> >>>> not opinions. It would be good for companies, for
>>>>>>>>>> marketing
>>>>>>>>>> >> >>> >>> >>>> purposes
>>>>>>>>>> >> >>> >>> >>>> and
>>>>>>>>>> >> >>> >>> >>>> for every Spark developer
>>>>>>>>>> >> >>> >>> >>>>
>>>>>>>>>> >> >>> >>> >>>>
>>>>>>>>>> >> >>> >>> >>>> Second: real-time streaming. I've written some
>>>>>>>>>> time ago about
>>>>>>>>>> >> >>> >>> >>>> real-time
>>>>>>>>>> >> >>> >>> >>>> streaming support in Spark Structured Streaming.
>>>>>>>>>> Some work
>>>>>>>>>> >> >>> >>> >>>> should be
>>>>>>>>>> >> >>> >>> >>>> done to make SSS more low-latency, but I think
>>>>>>>>>> it's possible.
>>>>>>>>>> >> >>> >>> >>>> Maybe
>>>>>>>>>> >> >>> >>> >>>> Spark may look at Gearpump, which is also built
>>>>>>>>>> on top of
>>>>>>>>>> >> >>> >>> >>>> Akka?
>>>>>>>>>> >> >>> >>> >>>> I
>>>>>>>>>> >> >>> >>> >>>> don't
>>>>>>>>>> >> >>> >>> >>>> know yet, it is a good topic for a SIP. However I
>>>>>>>>>> think that
>>>>>>>>>> >> >>> >>> >>>> Spark
>>>>>>>>>> >> >>> >>> >>>> should
>>>>>>>>>> >> >>> >>> >>>> have real-time streaming support. Currently I see
>>>>>>>>>> many
>>>>>>>>>> >> >>> >>> >>>> posts/comments
>>>>>>>>>> >> >>> >>> >>>> that "Spark has too big latency". Spark Streaming
>>>>>>>>>> is doing
>>>>>>>>>> >> >>> >>> >>>> very
>>>>>>>>>> >> >>> >>> >>>> good
>>>>>>>>>> >> >>> >>> >>>> jobs with micro-batches, however I think it is
>>>>>>>>>> possible to
>>>>>>>>>> >> >>> >>> >>>> add
>>>>>>>>>> >> >>> >>> >>>> also
>>>>>>>>>> >> >>> >>> >>>> more
>>>>>>>>>> >> >>> >>> >>>> real-time processing.
>>>>>>>>>> >> >>> >>> >>>>
>>>>>>>>>> >> >>> >>> >>>> Other people said much more and I agree with
>>>>>>>>>> proposal of SIP.
>>>>>>>>>> >> >>> >>> >>>> I'm
>>>>>>>>>> >> >>> >>> >>>> also
>>>>>>>>>> >> >>> >>> >>>> happy that PMC's are not saying that they will
>>>>>>>>>> not listen to
>>>>>>>>>> >> >>> >>> >>>> users,
>>>>>>>>>> >> >>> >>> >>>> but
>>>>>>>>>> >> >>> >>> >>>> they really want to make Spark better for every
>>>>>>>>>> user.
>>>>>>>>>> >> >>> >>> >>>>
>>>>>>>>>> >> >>> >>> >>>>
>>>>>>>>>> >> >>> >>> >>>> What do you think about these two topics?
>>>>>>>>>> Especially I'm
>>>>>>>>>> >> >>> >>> >>>> looking
>>>>>>>>>> >> >>> >>> >>>> at
>>>>>>>>>> >> >>> >>> >>>> Cody
>>>>>>>>>> >> >>> >>> >>>> (who has started this topic) and PMCs :)
>>>>>>>>>> >> >>> >>> >>>>
>>>>>>>>>> >> >>> >>> >>>> Pozdrawiam / Best regards,
>>>>>>>>>> >> >>> >>> >>>>
>>>>>>>>>> >> >>> >>> >>>> Tomasz
>>>>>>>>>> >> >>> >>> >>>>
>>>>>>>>>> >> >>> >>> >>>>
>>>>>>>>>> >> >>> >>>
>>>>>>>>>> >> >>> >>
>>>>>>>>>> >> >>> >
>>>>>>>>>> >> >>> >
>>>>>>>>>> >> >
>>>>>>>>>> >> >
>>>>>>>>>> >>
>>>>>>>>>> >> ------------------------------------------------------------
>>>>>>>>>> ---------
>>>>>>>>>> >> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>>>>>>>>> >>
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> > --
>>>>>>>>>> > Ryan Blue
>>>>>>>>>> > Software Engineer
>>>>>>>>>> > Netflix
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> Joseph Bradley
>>>>>>>>>
>>>>>>>>> Software Engineer - Machine Learning
>>>>>>>>>
>>>>>>>>> Databricks, Inc.
>>>>>>>>>
>>>>>>>>> [image: http://databricks.com] <http://databricks.com/>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
