spark-user mailing list archives

From Pedro Rodriguez <ski.rodrig...@gmail.com>
Subject Re: Spark 2.0
Date Mon, 25 Jul 2016 19:31:06 GMT
The Spark 2.0 vote for RC5 passed last Friday night, so if I had to guess, it
will probably be released early this week.

On Mon, Jul 25, 2016 at 12:23 PM, Bryan Jeffrey <bryan.jeffrey@gmail.com>
wrote:

> All,
>
> I had three questions:
>
> (1) Is there a timeline for a stable Spark 2.0 release?  I know the
> 'preview' build is out there, but was curious what the timeline was for
> full release. Jira seems to indicate that there should be a release 7/27.
>
> (2)  For 'continuous' datasets there has been a lot of discussion. One
> item that came up in tickets was the idea that 'count()' and other
> functions do not apply to continuous datasets:
> https://github.com/apache/spark/pull/12080.  In this case what is the
> intended procedure to calculate a streaming statistic based on an interval
> (e.g. count the number of records in a 2 minute window every 2 minutes)?
>
> (3) In previous releases (1.6.1) the call to DStream / RDD repartition w/
> a number of partitions set to zero silently deletes data.  I have looked in
> Jira for a similar issue, but I do not see one.  I would like to address
> this (and would likely be willing to go fix it myself).  Should I just
> create a ticket?
>
> Thank you,
>
> Bryan Jeffrey
>
>
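
Question (2) asks for a tumbling-window count: a 2-minute window that advances every 2 minutes. In Spark 2.0's Structured Streaming API this is usually written as a grouped aggregation, `groupBy(window(timeColumn, "2 minutes")).count()`. That grouped `count()` is different from the `count()` action discussed in the pull request. The older DStream API offers `countByWindow(windowDuration, slideDuration)`. Neither Spark call is exercised here. The sketch below is only a plain-Python model of the same bucketing logic. It assumes each record carries an event timestamp and that windows are aligned to the Unix epoch. `tumbling_window_counts` is a hypothetical helper, not a Spark API.

```python
from collections import Counter
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # assumption: window boundaries aligned to the Unix epoch


def tumbling_window_counts(timestamps, window):
    """Count events per fixed, non-overlapping window.

    Hypothetical helper: the window length equals the slide interval,
    so every event falls into exactly one window.
    """
    counts = Counter()
    for ts in timestamps:
        # timedelta // timedelta yields an int: the index of the window containing ts
        index = (ts - EPOCH) // window
        counts[EPOCH + index * window] += 1
    # Return windows in chronological order, keyed by window start time
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    base = datetime(2016, 7, 25, 12, 0, 0)
    events = [
        base + timedelta(seconds=30),
        base + timedelta(seconds=119),
        base + timedelta(minutes=2),
        base + timedelta(minutes=5, seconds=10),
    ]
    for start, n in tumbling_window_counts(events, timedelta(minutes=2)).items():
        print(start.strftime("%H:%M"), n)
    # prints:
    # 12:00 2
    # 12:02 1
    # 12:04 1
```

A slide shorter than the window (for example, a 2-minute window every 30 seconds) would make windows overlap, so each event would be counted in several buckets. This sketch covers only the non-overlapping case asked about.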

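Question (3) amounts to asking for an argument check. Assume the fix is to reject a non-positive partition count up front rather than return an empty result. A plain-Python model of that guard might look like the following. `safe_repartition` and its round-robin placement are hypothetical illustrations, not Spark's implementation of `repartition`.

```python
def safe_repartition(records, num_partitions):
    """Hypothetical round-robin repartition that refuses a non-positive
    partition count instead of silently producing no output."""
    if num_partitions <= 0:
        raise ValueError(f"num_partitions must be positive, got {num_partitions}")
    partitions = [[] for _ in range(num_partitions)]
    for i, record in enumerate(records):
        # Spread records across partitions in turn
        partitions[i % num_partitions].append(record)
    return partitions


if __name__ == "__main__":
    print(safe_repartition([1, 2, 3, 4, 5], 2))  # [[1, 3, 5], [2, 4]]
    try:
        safe_repartition([1, 2, 3], 0)
    except ValueError as err:
        print(err)  # num_partitions must be positive, got 0
```

Failing fast preserves the record-count invariant: every input record lands in exactly one output partition, and a bad argument cannot drop data.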

-- 
Pedro Rodriguez
PhD Student in Distributed Machine Learning | CU Boulder
UC Berkeley AMPLab Alumni

ski.rodriguez@gmail.com | pedrorodriguez.io | 909-353-4423
Github: github.com/EntilZha | LinkedIn:
https://www.linkedin.com/in/pedrorodriguezscience
