spark-user mailing list archives

From Jacek Laskowski <ja...@japila.pl>
Subject Re: Spark 2.0
Date Mon, 25 Jul 2016 20:57:35 GMT
Hi Bryan,

Excellent questions about the upcoming 2.0! It took me a while to find
the answer to your structured streaming question.

Have you seen
http://people.apache.org/~pwendell/spark-nightly/spark-master-docs/latest/structured-streaming-programming-guide.html#window-operations-on-event-time
? It may be relevant to your question 2.
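For question 2, the idea in that guide is to group by a time window rather than call count() on the whole stream. In Spark 2.0 that is written roughly as groupBy(window(col("timestamp"), "2 minutes", "2 minutes")).count(), using `window` from the SQL functions. Below is only a minimal, Spark-free sketch of what a 2-minute tumbling window count computes. Assumptions: the helper names `window_start` and `count_per_window` are hypothetical, timestamps are naive datetimes treated as UTC, and windows are aligned to 1970-01-01 00:00:00 (Spark's default start offset of 0). It is a batch illustration of the semantics, not how Spark maintains incremental streaming state.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical window width: a 2-minute tumbling window (slide == width).
WINDOW = timedelta(minutes=2)
EPOCH = datetime(1970, 1, 1)  # windows are aligned to the Unix epoch (assumed UTC)


def window_start(ts: datetime, width: timedelta = WINDOW) -> datetime:
    """Return the start of the tumbling window that contains ts."""
    # timedelta // timedelta gives the number of whole windows since the epoch.
    whole_windows = (ts - EPOCH) // width
    return EPOCH + whole_windows * width


def count_per_window(timestamps, width: timedelta = WINDOW):
    """Count events per tumbling window, sorted by window start."""
    counts = Counter(window_start(ts, width) for ts in timestamps)
    return sorted(counts.items())


if __name__ == "__main__":
    events = [
        datetime(2016, 7, 25, 20, 0, 30),
        datetime(2016, 7, 25, 20, 1, 59),
        datetime(2016, 7, 25, 20, 2, 0),
        datetime(2016, 7, 25, 20, 5, 10),
    ]
    for start, n in count_per_window(events):
        print(start, n)
```

With these sample events, the 20:00 window holds 2 records and the 20:02 and 20:04 windows hold 1 each. A record at exactly 20:02:00 opens a new window because each window includes its start and excludes its end.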

Pozdrawiam,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Mon, Jul 25, 2016 at 8:23 PM, Bryan Jeffrey <bryan.jeffrey@gmail.com> wrote:
> All,
>
> I had three questions:
>
> (1) Is there a timeline for a stable Spark 2.0 release?  I know the 'preview'
> build is out there, but was curious what the timeline was for the full release.
> Jira seems to indicate that there should be a release on 7/27.
>
> (2) There has been a lot of discussion about 'continuous' datasets. One item
> that came up in tickets was the idea that 'count()' and other functions do
> not apply to continuous datasets:
> https://github.com/apache/spark/pull/12080.  In this case, what is the
> intended procedure to calculate a streaming statistic based on an interval
> (e.g. count the number of records in a 2-minute window every 2 minutes)?
>
> (3) In previous releases (1.6.1), calling DStream / RDD repartition with the
> number of partitions set to zero silently deletes data.  I have looked in
> Jira for a similar issue, but I do not see one.  I would like to address
> this (and would likely be willing to fix it myself).  Should I just
> create a ticket?
>
> Thank you,
>
> Bryan Jeffrey
>


