spark-dev mailing list archives

From Reynold Xin <r...@databricks.com>
Subject Re: [SPARK-17845] [SQL][PYTHON] More self-evident window function frame boundary API
Date Thu, 01 Dec 2016 23:51:01 GMT
Can you submit a pull request with test cases based on that change?


On Dec 1, 2016, 9:39 AM -0800, Maciej Szymkiewicz <mszymkiewicz@gmail.com>, wrote:
> This doesn't affect that. The only concern is what we consider to be UNBOUNDED on the
Python side.
>
> On 12/01/2016 07:56 AM, assaf.mendelson wrote:
> > I may be mistaken, but if I remember correctly Spark behaves differently when the frame
is bounded in the past and when it is not. Specifically, I seem to recall a fix which made
sure that when there is no lower bound, the aggregation is done incrementally instead of
recomputing the whole range for each window. So I believe it should be configured exactly
the same as in Scala/Java so that the optimization takes place.
> > Assaf.
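
The distinction Assaf describes can be sketched in plain Python (a minimal illustration, not Spark's actual code; the function names are invented): with an unbounded lower bound the aggregate can be maintained as a running total in O(n), whereas the general frame naively recomputes each window from scratch.

```python
def windowed_sums_naive(values):
    # Recompute the whole frame for every row: O(n^2).
    return [sum(values[: i + 1]) for i in range(len(values))]

def windowed_sums_running(values):
    # With an UNBOUNDED PRECEDING lower bound, one running total
    # suffices: O(n).
    out, total = [], 0
    for v in values:
        total += v
        out.append(total)
    return out

data = [3, 1, 4, 1, 5]
assert windowed_sums_naive(data) == windowed_sums_running(data) == [3, 4, 8, 9, 14]
```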
> >
> > From: rxin [via Apache Spark Developers List] [mailto:ml-node+[hidden email]]
> > Sent: Wednesday, November 30, 2016 8:35 PM
> > To: Mendelson, Assaf
> > Subject: Re: [SPARK-17845] [SQL][PYTHON] More self-evident window function frame
boundary API
> >
> > Yes, I'd define unboundedPreceding as -sys.maxsize, but any value less than
min(-sys.maxsize, _JAVA_MIN_LONG) is considered unboundedPreceding too. We need to be careful
with long overflow when transferring data over to Java.
> >
> >
> > On Wed, Nov 30, 2016 at 10:04 AM, Maciej Szymkiewicz <[hidden email]> wrote:
> > It is platform specific, so it can theoretically be larger, but 2**63 - 1 is the standard
on 64-bit platforms and 2**31 - 1 on 32-bit platforms. I can submit a patch but I am not sure
how to proceed. Personally I would set
> >
> > unboundedPreceding = -sys.maxsize
> >
> > unboundedFollowing = sys.maxsize
> > to keep backwards compatibility.
> > On 11/30/2016 06:52 PM, Reynold Xin wrote:
> > > Ah, ok. For some reason when I did the pull request sys.maxsize was much larger
than 2^63. Do you want to submit a patch to fix this?
> > >
> > >
> > > On Wed, Nov 30, 2016 at 9:48 AM, Maciej Szymkiewicz <[hidden email]>
wrote:
> > > The problem is that -(1 << 63) is -(sys.maxsize + 1), so code which
used to work before is now off by one.
> > > On 11/30/2016 06:43 PM, Reynold Xin wrote:
> > > > Can you give a repro? Anything less than -(1 << 63) is considered
negative infinity (i.e. unbounded preceding).
> > > >
> > > > On Wed, Nov 30, 2016 at 8:27 AM, Maciej Szymkiewicz <[hidden email]>
wrote:
> > > > Hi,
> > > >
> > > > I've been looking at the SPARK-17845 and I am curious if there is any
> > > > reason to make it a breaking change. In Spark 2.0 and below we could use:
> > > >
> > > >     Window().partitionBy("foo").orderBy("bar").rowsBetween(-sys.maxsize,
> > > > sys.maxsize)
> > > >
> > > > In 2.1.0 this code silently produces incorrect results (ROWS BETWEEN
> > > > -1 PRECEDING AND UNBOUNDED FOLLOWING). Couldn't we make
> > > > Window.unboundedPreceding equal to -sys.maxsize to ensure backward
> > > > compatibility?
> > > >
> > > > --
> > > >
> > > > Maciej Szymkiewicz
> > > >
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe e-mail: [hidden email]
> > > >
> > >
> > >
> > > --
> > >
> > > Maciej Szymkiewicz
> > >
> >
> >
> > --
> >
> > Maciej Szymkiewicz
> >
> >
>
>
> --
> Maciej Szymkiewicz
