beam-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ismaël Mejía (JIRA) <j...@apache.org>
Subject [jira] [Comment Edited] (BEAM-2409) Spark runner produces exactly twice the number of results in streaming mode when use triggers to re-window results on global window.
Date Mon, 05 Jun 2017 14:30:04 GMT

    [ https://issues.apache.org/jira/browse/BEAM-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037035#comment-16037035
] 

Ismaël Mejía edited comment on BEAM-2409 at 6/5/17 2:29 PM:
------------------------------------------------------------

Don't know if related but a comment by [~baibaichen] on Beam's slack gchannel seems related:

??I noticed that LateDataUtils.dropExpiredWindows() return FluentIterable, and the result
is passed to ReduceFnRunner.processElements(). this iterator will at least iterate twice,
it's better to use FluentIterable#toList to avoid the extra iteration.??


was (Author: iemejia):
Don't know if related but a comment by [~baibaichen] on Beam's slack gchannel seems related:

??
I noticed that LateDataUtils.dropExpiredWindows() return FluentIterable, and the result is
passed to ReduceFnRunner.processElements(). this iterator will at least iterate twice, it's
better to use FluentIterable#toList to avoid the extra iteration.
??

> Spark runner produces exactly twice the number of results in streaming mode when use
triggers to re-window results on global window.
> ------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-2409
>                 URL: https://issues.apache.org/jira/browse/BEAM-2409
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-spark
>    Affects Versions: 2.0.0
>            Reporter: Ismaël Mejía
>            Assignee: Aviem Zur
>
> This can be tested with Nexmark query 6. Sorry I don’t have a smaller test case than
this, but I think the part of the pipeline that produces the result is this one.
> {code:java}
>         .apply(
>             Window.<KV<Long, Bid>>into(new GlobalWindows())
>                 .triggering(Repeatedly.forever(AfterPane.elementCountAtLeast(1)))
>                 .accumulatingFiredPanes()
>                 .withAllowedLateness(Duration.ZERO))
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message