flink-issues mailing list archives

From "Austin Ouyang (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (FLINK-1284) Uniform random sampling operator over windows
Date Fri, 22 Apr 2016 16:41:12 GMT

    [ https://issues.apache.org/jira/browse/FLINK-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253383#comment-15253383 ]

Austin Ouyang edited comment on FLINK-1284 at 4/22/16 4:40 PM:
---------------------------------------------------------------

Hi [~senorcarbone],

Would we also want to add the ability to sample by percentage? Also, what would the fieldID
refer to? I was thinking there are two naive possible solutions:
1) Once the trigger fires, randomly sample N elements, or a given percentage of all elements,
in each window.
2) Given the percentage of elements we want to retain from each window, generate a random number
between 0 and 1 for each element and append the element to the result if the number is less than
the specified percentage.
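
A minimal sketch of the two options, assuming the window contents are available as a plain java.util.List once the trigger fires. WindowSampling and its methods are hypothetical illustrations, not part of Flink's API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helpers (not a Flink API) for the two naive per-window strategies.
// A window function would call them on the elements buffered for one window.
final class WindowSampling {

    private WindowSampling() {}

    // Option 1, fixed count: min(n, window size) elements, uniformly, without replacement.
    static <T> List<T> sampleN(List<T> window, int n, Random rnd) {
        List<T> copy = new ArrayList<>(window);   // leave the caller's list untouched
        int k = Math.max(0, Math.min(n, copy.size()));
        // Partial Fisher-Yates shuffle: after step i, positions 0..i hold a
        // uniform random (i + 1)-subset of the window, in random order.
        for (int i = 0; i < k; i++) {
            int j = i + rnd.nextInt(copy.size() - i);
            T tmp = copy.get(i);
            copy.set(i, copy.get(j));
            copy.set(j, tmp);
        }
        return new ArrayList<>(copy.subList(0, k));
    }

    // Option 1, percentage: convert the fraction to a count, then sample that many.
    static <T> List<T> sampleFraction(List<T> window, double fraction, Random rnd) {
        int n = (int) Math.round(fraction * window.size());
        return sampleN(window, n, rnd);
    }

    // Option 2: keep each element independently with probability p (Bernoulli sampling).
    static <T> List<T> sampleBernoulli(List<T> window, double p, Random rnd) {
        List<T> out = new ArrayList<>();
        for (T x : window) {
            if (rnd.nextDouble() < p) {   // nextDouble() is uniform on [0, 1)
                out.add(x);
            }
        }
        return out;
    }
}
```

Option 1 returns exactly N elements but needs the whole window buffered; option 2 can decide per element as it arrives, with no buffering, but only hits the target percentage in expectation (the output size is binomially distributed).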


I'd be happy to try working on this as well!


was (Author: aouyang1):
Hi Paris,

Would we also want to add the ability to sample by percentage? Also, what would the fieldID
refer to? I was thinking there are two naive possible solutions:
1) Once the trigger fires, randomly sample N elements, or a given percentage of all elements,
in each window.
2) Given the percentage of elements we want to retain from each window, generate a random number
between 0 and 1 for each element and append the element to the result if the number is less than
the specified percentage.



> Uniform random sampling operator over windows
> ---------------------------------------------
>
>                 Key: FLINK-1284
>                 URL: https://issues.apache.org/jira/browse/FLINK-1284
>             Project: Flink
>          Issue Type: New Feature
>          Components: Streaming
>            Reporter: Paris Carbone
>            Priority: Minor
>
> It would be useful for several use cases to have a built-in uniform random sampling operator
> in the streaming API that can operate on windows. This could be used, for example, for online
> machine learning operations, evaluating heuristics, or continuous visualisation of representative
> values.
> The operator could be given a field and the number of random samples needed, following
> a window statement such as:
> mystream.window(..).sample(fieldID,#samples)
> Given that pre-aggregation is enabled, this could perhaps be implemented as a binary
> reduce operator or a combinable groupreduce that pre-aggregates the empiricals of that field.
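
The pre-aggregation idea could be sketched as a mergeable reservoir: each partial aggregate keeps a uniform sample of at most k values plus the count of elements it summarises, and a binary merge combines two partials into a uniform sample of their union. This is a sketch under assumptions: ReservoirSample, of and merge are hypothetical names (not Flink classes), field values are assumed to be already extracted (standing in for fieldID), and the two inputs to merge are assumed to be independent samples.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical partial aggregate (not a Flink class): a uniform sample of at most
// k field values plus the number of stream elements it summarises.
final class ReservoirSample<T> {
    final int k;
    final long seen;        // elements summarised by this partial
    final List<T> sample;   // invariant: size == min(k, seen)

    ReservoirSample(int k, long seen, List<T> sample) {
        this.k = k;
        this.seen = seen;
        this.sample = sample;
    }

    // Lift one (already extracted) field value into a partial aggregate.
    static <T> ReservoirSample<T> of(int k, T value) {
        if (k < 1) throw new IllegalArgumentException("k must be >= 1");
        List<T> s = new ArrayList<>();
        s.add(value);
        return new ReservoirSample<>(k, 1, s);
    }

    // Binary reduce: a uniform min(k, a.seen + b.seen)-sample of the union of both inputs.
    static <T> ReservoirSample<T> merge(ReservoirSample<T> a, ReservoirSample<T> b, Random rnd) {
        if (a.k != b.k) throw new IllegalArgumentException("sample sizes differ");
        long remA = a.seen, remB = b.seen;
        int takeA = 0, takeB = 0;
        // Simulate k draws without replacement from the combined population to
        // decide how many survivors come from each side (a hypergeometric split).
        while (takeA + takeB < a.k && remA + remB > 0) {
            if (rnd.nextDouble() * (remA + remB) < remA) { takeA++; remA--; }
            else { takeB++; remB--; }
        }
        List<T> out = new ArrayList<>(takeA + takeB);
        out.addAll(pick(a.sample, takeA, rnd));
        out.addAll(pick(b.sample, takeB, rnd));
        return new ReservoirSample<>(a.k, a.seen + b.seen, out);
    }

    // Uniform m-subset of xs; m never exceeds xs.size() given the invariant above.
    private static <T> List<T> pick(List<T> xs, int m, Random rnd) {
        List<T> copy = new ArrayList<>(xs);
        Collections.shuffle(copy, rnd);
        return copy.subList(0, m);
    }
}
```

Folding single-element partials into an accumulator with merge behaves like classic reservoir sampling (once n elements have been seen, the next one survives with probability k/(n+1)), while merging two larger partials would let parallel instances pre-aggregate independently before a final combine.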



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
