lucene-dev mailing list archives

From "Joel Bernstein (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SOLR-9684) Add schedule Streaming Expression
Date Mon, 02 Jan 2017 02:56:58 GMT

    [ https://issues.apache.org/jira/browse/SOLR-9684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15792043#comment-15792043 ]

Joel Bernstein edited comment on SOLR-9684 at 1/2/17 2:56 AM:
--------------------------------------------------------------

Ok, then let's go with *priority* as the name for this function.

About the *merge* function: it is shorthand for "mergeSort". It's designed
to merge two streams sorted on the same keys while maintaining the sort order. Originally the idea
was that the /export handler was a giant sorting engine, and merge was a way to efficiently
merge the sorted streams.
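
To illustrate the merge semantics described above, here is a minimal Python sketch, not Solr's implementation. The name merge_sorted, the key argument, and the dict tuples with an "id" field are all hypothetical; the only assumption taken from the text is that both inputs are already sorted on the same key.

```python
from typing import Any, Callable, Iterable, Iterator

_DONE = object()  # sentinel marking an exhausted stream


def merge_sorted(left: Iterable[Any], right: Iterable[Any],
                 key: Callable[[Any], Any]) -> Iterator[Any]:
    """Merge two inputs that are ALREADY sorted on `key`, preserving order.

    Hypothetical helper for illustration only; it reads one tuple at a
    time from each side, so neither stream is buffered in memory.
    """
    left_it, right_it = iter(left), iter(right)
    l, r = next(left_it, _DONE), next(right_it, _DONE)
    while l is not _DONE and r is not _DONE:
        if key(l) <= key(r):
            yield l
            l = next(left_it, _DONE)
        else:
            yield r
            r = next(right_it, _DONE)
    # One side is exhausted; drain whatever remains of the other.
    while l is not _DONE:
        yield l
        l = next(left_it, _DONE)
    while r is not _DONE:
        yield r
        r = next(right_it, _DONE)


if __name__ == "__main__":
    a = [{"id": 1}, {"id": 4}, {"id": 7}]
    b = [{"id": 2}, {"id": 3}, {"id": 9}]
    print([t["id"] for t in merge_sorted(a, b, key=lambda t: t["id"])])
```

Because each step only compares the current head of each stream, the merged output stays sorted without a second sort pass.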

The *priority* function behaves more like SQL's UNION ALL, but it differs in that *priority*
picks only one stream to iterate on each open/close. This design allows it to iterate the
high priority topic, and to iterate the lower priority topic only when no new higher priority
tasks have entered the index. Because topics work in small batches, new high priority tasks
will jump ahead of existing lower priority tasks on the next executor run.
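
The open/close behavior described above can be sketched in Python as follows. This is a toy model under stated assumptions, not the actual Solr code: the Topic class, its batch_size parameter, and the priority_run function are hypothetical stand-ins for a topic() stream and for one executor run.

```python
from collections import deque
from typing import List


class Topic:
    """Hypothetical stand-in for a topic(): hands out new tasks in small batches."""

    def __init__(self, batch_size: int = 2) -> None:
        self.queue: deque = deque()
        self.batch_size = batch_size

    def add(self, *tasks: str) -> None:
        self.queue.extend(tasks)

    def read_batch(self) -> List[str]:
        # Return at most batch_size tasks that have not been seen yet.
        n = min(self.batch_size, len(self.queue))
        return [self.queue.popleft() for _ in range(n)]


def priority_run(high: Topic, low: Topic) -> List[str]:
    """One open/close cycle: iterate exactly ONE of the two streams.

    The low priority topic is read only when the high priority topic
    has nothing new on this run.
    """
    batch = high.read_batch()
    if batch:
        return batch
    return low.read_batch()


if __name__ == "__main__":
    high, low = Topic(), Topic()
    high.add("h1")
    low.add("l1", "l2", "l3")
    print(priority_run(high, low))  # high priority batch
    print(priority_run(high, low))  # high is empty, so a low priority batch
    high.add("h2")                  # a new high priority task arrives
    print(priority_run(high, low))  # it jumps ahead of the remaining low task
    print(priority_run(high, low))
```

Unlike a UNION ALL, a single run never mixes tuples from both streams, which is what lets a newly arrived high priority task overtake the low priority backlog on the next run.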

Also, I think the *merge* function fits into the relational algebra category. The *priority*
function is mainly going to be used for task prioritization and execution.

Eventually we'll need to implement both a UnionStream and UnionAllStream as well.




was (Author: joel.bernstein):
Ok, then let's go with *priority* as the name for this function.

About the *merge* function: it is shorthand for "mergeSort". It's designed
to merge two streams sorted on the same keys while maintaining the sort order. Originally the idea
was that the /export handler was a giant sorting engine, and merge was a way to efficiently
merge the sorted streams.

The *priority* function behaves more like SQL's UNION ALL with priority, but it differs
in that *priority* picks only one stream to iterate on each open/close. This design allows
it to iterate the high priority topic until it's empty, and only then iterate through the
lower priority topic.

Also, I think the *merge* function fits into the relational algebra category. The *priority*
function is mainly going to be used for task prioritization and execution.

Eventually we'll need to implement both a UnionStream and UnionAllStream as well.



> Add schedule Streaming Expression
> ---------------------------------
>
>                 Key: SOLR-9684
>                 URL: https://issues.apache.org/jira/browse/SOLR-9684
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Joel Bernstein
>            Assignee: Joel Bernstein
>             Fix For: master (7.0), 6.4
>
>         Attachments: SOLR-9684.patch, SOLR-9684.patch, SOLR-9684.patch
>
>
> SOLR-9559 adds a general purpose *parallel task executor* for streaming expressions.
> The executor() function executes a stream of tasks and doesn't have any concept of task priority.
> The schedule() function wraps two streams, a high priority stream and a low priority
> stream. The schedule() function emits tuples from the high priority stream first, and then
> from the low priority stream.
> The executor() function can then wrap the schedule() function to see tasks in priority
> order.
> Pseudo syntax:
> {code}
> daemon(executor(schedule(topic(tasks, q="priority:high"), topic(tasks, q="priority:low"))))
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

