lucene-dev mailing list archives

From "Joel Bernstein (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SOLR-8530) Add HavingStream to Streaming API and StreamingExpressions
Date Mon, 02 Jan 2017 02:16:58 GMT

    [ https://issues.apache.org/jira/browse/SOLR-8530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15792021#comment-15792021
] 

Joel Bernstein edited comment on SOLR-8530 at 1/2/17 2:16 AM:
--------------------------------------------------------------

I returned to the HavingStream as part of SOLR-8593.

What I found during the implementation is that both approaches described in this ticket
can coexist in a single HavingStream implementation.

What [~dpgove] originally described was indexing a document on the fly and then using a Lucene/Solr
query to implement the boolean logic.

What I described is implementing the boolean logic as stream operations that would handle
typical SQL Having comparisons (=, <, >, <>, >=, <=). 

I have implemented the HavingStream I described as part of SOLR-8593, with syntax that looks
like this:

{code}
having(expr, booleanOp)
{code}

Where booleanOp is a new type of operation that returns *TRUE* or *FALSE* for each tuple.
The basic boolean operations have been implemented, such as:

{code}
having(expr, and(gt(field1, 5), lt(field1, 10)))
{code}

This would emit tuples from the underlying expr where field1 is greater than 5 and less than
10.
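
To make the semantics concrete, here is a minimal Python sketch of how such boolean operations might be evaluated per tuple. It is not the Solr implementation: tuples are modelled as plain dicts, every tuple is assumed to carry the compared field, and the names gt, lt, and_ and having are hypothetical stand-ins that only mirror the expression syntax above.

```python
from typing import Callable, Dict, Iterable, Iterator

Row = Dict[str, float]             # assumption: a tuple is a flat dict of numeric fields
BooleanOp = Callable[[Row], bool]  # a boolean operation returns True or False per tuple


def gt(field: str, value: float) -> BooleanOp:
    """Hypothetical stand-in for gt(field, value)."""
    return lambda row: row[field] > value


def lt(field: str, value: float) -> BooleanOp:
    """Hypothetical stand-in for lt(field, value)."""
    return lambda row: row[field] < value


def and_(*ops: BooleanOp) -> BooleanOp:
    """True only when every nested boolean operation is true for the tuple."""
    return lambda row: all(op(row) for op in ops)


def having(stream: Iterable[Row], op: BooleanOp) -> Iterator[Row]:
    """Emit only the tuples of the underlying stream for which op is true."""
    for row in stream:
        if op(row):
            yield row


tuples = [{"field1": 3}, {"field1": 7}, {"field1": 10}, {"field1": 9}]
result = list(having(tuples, and_(gt("field1", 5), lt("field1", 10))))
print(result)
```

Because having is a generator, each tuple is tested and emitted as it arrives, without buffering the underlying stream.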

To implement what [~dpgove] had in mind, we can add a new boolean operation called *match*.
The match operation will index the tuple in an in-memory index and then match a Lucene/Solr
query against it. Here is the sample syntax:

{code}
having(expr, match("field1:[5 TO 10]"))
{code}

The match boolean operation could then be intermingled with other boolean operations, for
example:

{code}
having(expr, and(gt(field2, 8), match("body:(hello world)")))
{code}
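
The way match could compose with the other boolean operations can be sketched in Python as follows. This is a toy under loud assumptions: the match function here does not build an in-memory index or run a Lucene query, it only understands the inclusive range form field:[low TO high] from the first match example and rejects everything else (including body:(hello world)). All function names are hypothetical.

```python
import re
from typing import Callable, Dict, Iterable, Iterator

Row = Dict[str, float]
BooleanOp = Callable[[Row], bool]

# Toy grammar: only "field:[low TO high]", with "*" as an open bound.
_RANGE = re.compile(r"^(\w+):\[(\S+) TO (\S+)\]$")


def match(query: str) -> BooleanOp:
    """Simplified stand-in for match(...); NOT a Lucene query parser."""
    m = _RANGE.match(query)
    if m is None:
        raise ValueError(f"unsupported query in this sketch: {query!r}")
    field, low, high = m.groups()
    lo = float("-inf") if low == "*" else float(low)
    hi = float("inf") if high == "*" else float(high)
    # Square brackets denote an inclusive range.
    return lambda row: field in row and lo <= row[field] <= hi


def gt(field: str, value: float) -> BooleanOp:
    return lambda row: row[field] > value


def and_(*ops: BooleanOp) -> BooleanOp:
    return lambda row: all(op(row) for op in ops)


def having(stream: Iterable[Row], op: BooleanOp) -> Iterator[Row]:
    for row in stream:
        if op(row):
            yield row


rows = [
    {"field1": 5, "field2": 9},
    {"field1": 7, "field2": 8},
    {"field1": 11, "field2": 20},
    {"field1": 10, "field2": 12},
]
# match is just another boolean operation, so it nests inside and_ like gt does.
matched = list(having(rows, and_(gt("field2", 8), match("field1:[5 TO 10]"))))
print(matched)
```

The design point is that a query-backed operation and a plain comparison share one interface (tuple in, boolean out), which is what lets them be intermingled freely.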

Depending on the progress of SOLR-8593, I may strip out the HavingStream implementation
and commit it with this ticket, so it can be ready for Solr 6.4.







> Add HavingStream to Streaming API and StreamingExpressions
> ----------------------------------------------------------
>
>                 Key: SOLR-8530
>                 URL: https://issues.apache.org/jira/browse/SOLR-8530
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrJ
>    Affects Versions: 6.0
>            Reporter: Dennis Gove
>            Priority: Minor
>
> The goal here is to support something similar to SQL's HAVING clause where one can filter
> documents based on data that is not available in the index. For example, filter the output
> of a reduce(....) based on the calculated metrics.
> {code}
> having(
>   reduce(
>     search(.....),
>     sum(cost),
>     on=customerId
>   ),
>   q="sum(cost):[500 TO *]"
> )
> {code}
> This example would return all tuples where the total spent by each distinct customer is >=
> 500. The total spent is calculated via the sum(cost) metric in the reduce stream.
> The intent is for the filters in the having(...) clause to support the full query syntax
> of a search(...) clause. I see this being possible in one of two ways.
> 1. Use Lucene's MemoryIndex: as each tuple is read out of the underlying stream, create
> an instance of MemoryIndex and apply the query to it. If the result of that is >0 then
> the tuple should be returned from the HavingStream.
> 2. Create an in-memory Solr index via something like RAMDirectory, read all tuples into
> that in-memory index using the UpdateStream, and then stream out of that all the matching
> tuples from the query.
> There are benefits to each approach but I think the easiest and most direct one is the
> MemoryIndex approach. With MemoryIndex it isn't necessary to read all incoming tuples before
> returning a single tuple. With a MemoryIndex there is a need to parse the Solr query parameters
> and create a valid Lucene query, but I suspect that can be done using existing QParser implementations.
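
The streaming property claimed for the per-tuple approach can be illustrated with a small Python sketch. It is only an analogy under stated assumptions: sum_by is a hypothetical stand-in for the reduce stream with a sum(cost) metric, a plain predicate replaces the MemoryIndex query, and the sample purchases are made up for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


def sum_by(rows: Iterable[Row], key: str, field: str) -> Iterator[Row]:
    """Stand-in for reduce(..., sum(field), on=key): one output tuple per key."""
    totals: Dict[object, float] = defaultdict(float)
    for row in rows:
        totals[row[key]] += float(row[field])
    for k, total in totals.items():
        yield {key: k, f"sum({field})": total}


def having(stream: Iterable[Row], predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """Per-tuple filter: each tuple is tested and emitted as soon as it is read."""
    for row in stream:
        if predicate(row):
            yield row


consumed: List[object] = []


def counted(stream: Iterable[Row]) -> Iterator[Row]:
    """Record which upstream tuples have been read so far."""
    for row in stream:
        consumed.append(row["customerId"])
        yield row


purchases = [
    {"customerId": "a", "cost": 100.0},
    {"customerId": "b", "cost": 400.0},
    {"customerId": "a", "cost": 20.0},
    {"customerId": "b", "cost": 350.0},
    {"customerId": "c", "cost": 500.0},
]

stream = having(
    counted(sum_by(purchases, "customerId", "cost")),
    lambda t: t["sum(cost)"] >= 500,  # plays the role of sum(cost):[500 TO *]
)
first = next(stream)                 # first matching tuple
consumed_at_first = list(consumed)   # upstream tuples read up to that point
rest = list(stream)                  # remaining matches
print(first, consumed_at_first, rest)
```

Here the first match is available after reading only two aggregated tuples, whereas an approach that first loads every tuple into an in-memory index would have to drain the whole stream before emitting anything.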



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


