lucene-dev mailing list archives

From "Alexander S. (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SOLR-6494) Query filters applied in a wrong order
Date Sun, 28 Dec 2014 12:50:13 GMT

    [ https://issues.apache.org/jira/browse/SOLR-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259627#comment-14259627 ]

Alexander S. edited comment on SOLR-6494 at 12/28/14 12:50 PM:
---------------------------------------------------------------

As I was already told, Solr does not apply filters incrementally. Instead, each filter runs
through the entire data set, and Solr then caches the results. For filters that contain
ranges, the cache is not effective, especially when we need NRT search and commits are triggered
multiple times per minute. In that situation big caches make no sense, and large autowarming
counts cause Solr to fail. My point is that the cache is not always efficient, and in such cases
Solr needs another strategy: applying filters incrementally (that is, as post filters).
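
To make the difference concrete, here is a toy model in Python. It is only a sketch and not how Solr evaluates filters internally (real filters use the index rather than scanning documents); the function names, the document layout and the numbers are all made up for illustration. It counts how many per-document checks each strategy performs when a very selective filter is combined with a broad range filter.

```python
def filter_independently(docs, predicates):
    """Evaluate every predicate over the whole collection, then intersect."""
    checks = 0
    result = None
    for pred in predicates:
        matched = set()
        for doc_id, doc in enumerate(docs):
            checks += 1
            if pred(doc):
                matched.add(doc_id)
        result = matched if result is None else result & matched
    return result, checks


def filter_incrementally(docs, predicates):
    """Evaluate each predicate only on documents that passed the previous ones."""
    checks = 0
    candidates = set(range(len(docs)))
    for pred in predicates:
        survivors = set()
        for doc_id in candidates:
            checks += 1
            if pred(docs[doc_id]):
                survivors.add(doc_id)
        candidates = survivors
    return candidates, checks


# Hypothetical data: 1 document in 1000 has the selective type.
docs = [
    {"type": "Award::Nomination" if i % 1000 == 0 else "Other", "created_at": i}
    for i in range(100_000)
]
predicates = [
    lambda d: d["type"] == "Award::Nomination",  # selective filter
    lambda d: d["created_at"] <= 90_000,         # broad range filter
]

same_docs_a, independent_checks = filter_independently(docs, predicates)
same_docs_b, incremental_checks = filter_incrementally(docs, predicates)
```

Both strategies return the same documents, but in this model the incremental one only evaluates the range filter on the documents that already passed the selective filter, so its cost depends on the order of the filters.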

So this:
{quote}
By design, fq clauses like this are calculated for the entire document set and the results
cached, there is no "ordering" for that part. Otherwise, how could they be re-used for a different
query?
{quote}
does not work in all cases.

Something like this:
{code}
# cost > 100 to run as a post filter, but something like post=true would be better I think
fq={!cache=false cost=101}field:value
{code}
would definitely solve the problem, but this is not supported.

The frange parser does support this, but it is not always suitable and fails with various
errors, such as "can not use FieldCache on multivalued field: type".
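
For the date range from the issue description, a frange-based post filter could look roughly like the sketch below. It assumes created_at_d is a single-valued date field that the ms() function can read; the helper names are hypothetical, and the type filter is copied as written in the issue. I have not verified how ms() treats documents without a value, so the result set may differ from the plain range query.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def to_epoch_ms(iso_utc):
    """Convert a UTC timestamp such as 2014-09-08T23:59:59Z to epoch milliseconds."""
    parsed = datetime.strptime(iso_utc, "%Y-%m-%dT%H:%M:%SZ")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp()) * 1000


def frange_post_filter(field, upper_iso, cost=101):
    """Build an uncached frange filter on ms(field).

    cache=false together with a cost of 100 or more asks Solr to run
    the frange query as a post filter.
    """
    upper_ms = to_epoch_ms(upper_iso)
    return "{!frange u=%d cache=false cost=%d}ms(%s)" % (upper_ms, cost, field)


# Request parameters mirroring the slow query from the issue description.
params = [
    ("q", "*:*"),
    ("fq", "type:Award::Nomination"),  # as written in the issue
    ("fq", frange_post_filter("created_at_d", "2014-09-08T23:59:59Z")),
    ("sort", "score desc"),
    ("start", "0"),
    ("rows", "20"),
]
query_string = urlencode(params)  # ready to append to a select URL
```

This only works where a function query is applicable, which is exactly the limitation described above: it is not a general replacement for a post-filter option on ordinary field queries.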

Does that look like a missing feature? To me it definitely does, but could this be
treated as a wish and implemented some day? How can the Solr community help with missing features?


> Query filters applied in a wrong order
> --------------------------------------
>
>                 Key: SOLR-6494
>                 URL: https://issues.apache.org/jira/browse/SOLR-6494
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 4.8.1
>            Reporter: Alexander S.
>
> This query:
> {code}
> {
>   fq: ["type:Award::Nomination"],
>   sort: "score desc",
>   start: 0,
>   rows: 20,
>   q: "*:*"
> }
> {code}
> takes just a few milliseconds, but this one:
> {code}
> {
>   fq: [
>     "type:Award::Nomination",
>     "created_at_d:[* TO 2014-09-08T23:59:59Z]"
>   ],
>   sort: "score desc",
>   start: 0,
>   rows: 20,
>   q: "*:*"
> }
> {code}
> takes almost 15 seconds.
> I have only ≈12k documents with type "Award::Nomination", but around half a billion
> with the created_at_d field set. It seems Solr applies the created_at_d filter first, going
> through all documents where this field is set, which is not very smart.
> I think that if it can't do anything better than applying filters in alphabetical order, it
> should apply them in the order they were received.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

