lucene-java-user mailing list archives

From Yonik Seeley <yo...@heliosearch.com>
Subject Re: Query with many clauses
Date Wed, 29 Oct 2014 14:00:10 GMT
For queries with many terms, where each term matches few documents
(actually a single document for "ID filters" in my tests), I saw
speedups of between 4x and 8x; see the 3rd chart at
http://heliosearch.org/solr-terms-query/

-Yonik
http://heliosearch.org - native code faceting, facet functions,
sub-facets, off-heap data
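The following is a minimal sketch of what a terms filter does differently from a BooleanQuery of many TermQuerys. It uses a toy in-memory index rather than Lucene: the class TermsFilterSketch and its TreeMap "term dictionary" are hypothetical. The sketch sorts and de-duplicates the terms once, looks each one up in the dictionary, and ORs the matching documents into a bitset with no scoring. One further point is an assumption from my reading of Lucene 4.x, not something measured in this thread: TermsFilter also appears to reuse a single TermsEnum per segment instead of creating one per term. If so, that would avoid the per-term IndexInput clone visible in the profile quoted further down. The sketch models only the sorting and bitset parts.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.Collection;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Toy model of a "terms filter": OR together the postings of many terms
 * into one bitset, without scoring. The TreeMap stands in for one segment's
 * term dictionary; this is NOT Lucene's or Heliosearch's implementation.
 */
public class TermsFilterSketch {

    /** Hypothetical in-memory inverted index: term -> doc IDs. */
    private final TreeMap<String, int[]> postings = new TreeMap<>();
    private final int maxDoc;

    public TermsFilterSketch(Map<String, int[]> postings, int maxDoc) {
        this.postings.putAll(postings);
        this.maxDoc = maxDoc;
    }

    /** Returns the docs that contain at least one of the given terms. */
    public BitSet filter(Collection<String> terms) {
        // Sort and de-duplicate once. In a block-based term dictionary, visiting
        // terms in order plausibly lets nearby seeks reuse already-loaded blocks
        // (an assumption); in this toy it only removes duplicate lookups.
        TreeSet<String> sorted = new TreeSet<>(terms);
        BitSet result = new BitSet(maxDoc);
        for (String term : sorted) {
            int[] docs = postings.get(term); // stands in for an exact-term seek
            if (docs == null) {
                continue;                    // term absent from this "segment"
            }
            for (int doc : docs) {
                result.set(doc);             // constant score: membership only
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, int[]> index = new TreeMap<>();
        index.put("id:1", new int[] {0});
        index.put("id:2", new int[] {1});
        index.put("id:3", new int[] {2});
        TermsFilterSketch f = new TermsFilterSketch(index, 3);
        System.out.println(f.filter(Arrays.asList("id:3", "id:1", "id:9", "id:1")));
    }
}
```

Running main prints {0, 2}. The duplicate id:1 is looked up only once, and the unknown id:9 is skipped.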


On Wed, Oct 29, 2014 at 9:42 AM, Michael McCandless
<lucene@mikemccandless.com> wrote:
> I suggested TermsFilter, not TermFilter :)  Note the sneaky extra s ....
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Wed, Oct 29, 2014 at 8:20 AM, Pawel Rog <pawelrog88@gmail.com> wrote:
>> Hi,
>> I already tried transforming the queries into filters (TermQuery -> TermFilter)
>> but didn't see much of a speedup. As I wrote, I wrapped this filter in a
>> ConstantScoreQuery, and in another test I used FilteredQuery with
>> MatchAllDocsQuery and BooleanFilter. Both cases perform about the same
>> as a plain BooleanQuery.
>> But of course I'll also try TermsFilter. Maybe it will speed up the
>> filters.
>>
>> Michael Sokolov: I haven't prepared any statistics about the number of
>> BooleanClauses used or whether there are repeating sets of terms. I think
>> I have to collect some stats to better understand what can be improved.
>>
>> --
>> Paweł Róg
>>
>>
>> On Wed, Oct 29, 2014 at 12:30 PM, Michael Sokolov
>> <msokolov@safaribooksonline.com> wrote:
>>
>>> I'm curious to know more about your use case, because I have an idea for
>>> something that addresses this, but haven't found the opportunity to develop
>>> it yet - maybe somebody else wants to :).  The basic idea is to reduce the
>>> number of terms needed to be looked up by collapsing commonly-occurring
>>> collections of terms into synthetic "tiles".  If your queries have a lot of
>>> overlap, this could greatly reduce the number of terms in a query rewritten
>>> to use tiles. It's sort of complex, requires indexing support, or a filter
>>> cache, and there's no working implementation as yet, so this is probably
>>> not really going to be helpful for you in the short term, but if you can
>>> share some information I'd love to know:
>>>
>>> what kind of things are you searching?
>>> how many terms do your larger queries have?
>>> do the query terms overlap among your queries?
>>>
>>> -Mike Sokolov
>>>
>>>
>>> On 10/28/14 9:40 PM, Pawel Rog wrote:
>>>
>>>> Hi,
>>>> I have to run queries with a lot of boolean SHOULD clauses. Queries like
>>>> these were, of course, slow, so I decided to change the query into a
>>>> filter wrapped by a ConstantScoreQuery, but that didn't help either.
>>>>
>>>> The profiler shows that most of the time is spent in seekExact in
>>>> BlockTreeTermsReader$FieldReader$SegmentTermsEnum.
>>>>
>>>> When I go deeper in the trace, I see that inside seekExact most of the
>>>> time is spent in loadBlock and, even deeper, in ByteBufferIndexInput.clone.
>>>>
>>>> Do you have any ideas how I can make it work faster, or is it not
>>>> possible and I have to live with it?
>>>>
>>>> --
>>>> Paweł Róg
>>>>
>>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>
>
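Michael Sokolov's "tiles" idea can be sketched as follows. The thread gives no implementation, so the assumptions here are my own. A tile is a synthetic term (named like tile:A here) standing for a group of terms that frequently appear together in queries. At index time, a document gets the tile term if it contains any member of the group. At query time, a pure OR query that contains every member of a tile can replace those members with the single tile term without changing which documents match. TileRewriter, tilesFor and rewrite are hypothetical names. Choosing which groups become tiles is left out; that would come from query logs, which is what the statistics Paweł mentions would reveal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical sketch of query "tiles". Assumes that at index time every
 * document containing ANY member term also gets the tile term, so for a
 * pure OR (SHOULD-only) query, OR(tile) matches the same docs as OR(members).
 */
public class TileRewriter {

    private final Map<String, Set<String>> tiles; // tile term -> member terms

    public TileRewriter(Map<String, Set<String>> tiles) {
        this.tiles = tiles;
    }

    /** Index side: the tile terms a document should additionally carry. */
    public Set<String> tilesFor(Set<String> docTerms) {
        Set<String> out = new TreeSet<>();
        for (Map.Entry<String, Set<String>> t : tiles.entrySet()) {
            for (String member : t.getValue()) {
                if (docTerms.contains(member)) {
                    out.add(t.getKey());
                    break;
                }
            }
        }
        return out;
    }

    /**
     * Query side: replace groups fully covered by the query with their tile,
     * trying larger tiles first. A tile is used only if ALL its members are
     * still present; otherwise it would also match docs the query should not.
     */
    public Set<String> rewrite(Set<String> queryTerms) {
        List<Map.Entry<String, Set<String>>> bySize = new ArrayList<>(tiles.entrySet());
        bySize.sort(Comparator.comparingInt(
                (Map.Entry<String, Set<String>> e) -> e.getValue().size()).reversed());
        Set<String> remaining = new HashSet<>(queryTerms);
        Set<String> out = new TreeSet<>();
        for (Map.Entry<String, Set<String>> t : bySize) {
            if (remaining.containsAll(t.getValue())) {
                remaining.removeAll(t.getValue());
                out.add(t.getKey());
            }
        }
        out.addAll(remaining); // terms not covered by any tile stay as-is
        return out;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> tiles = new LinkedHashMap<>();
        tiles.put("tile:A", new HashSet<>(Arrays.asList("a", "b", "c")));
        TileRewriter r = new TileRewriter(tiles);
        System.out.println(r.rewrite(new HashSet<>(Arrays.asList("a", "b", "c", "d"))));
        System.out.println(r.rewrite(new HashSet<>(Arrays.asList("a", "b"))));
        System.out.println(r.tilesFor(new HashSet<>(Arrays.asList("c", "x"))));
    }
}
```

With tile:A = {a, b, c}, the query {a, b, c, d} becomes [d, tile:A], which is two term lookups instead of four. The query {a, b} is left as [a, b] because the tile would also match documents that contain only c. The greedy largest-first policy is just one simple choice. Under the indexing assumption above, it affects only how many terms remain, not which documents match.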
