lucene-solr-user mailing list archives

From Alexandre Rafalovitch <arafa...@gmail.com>
Subject Re: Query with exact number of tokens
Date Fri, 21 Sep 2018 14:18:05 GMT
I think you can require every term in the query to match the field using either
1) DisMax/eDisMax with mm=100%
https://lucene.apache.org/solr/guide/7_4/the-dismax-query-parser.html#mm-minimum-should-match-parameter
2) Complex Phrase Query Parser with inOrder=false:
https://lucene.apache.org/solr/guide/7_4/other-parsers.html#complex-phrase-query-parser
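
As a rough illustration of the two options above, here is a minimal Python sketch that only builds the request parameters. The field name company_name is an assumption (it does not come from the original question); defType, q, qf, mm and the {!complexphrase inOrder=false} local-params syntax are the ones described in the linked reference guide pages.

```python
from urllib.parse import urlencode

# Hypothetical field name; substitute the field that holds the company name.
FIELD = "company_name"


def edismax_all_terms(user_query: str) -> str:
    """Query string for eDisMax where every query term must match (mm=100%)."""
    return urlencode({
        "defType": "edismax",
        "q": user_query,
        "qf": FIELD,
        "mm": "100%",
    })


def complex_phrase_any_order(user_query: str) -> str:
    """Query string for the Complex Phrase parser with inOrder=false."""
    # Drop embedded double quotes so the phrase quoting stays balanced.
    phrase = user_query.replace('"', " ")
    return urlencode({"q": f'{{!complexphrase inOrder=false}}{FIELD}:"{phrase}"'})


print(edismax_all_terms("CENTURY BANCORP, INC."))
print(complex_phrase_any_order("CENTURY BANCORP INC"))
```

Either string can be appended to a core's /select URL. Note that both options only guarantee that all query terms are present; neither rejects a document that has extra tokens.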

The number of tokens is the hard part, though. You only know what your
tokens are at the end of the indexing pipeline. And during search, the
query tokens are looked up in the index first, and only then are the
matching documents looked up.

You may be able to do this with a custom PostFilter that would run after
everything else and just reject records with extra tokens. That would
not be too expensive, since it only sees documents that already matched.
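
To make the post-filtering idea concrete, here is a small Python simulation of the logic only. It is not a Solr PostFilter (that would be Java code against the Solr API); the tokenize function is an assumed stand-in for whatever analyzer the field really uses.

```python
import re


def tokenize(text: str) -> list[str]:
    # Assumed stand-in for the field's analyzer: lowercase, then split on
    # anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def reject_extra_tokens(query: str, candidates: list[str]) -> list[str]:
    """Keep only candidates whose token count equals the query's token count."""
    wanted = len(tokenize(query))
    return [c for c in candidates if len(tokenize(c)) == wanted]


# Candidates as they might come back from an "all query terms must match" query.
candidates = [
    "CENTURY BANCORP, INC.",
    "NEW CENTURY BANCORP, INC.",
    "Bancorp Century Inc",
]
print(reject_extra_tokens("CENTURY BANCORP, INC.", candidates))
# prints ['CENTURY BANCORP, INC.', 'Bancorp Century Inc']
```

The check runs only over documents that already contain every query term, so comparing counts is enough to drop names such as "NEW CENTURY BANCORP, INC.".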

Or (a possibly simpler way) you could try to precalculate things by
writing a custom TokenFilter that takes a token stream and returns the
token count, to be used on a copyField target. Then you send your query
to the same field with any syntax that preserves the full query, either
as a phrase or with a field query parser:
https://lucene.apache.org/solr/guide/7_4/other-parsers.html#complex-phrase-query-parser
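
A sketch of the precalculation idea, with one deliberate simplification: instead of a custom TokenFilter feeding a copyField target, the count is computed in client code before the document is sent, and matched with a filter query. The field names company_name and company_name_token_count are hypothetical, and tokenize is again only an approximation of the real analyzer, so the counts agree with the index only if the two tokenize the same way.

```python
import re
from urllib.parse import urlencode


def tokenize(text: str) -> list[str]:
    # Assumed stand-in for the field's analyzer.
    return re.findall(r"[a-z0-9]+", text.lower())


def index_doc(doc_id: str, name: str) -> dict:
    """Build a document carrying a precomputed token count."""
    return {
        "id": doc_id,
        "company_name": name,
        "company_name_token_count": len(tokenize(name)),
    }


def exact_token_query(user_query: str) -> str:
    """All terms must match (mm=100%) and the stored count must equal the query's."""
    count = len(tokenize(user_query))
    return urlencode({
        "defType": "edismax",
        "q": user_query,
        "qf": "company_name",
        "mm": "100%",
        "fq": f"company_name_token_count:{count}",
    })


print(index_doc("1", "NEW CENTURY BANCORP, INC."))
print(exact_token_query("CENTURY BANCORP, INC."))
```

With this layout the document for "NEW CENTURY BANCORP, INC." stores a count of 4, while the query for "CENTURY BANCORP, INC." filters on a count of 3, so the longer name is excluded even though it contains every query term.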

I would love to know if any/all of this works for you.

Regards,
   Alex.

On 21 September 2018 at 09:00, marotosg <marotosg@gmail.com> wrote:
> Hi,
>
> I have to search for company names where my first requirement is to find
> only exact matches on the company name.
>
> For instance if I search for "CENTURY BANCORP, INC." I shouldn't find "NEW
> CENTURY BANCORP, INC."
> because the result company has the extra keyword "NEW".
>
> I can't use exact match because the sequence of tokens may differ. Basically
> I need to find results where the tokens are the same in any order and the
> number of tokens matches.
>
> I have no idea whether it's possible to include the number of tokens in the
> query and have a Solr field hold that information to match against.
>
> Thanks for your help
> Sergio
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html