lucene-java-user mailing list archives

From George Kelvin <george.kelvin...@gmail.com>
Subject Re: Questions about FuzzyQuery in Lucene 4.x
Date Thu, 31 Jan 2013 19:52:53 GMT
Hi Jack, sorry for the confusion. I understand that a minimal data set
reproducing the problem would be ideal, but I was unable to produce one.

Hi Michael,

Thank you! That was the problem. I changed maxExpansions to 100 and the
results are now found.
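
To illustrate why raising the cap helps: in Lucene 4.x the cap is the maxExpansions argument of the constructor FuzzyQuery(Term term, int maxEdits, int prefixLength, int maxExpansions, boolean transpositions). The sketch below is a toy model in plain Java, not Lucene code. The class ExpansionCapDemo, its expand helper, the tiny dictionary, and the alphabetical tie-break among equally distant terms are all assumptions made for illustration; Lucene's real ordering of candidate terms may differ. It only shows how a term such as "star" can be cut off when more candidate terms exist than the cap allows, and how removing one competing term (here "scor") can let it back in.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ExpansionCapDemo {

    // Classic Levenshtein distance (insert, delete, substitute), two-row DP.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Toy expansion: collect dictionary terms within maxEdits of the query,
    // order them by distance and then alphabetically (ASSUMED tie-break, not
    // Lucene's actual ordering), and keep only the first maxExpansions terms.
    static List<String> expand(String query, List<String> dictionary, int maxEdits, int maxExpansions) {
        List<String> candidates = new ArrayList<>();
        for (String term : dictionary) {
            if (editDistance(query, term) <= maxEdits) {
                candidates.add(term);
            }
        }
        candidates.sort(Comparator.comparingInt((String t) -> editDistance(query, t))
                .thenComparing(Comparator.<String>naturalOrder()));
        return new ArrayList<>(candidates.subList(0, Math.min(maxExpansions, candidates.size())));
    }

    public static void main(String[] args) {
        // Hypothetical term dictionary; seven of these terms are within edit distance 1 of "scar".
        List<String> withScor = Arrays.asList("scab", "scam", "scan", "scor", "soar", "spar", "star", "wars");
        List<String> withoutScor = Arrays.asList("scab", "scam", "scan", "soar", "spar", "star", "wars");

        System.out.println(expand("scar", withScor, 1, 6));    // cap of 6: "star" is cut off
        System.out.println(expand("scar", withoutScor, 1, 6)); // one competitor fewer: "star" fits
        System.out.println(expand("scar", withScor, 1, 100));  // larger cap: "star" fits
    }
}
```

In this toy model the query "scar" has seven candidate terms at distance 1, so a cap of 6 drops the last one in the assumed ordering, while either removing "scor" or raising the cap keeps "star" among the expanded terms.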

Regarding my second question, on the ranking of wildcard fuzzy search
results, could you also offer some suggestions? Thanks!

George


On Wed, Jan 30, 2013 at 10:05 AM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> On Tue, Jan 29, 2013 at 2:43 PM, George Kelvin
> <george.kelvin738@gmail.com> wrote:
> > Hi Jack,
> >
> > The problematic query is "scar"+"wads".
> >
> > There are several (more than 10) documents in the data with the content
> > "star wars", so I think that query should be able to find all these
> > documents.
> >
> > I was trying to provide a minimal test case, but I couldn't reduce the
> > size of data showing the failure.
> >
> > The smallest data set I have found so far that shows the failure is
> > around 2 million documents.
> >
> > However, I found a suspicious document with content "scor". If I remove
> > it from the 2 million documents data, that query can find all the "star
> > wars" documents. If I add it back, then the query can't find any.
>
> Hmm, maybe try increasing maxExpansions (one of FuzzyQuery's constructors
> takes that).
>
> By default it's 50, meaning we enumerate the top 50 terms within edit
> distance 1, so it could be that "star" is falling out of the top 50.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
