lucene-solr-user mailing list archives

From Alessandro Benedetti <benedetti.ale...@gmail.com>
Subject Re: Proximity searching in percentage
Date Fri, 08 May 2015 08:53:34 GMT
Hi Zheng,
actually, that version of the fuzzy search syntax is deprecated!
The current fuzzy search syntax is:
<query>~1 or <query>~2
The ~ (tilde) parameter is the maximum number of edits allowed when
generating the expanded queries to run.
Can I ask which version of Solr you are using?
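
To illustrate what the number after the tilde counts, here is a minimal
Python sketch of the plain Levenshtein distance. It is only an illustration
of the metric, not how Solr evaluates fuzzy queries internally, and the
function name is made up for this example. Depending on configuration, the
fuzzy query may also treat swapping two adjacent characters as a single
edit; this sketch does not.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-character insertions, deletions, or
    substitutions needed to turn string a into string b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep) the character
            ))
        prev = curr
    return prev[-1]
```

With this definition, "solr" and "solar" are 1 edit apart, so a query such
as solr~1 could be expected to match a document containing "solar".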

This article from 2011 shows the biggest change in fuzzy query, and I guess
it's still the current approach!
Regarding performance, what do you mean?
Are you worried that the length check will affect the query time?
The answer is yes, but the delay will be unnoticeable, as you simply check
the length and apply the corresponding fuzzy parameter.
As for a fuzzy query being slower than a normal query, that is
true, but the FST approach keeps fuzzy queries really fast.
So if you do need the fuzziness, it's something you can cope with.
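
The client-side length check could look roughly like the Python sketch
below. The function names and the percent parameter are hypothetical; the
cap of 2 is the maximum edit count mentioned earlier in this thread. It
uses integer arithmetic so that 20% of a 5-character word comes out as
exactly 1 edit, with no floating-point rounding surprises. It does not
escape query-parser special characters, which a real client would need to
handle.

```python
MAX_SUPPORTED_EDITS = 2  # maximum edit count mentioned in this thread


def fuzzy_term(term: str, percent: int = 20) -> str:
    """Append ~N to a term, where N is percent of the term length,
    rounded down and capped at MAX_SUPPORTED_EDITS."""
    edits = min(MAX_SUPPORTED_EDITS, len(term) * percent // 100)
    # Terms too short to earn a single edit are left as exact matches.
    return f"{term}~{edits}" if edits > 0 else term


def fuzzy_query(text: str, percent: int = 20) -> str:
    """Apply fuzzy_term to every whitespace-separated term in text."""
    return " ".join(fuzzy_term(t, percent) for t in text.split())
```

For example, fuzzy_query("apple television") gives "apple~1 television~2",
while a 4-character word stays exact at 20% and only gets ~1 once the
percentage is raised to 25.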

Cheers

2015-05-08 3:12 GMT+01:00 Zheng Lin Edwin Yeo <edwinyeozl@gmail.com>:

> Thank you for the information.
>
> I'm currently using the fuzzy search with the edit distance value set to
> ~0.79, and this allows a 20% error rate (i.e. for words with 5
> characters, it allows 1 misspelled character, and for words with 10
> characters, it allows 2 misspelled characters).
>
> However, for words with 4 characters, I'll need to set the value to ~0.75
> to allow 1 misspelled character, since a 4-character word requires a 25%
> error rate for 1 misspelled character. We probably will not accommodate
> 3-character words.
>
> I've gotten the information from here:
> http://lucene.apache.org/core/3_6_0/queryparsersyntax.html#Fuzzy%20Searches
>
> Just to check, will this affect the performance of the system?
>
> Regards,
> Edwin
>
>
> On 7 May 2015 at 20:00, Alessandro Benedetti <benedetti.alex85@gmail.com>
> wrote:
>
> > Hi!
> > Currently Solr builds an FST to provide proper fuzzy search or spellcheck
> > suggestions based on the string distance.
> > The current default algorithm is the Levenshtein distance (which returns
> > the number of edits as the distance metric).
> > In your case you should calculate, client side, the number of edits you
> > want to apply to your search.
> > In your client code, it should not be difficult to process the query and
> > apply the proper number of edits depending on the length.
> >
> > Anyway, the max edits for the default Levenshtein distance is fixed at 2.
> >
> > Cheers
> >
> >
> >
> > 2015-05-05 10:24 GMT+01:00 Zheng Lin Edwin Yeo <edwinyeozl@gmail.com>:
> >
> > > Hi,
> > >
> > > Would like to check, how do we implement character proximity searching
> > > that's in terms of percentage with regards to the length of the word,
> > > instead of a fixed number of edit distance (characters)?
> > >
> > > For example, if we have a proximity of 20%, a word with 5 characters
> > > will have an edit distance of 1, and a word with 10 characters will
> > > automatically have an edit distance of 2.
> > >
> > > Will Solr be able to do that for us?
> > >
> > > Regards,
> > > Edwin
> > >
> >
> >
> >
> > --
> > --------------------------
> >
> > Benedetti Alessandro
> > Visiting card : http://about.me/alessandro_benedetti
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
> >
>



-- 
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti
