lucene-solr-user mailing list archives

From Zheng Lin Edwin Yeo <edwinye...@gmail.com>
Subject Re: Proximity searching in percentage
Date Fri, 08 May 2015 02:12:07 GMT
Thank you for the information.

I'm currently using fuzzy search with the similarity value set to ~0.79,
which allows a 20% error rate (i.e. for words with 5 characters it allows
1 misspelled character, and for words with 10 characters it allows 2
misspelled characters).

However, for words with 4 characters, I'll need to set the value to ~0.75
to allow 1 misspelled character, since 1 misspelled character in a
4-character word is a 25% error rate. We will probably not accommodate
3-character words.
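
To make the arithmetic above concrete, here is a minimal sketch. It assumes
the similarity value is turned into an edit count as
floor((1 - similarity) * word length), capped at 2 edits (the cap mentioned
in the quoted reply below). The function name allowed_edits is hypothetical
and is not part of Solr or Lucene.

```python
import math

MAX_EDITS = 2  # cap on edits, as stated in the quoted reply


def allowed_edits(similarity: float, term_length: int) -> int:
    """Assumed conversion: floor((1 - similarity) * length), capped at MAX_EDITS."""
    return min(math.floor((1.0 - similarity) * term_length), MAX_EDITS)


if __name__ == "__main__":
    # Print the edit count for the thresholds and word lengths discussed above.
    for similarity in (0.79, 0.75):
        for length in (3, 4, 5, 10):
            print(similarity, length, allowed_edits(similarity, length))
```

Under this assumption, ~0.79 gives 1 edit for 5 characters and 2 edits for
10 characters but none for 4 characters, while ~0.75 gives 1 edit for 4
characters and still none for 3 characters.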

I've gotten the information from here:
http://lucene.apache.org/core/3_6_0/queryparsersyntax.html#Fuzzy%20Searches

Just to check, will this affect the performance of the system?

Regards,
Edwin


On 7 May 2015 at 20:00, Alessandro Benedetti <benedetti.alex85@gmail.com>
wrote:

> Hi!
> Currently Solr builds an FST to provide fuzzy search and spellcheck
> suggestions based on string distance.
> The current default algorithm is the Levenshtein distance (which returns the
> number of edits as the distance metric).
> In your case you should calculate, client side, the number of edits you
> want to allow for your search.
> In your client code, it should not be difficult to process the query and
> apply the proper number of edits depending on the term length.
>
> Anyway, the maximum number of edits for the default Levenshtein distance is
> fixed at 2.
>
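
A rough sketch of the client-side approach described above, assuming the
query parser accepts an integer edit distance after the tilde (term~1,
term~2). The helper names fuzzy_term and fuzzy_query are hypothetical, and
the sketch does not escape query-syntax special characters.

```python
MAX_EDITS = 2  # maximum number of edits, per the reply above


def fuzzy_term(term: str, error_percent: int = 20) -> str:
    """Append ~N to a term, where N is error_percent of its length, capped at MAX_EDITS."""
    edits = min(len(term) * error_percent // 100, MAX_EDITS)
    # Terms too short to earn any edit are left as exact matches.
    return f"{term}~{edits}" if edits > 0 else term


def fuzzy_query(text: str, error_percent: int = 20) -> str:
    """Apply fuzzy_term to every whitespace-separated term in the query text."""
    return " ".join(fuzzy_term(t, error_percent) for t in text.split())


if __name__ == "__main__":
    print(fuzzy_query("solr proximity percentage"))
```

Integer arithmetic is used for the percentage so that the result does not
depend on floating-point rounding. With 20%, a 4-character term stays exact,
a 9-character term gets ~1, and a 10-character term gets ~2.
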
> Cheers
>
>
>
> 2015-05-05 10:24 GMT+01:00 Zheng Lin Edwin Yeo <edwinyeozl@gmail.com>:
>
> > Hi,
> >
> > Would like to check, how do we implement character proximity searching
> > that's in terms of percentage with regards to the length of the word,
> > instead of a fixed number of edit distance (characters)?
> >
> > For example, if we have a proximity of 20%, a word with 5 characters will
> > have an edit distance of 1, and a word with 10 characters will
> > automatically have an edit distance of 2.
> >
> > Will Solr be able to do that for us?
> >
> > Regards,
> > Edwin
> >
>
>
>
> --
> --------------------------
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>
