lucene-solr-user mailing list archives

From Zheng Lin Edwin Yeo <edwinye...@gmail.com>
Subject Re: Proximity searching in percentage
Date Fri, 08 May 2015 09:14:45 GMT
Hi Alessandro,

I'm using Solr 5.0.0, and the older syntax still works there. I actually find
it better than <query>~1 or <query>~2, as it automatically allows the 20%
error rate that I want.
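
As a rough illustration of why ~0.79 behaves like a 20% allowance: the sketch
below assumes the older similarity value is turned into an edit count as
floor((1 - similarity) * term length), capped at 2. That formula and the name
legacy_edits are assumptions for illustration, not something confirmed in this
thread, although they do reproduce the numbers in my earlier mail quoted below.

```python
import math

# Assumed cap, following the "max edit ... is fixed to 2" remark quoted below.
MAX_EDITS = 2


def legacy_edits(similarity: float, term_length: int) -> int:
    """Assumed conversion: floor((1 - similarity) * length), capped at MAX_EDITS."""
    return min(math.floor((1.0 - similarity) * term_length), MAX_EDITS)


# Print the edit count for the word lengths discussed in this thread.
for length, sim in [(4, 0.79), (4, 0.75), (5, 0.79), (10, 0.79)]:
    print(length, sim, legacy_edits(sim, length))
```

Under this assumption a 4-character word gets no edits at ~0.79 but one edit
at ~0.75, which is the behaviour I described.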

With <query>~1 or <query>~2, does that mean I'll have to check how many
characters were entered myself, and then assign a suitable ~(tilde) param
to achieve the 20% error rate?
I'll probably need an edit distance of 0 for words with 3 or fewer
characters, 1 for words with 4 to 9 characters, 2 for words with 10 to 14
characters, and 3 for words with 15 or more characters.
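
A minimal client-side sketch of that length check, assuming plain
whitespace-separated terms (no field prefixes, phrases or operators). The
function names are hypothetical, and the default cap of 2 follows the maximum
edit distance mentioned in the quoted reply below, so the 3-edit bucket only
takes effect if a higher cap is passed in.

```python
def edits_for_length(length: int, max_edits: int = 2) -> int:
    """Map a term length to an edit distance using the buckets listed above."""
    if length <= 3:
        wanted = 0
    elif length <= 9:
        wanted = 1
    elif length <= 14:
        wanted = 2
    else:
        wanted = 3  # 15 or more characters (assumed to include 15)
    # Assumption: the server accepts at most max_edits, so clamp the result.
    return min(wanted, max_edits)


def add_fuzzy_suffix(query: str, max_edits: int = 2) -> str:
    """Append ~N to each whitespace-separated term; leave 0-edit terms as-is."""
    terms = []
    for term in query.split():
        edits = edits_for_length(len(term), max_edits)
        terms.append(f"{term}~{edits}" if edits > 0 else term)
    return " ".join(terms)


print(add_fuzzy_suffix("cat house information"))
```

The cost here is one length check per term on the client, before the query is
sent.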

Yes, regarding performance, I was asking whether the length check will affect
the query time. Thanks for your info on that. Currently my index is small, so
everything runs quite fast and the delay is unnoticeable. But I'm not sure
whether it will slow down enough for users to notice once I have tens of
collections with millions of records.


Regards,
Edwin



On 8 May 2015 at 16:53, Alessandro Benedetti <benedetti.alex85@gmail.com>
wrote:

> Hi Zheng,
> actually that version of the fuzzy search is deprecated!
> Currently the fuzzy search syntax is:
> <query>~1 or <query>~2
> The ~(tilde) param is the number of edits allowed when generating the
> expanded queries to run.
> Can I ask which version of Solr you are using?
>
> This article from 2011 shows the biggest change in fuzzy query, and I guess
> it's still the current approach!
> Regarding performance, what do you mean?
> Are you worried that the length check will affect the query time?
> The answer is yes, but the delay will be unnoticeable, as you simply check
> the length and apply the corresponding fuzzy param.
> As for fuzzy queries being slower than normal queries, that is
> true, but the FST approach guarantees really fast fuzzy queries.
> So if you do need the fuzziness, it's something you can cope with.
>
> Cheers
>
> 2015-05-08 3:12 GMT+01:00 Zheng Lin Edwin Yeo <edwinyeozl@gmail.com>:
>
> > Thank you for the information.
> >
> > I'm currently using the fuzzy search with the value set to
> > ~0.79, and this has allowed a 20% error rate (i.e. for words with 5
> > characters, it allows 1 mis-spelled character, and for words with 10
> > characters, it allows 2 mis-spelled characters).
> >
> > However, for words with 4 characters, I'll need to set the value to ~0.75
> > to allow 1 mis-spelled character, since 1 mis-spelled character in a
> > 4-character word is a 25% error rate. We probably will not accommodate
> > 3-character words.
> >
> > I've gotten the information from here:
> >
> > http://lucene.apache.org/core/3_6_0/queryparsersyntax.html#Fuzzy%20Searches
> >
> > Just to check, will this affect the performance of the system?
> >
> > Regards,
> > Edwin
> >
> >
> > On 7 May 2015 at 20:00, Alessandro Benedetti <benedetti.alex85@gmail.com>
> > wrote:
> >
> > > Hi !
> > > Currently Solr builds an FST to provide fuzzy search and spellcheck
> > > suggestions based on string distance.
> > > The current default algorithm is the Levenshtein distance (which returns
> > > the number of edits as the distance metric).
> > > In your case you should calculate, client side, the number of edits you
> > > want to apply to your search.
> > > In your client code, it should not be difficult to process the query and
> > > apply the proper number of edits depending on the length.
> > >
> > > Anyway, the max edit for the default Levenshtein distance is fixed at 2.
> > >
> > > Cheers
> > >
> > >
> > >
> > > 2015-05-05 10:24 GMT+01:00 Zheng Lin Edwin Yeo <edwinyeozl@gmail.com>:
> > >
> > > > Hi,
> > > >
> > > > I would like to check: how do we implement character proximity
> > > > searching in terms of a percentage of the length of the word,
> > > > instead of a fixed edit distance (number of characters)?
> > > >
> > > > For example, if we have a proximity of 20%, a word with 5 characters
> > > > will have an edit distance of 1, and a word with 10 characters will
> > > > automatically have an edit distance of 2.
> > > >
> > > > Will Solr be able to do that for us?
> > > >
> > > > Regards,
> > > > Edwin
> > > >
> > >
> > >
> > >
> > > --
> > > --------------------------
> > >
> > > Benedetti Alessandro
> > > Visiting card : http://about.me/alessandro_benedetti
> > >
> > > "Tyger, tyger burning bright
> > > In the forests of the night,
> > > What immortal hand or eye
> > > Could frame thy fearful symmetry?"
> > >
> > > William Blake - Songs of Experience -1794 England
> > >
> >
>
>
>
> --
> --------------------------
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>
