lucene-solr-user mailing list archives

From Zheng Lin Edwin Yeo <edwinye...@gmail.com>
Subject Re: Advice on Stemming in Solr
Date Fri, 03 Nov 2017 03:25:15 GMT
Hi Emir,

We are looking to switch to HunspellStemFilterFactory. It uses a
dictionary file containing words and their applicable flags, and an affix
file that specifies how these flags control spell checking.
Perhaps we can control the stemming through those files?

Regards,
Edwin


On 2 November 2017 at 17:46, Emir Arnautović <emir.arnautovic@sematext.com>
wrote:

> Hi Edwin,
> It seems that it would be best not to apply the *ing stemming rule at
> all. One idea is to trick the stemmer by replacing the "ing" ending of any
> word with some nonexistent character combination, e.g. ‘wqx’. You can use
> solr.PatternReplaceFilterFactory to do that. You can switch it back after
> stemming if you want to have the proper token in the index.
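
The masking trick described above can be sketched at the token level in Python. This is only an illustration under stated assumptions: `toy_stem` is a hypothetical stand-in for a real stemmer (it is not KStem), and `stem_without_ing_rule` is an invented helper name. In Solr itself the two substitutions would be done by PatternReplaceFilterFactory filters placed before and after the stemmer.

```python
import re

# Placeholder suffix taken from the message; any string that never
# occurs at the end of a real token would work equally well.
MASK = "wqx"


def toy_stem(token: str) -> str:
    """Hypothetical stand-in for a real stemmer: strips a trailing 'ing' or 's'."""
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def stem_without_ing_rule(token: str) -> str:
    """Mask a trailing 'ing', stem, then restore the original suffix."""
    masked = re.sub(r"ing$", MASK, token)        # hide the suffix from the stemmer
    stemmed = toy_stem(masked)                   # the -ing rule can no longer fire
    return re.sub(MASK + r"$", "ing", stemmed)   # switch the suffix back


if __name__ == "__main__":
    for word in ("ximenting", "walking", "walks"):
        print(word, "->", stem_without_ing_rule(word))
```

Note the trade-off this makes visible: English words such as "walking" are left unstemmed too, since the -ing rule is disabled for every token.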
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 2 Nov 2017, at 03:23, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
> >
> > Hi Emir,
> >
> > We do have quite a lot of words that should not be stemmed. Currently,
> > KStemFilterFactory is stemming all the non-English words that end with
> > "ing" as well. There are quite a lot of places and names which end in
> > "ing", and all these are being stemmed as well, which leads to
> > inaccurate search results.
> >
> > Regards,
> > Edwin
> >
> >
> > On 1 November 2017 at 18:20, Emir Arnautović <emir.arnautovic@sematext.com>
> > wrote:
> >
> >> Hi Edwin,
> >> If the number of words that should not be stemmed is not high you could
> >> use KeywordMarkerFilterFactory to flag those words as keywords and it
> >> should prevent stemmer from changing them.
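
The keyword-marking idea can be sketched as follows. This is a minimal Python illustration, not Solr code: `PROTECTED`, `toy_stem` and `analyze` are hypothetical names, and the toy stemmer stands in for a real one. In Solr, KeywordMarkerFilterFactory typically reads such a list from a protected-words file and must come before the stemmer in the analyzer chain.

```python
# Hypothetical protected-word list; in Solr this would live in a text file
# referenced by the keyword marker filter.
PROTECTED = {"ximenting"}


def toy_stem(token: str) -> str:
    """Hypothetical stand-in for a real stemmer: strips a trailing 'ing' or 's'."""
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def analyze(tokens):
    """Stem every token except those flagged as keywords."""
    return [t if t.lower() in PROTECTED else toy_stem(t) for t in tokens]


if __name__ == "__main__":
    print(analyze(["ximenting", "walking"]))
```

This only scales while the list stays small enough to maintain by hand, which is the condition stated above.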
> >> Depending on what you want to achieve, you might not be able to avoid
> >> using a stemmer at indexing time. If you want to find documents that
> >> contain only “walking” with the search term “walk”, then you have to stem
> >> at index time. Cases where you stem at query time only are rare and
> >> specific.
> >> If you want to prefer exact matches over stemmed matches, you have to
> >> index the same content with and without stemming and boost matches on
> >> the field without stemming.
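
One way to express "boost the unstemmed field" at query time is the `qf` parameter of Solr's edismax query parser, which accepts per-field boosts. The Python sketch below only builds the request parameters. The field names `content_exact` and `content_stemmed`, the boost value, and the helper name `build_params` are assumptions for illustration, and the schema is assumed to index the same content into both fields already (for example via a copyField).

```python
from urllib.parse import urlencode


def build_params(term: str,
                 exact_field: str = "content_exact",      # hypothetical unstemmed field
                 stemmed_field: str = "content_stemmed",  # hypothetical stemmed field
                 exact_boost: float = 2.0) -> dict:
    """Build edismax parameters that weight exact matches above stemmed ones."""
    return {
        "defType": "edismax",
        "q": term,
        # Matches in the unstemmed field count more than matches in the stemmed one.
        "qf": f"{exact_field}^{exact_boost} {stemmed_field}",
    }


if __name__ == "__main__":
    # Prints the query string to append to a select request.
    print(urlencode(build_params("walking")))
```

With this setup a document containing "walking" matches both fields and ranks above one that only contains "walk", which matches the stemmed field alone.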
> >>
> >> HTH,
> >> Emir
> >> --
> >> Monitoring - Log Management - Alerting - Anomaly Detection
> >> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >>
> >>
> >>
> >>> On 1 Nov 2017, at 10:11, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
> >>>
> >>> Hi,
> >>>
> >>> We are currently using KStemFilterFactory in Solr, but we found that it
> >>> is actually stemming non-English words like "ximenting", which it
> >>> stems to "ximent". This is not what we want.
> >>>
> >>> Another option is to use HunspellStemFilterFactory, but there are some
> >>> English words like "running" and "walking" that are not being stemmed.
> >>>
> >>> Is it advisable to use stemming at index time? Or should we skip
> >>> stemming at index time and instead, at query time, search for the
> >>> stemmed word as well? For example, if the user searches for "walking",
> >>> we would search for it together with "walk", and the actual word
> >>> "walking" would have a higher weight.
> >>>
> >>> I'm currently using Solr 6.5.1.
> >>>
> >>> Regards,
> >>> Edwin
> >>
> >>
>
>
