lucene-solr-user mailing list archives

From elisabeth benoit <elisaelisael...@gmail.com>
Subject Re: Ignore accent in a request
Date Fri, 08 Feb 2019 20:47:48 GMT
Yes, you do.

And use the char filter at both index and query time.
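
A minimal Python sketch of why the folding has to run on both sides, and why documents indexed earlier need reindexing. This is not Solr code: the `FOLD` table, `analyze`, and the toy inverted index are hypothetical stand-ins for the char filter and the index.

```python
from collections import defaultdict

# Hypothetical stand-in for a few entries of the mapping file.
FOLD = str.maketrans({"é": "e", "É": "E", "à": "a", "ç": "c"})

def analyze(text, fold):
    """Lowercase and split on whitespace, optionally folding accents first."""
    if fold:
        text = text.translate(FOLD)
    return text.lower().split()

def build_index(docs, fold):
    """Map each analyzed term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text, fold):
            index[term].add(doc_id)
    return index

def search(index, query, fold):
    """Return ids of documents containing every analyzed query term."""
    terms = analyze(query, fold)
    if not terms:
        return set()
    hits = [index.get(t, set()) for t in terms]
    return set.intersection(*hits)

docs = {1: "je suis avarié", 2: "je suis avarie"}

old_index = build_index(docs, fold=False)   # indexed before the char filter
new_index = build_index(docs, fold=True)    # reindexed with the char filter

# Folding only at query time: the old index still holds "avarié" for doc 1.
print(search(old_index, "avarié", fold=True))   # {2}
# Folding on both sides: both spellings meet on the term "avarie".
print(search(new_index, "avarié", fold=True))   # {1, 2}
print(search(new_index, "avarie", fold=True))   # {1, 2}
```

The terms already stored in the index are not rewritten when the analysis chain changes, which is why the first search misses document 1.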

On Fri, Feb 8, 2019 at 19:20, SAUNIER Maxence <MSAUNIER@q1c1.fr> wrote:

> For the charFilter, do I need to reindex all documents?
>
> -----Original Message-----
> From: Erick Erickson <erickerickson@gmail.com>
> Sent: Friday, February 8, 2019 18:03
> To: solr-user <solr-user@lucene.apache.org>
> Subject: Re: Ignore accent in a request
>
> Elisabeth's suggestion is spot on for the accent.
>
> One other thing I noticed. You are using KeywordTokenizerFactory combined
> with EdgeNGramFilterFactory. This implies that you can't search for
> individual _words_, only prefix queries, i.e.
> je
> je s
> je su
> je sui
> je suis
>
> You can't search for "suis" for instance.
>
> Basically, this is an efficient way to search for anything starting with
> a prefix of three or more letters, at the expense of index size. You might
> be better off just using wildcards (though restrict them to prefixes of at
> least three letters).
>
> This is perfectly valid, I'm mostly asking if it's your intent.
>
> Best,
> Erick
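
A rough Python sketch of the point above, assuming the `minGramSize` of 3 and `maxGramSize` of 50 from the schema in the original question. The `edge_ngrams` helper is hypothetical and only imitates the idea of the filter, not Solr's implementation.

```python
def edge_ngrams(token, min_gram=3, max_gram=50):
    """Return the prefixes of `token` whose length is in [min_gram, max_gram]."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]

# KeywordTokenizer-style analysis: the whole field value is one token.
keyword_grams = edge_ngrams("je suis")

# Word-level analysis for comparison: one token per whitespace-separated word.
word_grams = [g for word in "je suis".split() for g in edge_ngrams(word)]

print(keyword_grams)            # ['je ', 'je s', 'je su', 'je sui', 'je suis']
print("suis" in keyword_grams)  # False
print(word_grams)               # ['sui', 'suis']
```

With the whole value as one token, every gram starts at the first character, so "suis" never becomes a term. With a minimum gram size of 3, the two-letter word "je" produces no gram at all in the word-level variant.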
>
> On Fri, Feb 8, 2019 at 9:35 AM SAUNIER Maxence <MSAUNIER@q1c1.fr> wrote:
> >
> > Thank you!
> >
> > -----Original Message-----
> > From: elisabeth benoit <elisaelisaelisa@gmail.com>
> > Sent: Friday, February 8, 2019 14:12
> > To: solr-user@lucene.apache.org
> > Subject: Re: Ignore accent in a request
> >
> > Hello,
> >
> > We use Solr 7 and use
> >
> > <charFilter class="solr.MappingCharFilterFactory"
> > mapping="mapping-ISOLatin1Accent.txt"/>
> >
> > with mapping-ISOLatin1Accent.txt
> >
> > containing lines like
> >
> > # À => A
> > "\u00C0" => "A"
> >
> > # Á => A
> > "\u00C1" => "A"
> >
> > # Â => A
> > "\u00C2" => "A"
> >
> > # Ã => A
> > "\u00C3" => "A"
> >
> > # Ä => A
> > "\u00C4" => "A"
> >
> > # Å => A
> > "\u00C5" => "A"
> >
> > # Ā Ă Ą => A
> > "\u0100" => "A"
> > "\u0102" => "A"
> > "\u0104" => "A"
> >
> > # Æ => AE
> > "\u00C6" => "AE"
> >
> > # Ç => C
> > "\u00C7" => "C"
> >
> > # é => e
> > "\u00E9" => "e"
> >
> > Best regards,
> > Elisabeth
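
A small Python sketch of how rules in this format could be read and applied, assuming only the `"source" => "target"` syntax and `#` comments visible in the lines quoted above. The helper names are hypothetical, and the sketch handles single-character sources only.

```python
import re

# A few lines in the format quoted above (raw string so \uXXXX stays literal).
SAMPLE = r'''
# À => A
"\u00C0" => "A"
# Æ => AE
"\u00C6" => "AE"
# é => e
"\u00E9" => "e"
'''

RULE = re.compile(r'^"(.*)"\s*=>\s*"(.*)"$')
ESCAPE = re.compile(r'\\u([0-9A-Fa-f]{4})')

def unescape(text):
    """Turn literal \\uXXXX sequences into the characters they name."""
    return ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)

def parse_mapping(source):
    """Parse `"source" => "target"` rules, skipping blanks and # comments."""
    rules = {}
    for line in source.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = RULE.match(line)
        if match:
            rules[unescape(match.group(1))] = unescape(match.group(2))
    return rules

def apply_mapping(text, rules):
    """Replace each mapped single character; leave everything else unchanged."""
    return "".join(rules.get(ch, ch) for ch in text)

rules = parse_mapping(SAMPLE)
print(apply_mapping("avarié", rules))   # avarie
print(apply_mapping("Ægir À", rules))   # AEgir A
```

Because the replacement happens on the raw character stream before tokenization, it works with any tokenizer placed after it.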
> >
> > On Fri, Feb 8, 2019 at 11:18, Gopesh Sharma <Gopesh_Sharma@gensler.com>
> > wrote:
> >
> > > We fixed this type of issue with synonyms, by adding
> > > SynonymFilterFactory (before Solr 7).
> > >
> > > -----Original Message-----
> > > From: SAUNIER Maxence <MSAUNIER@q1c1.fr>
> > > Sent: Friday, February 8, 2019 3:36 PM
> > > To: solr-user@lucene.apache.org
> > > Subject: RE: Ignore accent in a request
> > >
> > > Hello,
> > >
> > > Thanks for your answer.
> > >
> > > I tested:
> > >
> > > select?defType=dismax&q=je suis avarié&qf=content
> > > 90,000 results
> > >
> > > select?defType=dismax&q=je suis avarie&qf=content
> > > 60,000 results
> > >
> > > With avarié, I don't find documents containing avarie, and with avarie,
> > > I don't find documents containing avarié.
> > >
> > > I want to find all 150,000 documents containing avarié or avarie.
> > >
> > > Thanks
> > >
> > > -----Original Message-----
> > > From: Erick Erickson <erickerickson@gmail.com>
> > > Sent: Thursday, February 7, 2019 19:37
> > > To: solr-user <solr-user@lucene.apache.org>
> > > Subject: Re: Ignore accent in a request
> > >
> > > Exactly _how_ is it "not working"?
> > >
> > > Try building your parameters _up_ rather than starting with a lot, e.g.
> > > select?defType=dismax&q=je suis avarié&qf=title
> > > ^^ assumes you expect a match on title. Then:
> > > select?defType=dismax&q=je suis avarié&qf=title subject
> > >
> > > etc.
> > >
> > > Because mm=757 looks really wrong. From the docs:
> > > Defines the minimum number of clauses that must match, regardless of
> > > how many clauses there are in total.
> > >
> > > edismax is used much more than dismax as it's more flexible, but
> > > that's not germane here.
> > >
> > > Finally, try adding &debug=query to the URL to see exactly how the
> > > query is parsed.
> > >
> > > Best,
> > > Erick
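
One way to script the "build the parameters up" approach, sketched in Python. The host `localhost` and the core name `mycore` are hypothetical placeholders, since the original request elides both, and `select_url` is an illustrative helper rather than part of any Solr client.

```python
from urllib.parse import urlencode

# Hypothetical host and core; the original request elides both.
BASE = "http://localhost:8983/solr/mycore/select"

def select_url(q, qf, debug=False):
    """Build a dismax select URL; `debug=query` is appended when requested."""
    params = {"defType": "dismax", "q": q, "qf": qf}
    if debug:
        params["debug"] = "query"
    return BASE + "?" + urlencode(params)

# Start with one field, then add fields back one at a time.
print(select_url("je suis avarié", "title", debug=True))
print(select_url("je suis avarié", "title subject", debug=True))
```

`urlencode` percent-encodes the accented character and turns spaces into `+`, so the query can be pasted into a browser or passed to an HTTP client as is.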
> > >
> > > On Mon, Feb 4, 2019 at 9:09 AM SAUNIER Maxence <MSAUNIER@q1c1.fr>
> > > wrote:
> > > >
> > > > Hello,
> > > >
> > > > How can I ignore accents in the query results?
> > > >
> > > > Request:
> > > > http://*****:8983/solr/***/select?defType=dismax&q=je+suis+avarié&qf=title%5e20+subject%5e15+category%5e1+content%5e0.5&mm=757
> > > >
> > > > I want to get documents containing avarié as well as avarie.
> > > >
> > > > I have added this to my schema:
> > > >
> > > >   {
> > > >     "name": "string",
> > > >     "positionIncrementGap": "100",
> > > >     "analyzer": {
> > > >       "filters": [
> > > >         {
> > > >           "class": "solr.LowerCaseFilterFactory"
> > > >         },
> > > >         {
> > > >           "class": "solr.ASCIIFoldingFilterFactory"
> > > >         },
> > > >         {
> > > >           "class": "solr.EdgeNGramFilterFactory",
> > > >           "minGramSize": "3",
> > > >           "maxGramSize": "50"
> > > >         }
> > > >       ],
> > > >       "tokenizer": {
> > > >         "class": "solr.KeywordTokenizerFactory"
> > > >       }
> > > >     },
> > > >     "stored": true,
> > > >     "indexed": true,
> > > >     "sortMissingLast": true,
> > > >     "class": "solr.TextField"
> > > >   },
> > > >
> > > > But it is not working.
> > > >
> > > > Thanks.
> > >
>
