lucene-solr-user mailing list archives

From SAUNIER Maxence <MSAUN...@q1c1.fr>
Subject RE: Ignore accent in a request
Date Fri, 08 Feb 2019 17:09:07 GMT
For the charFilter, do I need to reindex all documents?

-----Original Message-----
From: Erick Erickson <erickerickson@gmail.com>
Sent: Friday, February 8, 2019 18:03
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: Ignore accent in a request

Elisabeth's suggestion is spot on for the accent.
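
As a rough illustration of why a character mapping helps, here is a minimal Python sketch. It is not Solr's implementation; the names ACCENT_MAP and map_chars are hypothetical, and the table only mirrors a few of the mapping lines quoted further down in this thread. The idea is that when the same mapping is applied to both the indexed text and the query, "avarié" and "avarie" end up as the same string.

```python
# Hypothetical excerpt of a mapping table, mirroring a few of the
# mapping-ISOLatin1Accent.txt lines quoted in this thread.
ACCENT_MAP = {
    "\u00C0": "A",   # À => A
    "\u00C1": "A",   # Á => A
    "\u00C2": "A",   # Â => A
    "\u00C6": "AE",  # Æ => AE
    "\u00C7": "C",   # Ç => C
    "\u00E9": "e",   # é => e
}


def map_chars(text, mapping=ACCENT_MAP):
    """Replace each mapped character and leave all others untouched."""
    return "".join(mapping.get(ch, ch) for ch in text)


# Both spellings collapse to the same string once mapped.
folded_query = map_chars("je suis avari\u00E9")
folded_doc = map_chars("je suis avarie")
print(folded_query == folded_doc)
```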

One other thing I noticed: you are using KeywordTokenizerFactory combined with EdgeNGramFilterFactory.
This implies that you can't search for individual _words_, only for prefixes of the whole field value, i.e.
je
je s
je su
je sui
je suis

You can't search for "suis" for instance.
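
To make this concrete, here is a small Python sketch of the idea. It only imitates the behaviour described above and is not the Lucene code; edge_ngrams is a hypothetical helper. The whole input is treated as one token (as a keyword tokenizer would), and only its leading substrings between the minimum and maximum gram sizes are produced.

```python
def edge_ngrams(token, min_gram=3, max_gram=50):
    """Return the leading substrings of token, from min_gram up to max_gram characters."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


# The whole field value is a single token, so every gram starts at "je".
grams = edge_ngrams("je suis")
print(grams)

# A word from the middle of the value is never emitted as a gram.
print("suis" in grams)
```

With minGramSize set to 3, the shortest gram in this sketch is the first three characters of the value, including the space.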

Basically, this is an efficient way to search for anything starting with a prefix of three or more
letters, at the expense of index size. You might be better off just using wildcards (restrict the
prefix to at least three letters, though).
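
A minimal sketch of the wildcard alternative, in Python, assuming the query parser in use accepts a trailing `*` and that the field is tokenized into words. The helper prefix_query_params and the constant MIN_PREFIX_LEN are hypothetical names for illustration only.

```python
from urllib.parse import urlencode

MIN_PREFIX_LEN = 3  # mirrors the three-letter minimum suggested above


def prefix_query_params(field, prefix):
    """Build request parameters for a prefix (trailing wildcard) query."""
    if len(prefix) < MIN_PREFIX_LEN:
        raise ValueError("prefix must have at least %d letters" % MIN_PREFIX_LEN)
    return {"q": "%s:%s*" % (field, prefix)}


params = prefix_query_params("content", "ava")
# urlencode percent-escapes the ':' and '*' characters for use in a URL.
print("select?" + urlencode(params))
```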

This is perfectly valid; I'm mostly asking whether it's your intent.

Best,
Erick

On Fri, Feb 8, 2019 at 9:35 AM SAUNIER Maxence <MSAUNIER@q1c1.fr> wrote:
>
> Thank you!
>
> -----Original Message-----
> From: elisabeth benoit <elisaelisaelisa@gmail.com>
> Sent: Friday, February 8, 2019 14:12
> To: solr-user@lucene.apache.org
> Subject: Re: Ignore accent in a request
>
> Hello,
>
> We use Solr 7 with the following charFilter:
>
> <charFilter class="solr.MappingCharFilterFactory"
> mapping="mapping-ISOLatin1Accent.txt"/>
>
> where mapping-ISOLatin1Accent.txt contains lines like
>
> # À => A
> "\u00C0" => "A"
>
> # Á => A
> "\u00C1" => "A"
>
> # Â => A
> "\u00C2" => "A"
>
> # Ã => A
> "\u00C3" => "A"
>
> # Ä => A
> "\u00C4" => "A"
>
> # Å => A
> "\u00C5" => "A"
>
> # Ā Ă Ą =>
> "\u0100" => "A"
> "\u0102" => "A"
> "\u0104" => "A"
>
> # Æ => AE
> "\u00C6" => "AE"
>
> # Ç => C
> "\u00C7" => "C"
>
> # é => e
> "\u00E9" => "e"
>
> Best regards,
> Elisabeth
>
> On Fri, Feb 8, 2019 at 11:18 AM Gopesh Sharma <Gopesh_Sharma@gensler.com> wrote:
>
> > We fixed this type of issue with synonyms, by adding
> > SynonymFilterFactory (before Solr 7).
> >
> > -----Original Message-----
> > From: SAUNIER Maxence <MSAUNIER@q1c1.fr>
> > Sent: Friday, February 8, 2019 3:36 PM
> > To: solr-user@lucene.apache.org
> > Subject: RE: Ignore accent in a request
> >
> > Hello,
> >
> > Thanks for your answer.
> >
> > I tested:
> >
> > select?defType=dismax&q=je suis avarié&qf=content
> > 90,000 results
> >
> > select?defType=dismax&q=je suis avarie&qf=content
> > 60,000 results
> >
> > With avarié, I don't find documents containing avarie, and with avarie, I
> > don't find documents containing avarié.
> >
> > I want to find all 150,000 documents containing avarié or avarie.
> >
> > Thanks
> >
> > -----Original Message-----
> > From: Erick Erickson <erickerickson@gmail.com>
> > Sent: Thursday, February 7, 2019 19:37
> > To: solr-user <solr-user@lucene.apache.org>
> > Subject: Re: Ignore accent in a request
> >
> > exactly _how_ is it "not working"?
> >
> > Try building your parameters _up_ rather than starting with a lot, e.g.
> > select?defType=dismax&q=je suis avarié&qf=title
> > ^^ assumes you expect a match on title. Then:
> > select?defType=dismax&q=je suis avarié&qf=title subject
> >
> > etc.
> >
> > Because mm=757 looks really wrong. From the docs:
> > Defines the minimum number of clauses that must match, regardless of 
> > how many clauses there are in total.
> >
> > edismax is used much more than dismax as it's more flexible, but 
> > that's not germane here.
> >
> > finally, try adding &debug=query to the url to see exactly how the 
> > query is parsed.
> >
> > Best,
> > Erick
> >
> > On Mon, Feb 4, 2019 at 9:09 AM SAUNIER Maxence <MSAUNIER@q1c1.fr> wrote:
> > >
> > > Hello,
> > >
> > > How can I ignore accents in the query results?
> > >
> > > Request:
> > > http://*****:8983/solr/***/select?defType=dismax&q=je+suis+avarié&qf=title%5e20+subject%5e15+category%5e1+content%5e0.5&mm=757
> > >
> > > I want to get documents containing either avarié or avarie.
> > >
> > > I have added this to my schema:
> > >
> > >   {
> > >     "name": "string",
> > >     "positionIncrementGap": "100",
> > >     "analyzer": {
> > >       "filters": [
> > >         {
> > >           "class": "solr.LowerCaseFilterFactory"
> > >         },
> > >         {
> > >           "class": "solr.ASCIIFoldingFilterFactory"
> > >         },
> > >         {
> > >           "class": "solr.EdgeNGramFilterFactory",
> > >           "minGramSize": "3",
> > >           "maxGramSize": "50"
> > >         }
> > >       ],
> > >       "tokenizer": {
> > >         "class": "solr.KeywordTokenizerFactory"
> > >       }
> > >     },
> > >     "stored": true,
> > >     "indexed": true,
> > >     "sortMissingLast": true,
> > >     "class": "solr.TextField"
> > >   },
> > >
> > > But it is not working.
> > >
> > > Thanks.
> >