lucene-solr-user mailing list archives

From Michael Sokolov <msoko...@safaribooksonline.com>
Subject Re: AW: AW: Keeping capitalization in suggestions?
Date Tue, 09 Dec 2014 15:19:40 GMT
Clemens --

   what I do (see the book-title suggestions on $EMPLOYER's web 
site) is define a field with no analysis (type=keyword, using 
KeywordAnalyzer) and build the suggestions from that. Then tell AIS 
(AnalyzingInfixSuggester) to use an analyzer internally to pick out words 
from that (StandardAnalyzer, or WhitespaceAnalyzer, with LowerCaseFilter - 
however you want matching to work in the suggester). It will return the 
values from the source field, with their original capitalization.
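A minimal solrconfig.xml sketch of that setup, assuming Solr 4.7 or later, where `solr.SuggestComponent` is available (the quoted configs further down use the older SpellCheckComponent-based `Suggester` instead). The `suggest` field, `text_suggest` type and `suggestDictionary` names come from this thread; the `popularity` weight field and the `suggester_infix` index directory are hypothetical. DocumentDictionaryFactory takes its entries from stored field values, so the source field has to be stored.

```xml
<!-- Sketch only; assumes Solr 4.7+ (solr.SuggestComponent). -->
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">suggestDictionary</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <!-- Builds from the stored values of the field, so capitalization survives -->
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">suggest</str>
    <!-- Hypothetical numeric field used as the suggestion weight -->
    <str name="weightField">popularity</str>
    <!-- Analyzer AIS uses internally for matching (e.g. one that lowercases) -->
    <str name="suggestAnalyzerFieldType">text_suggest</str>
    <!-- Hypothetical directory for the suggester's own index -->
    <str name="indexPath">suggester_infix</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.dictionary">suggestDictionary</str>
    <str name="suggest.count">5</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
```

The point of the split is that matching and display are decoupled: the lowercasing analyzer should let both "chamä" and "Chamä" match, while the text returned is the stored value, e.g. "Chamäleon".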

You didn't show the definition of your "suggest" field - I expect it 
must be analyzed, right?  Just don't do that.
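A matching schema.xml sketch, assuming the field names used in this thread. The `keyword` field type name and the `title` copyField source are hypothetical. `suggest` itself is left unanalyzed and stored, while `text_suggest` is only the analyzer the suggester uses for matching.

```xml
<!-- Hypothetical "keyword" type: the whole value becomes one token, no lowercasing -->
<fieldType name="keyword" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

<!-- Analyzer referenced by suggestAnalyzerFieldType; lowercasing lets
     "chamä" and "Chamä" match the same entries -->
<fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Unanalyzed source field; stored so DocumentDictionaryFactory can read it -->
<field name="suggest" type="keyword" indexed="true" stored="true" multiValued="false"/>

<!-- Hypothetical source field copied into the suggester field -->
<copyField source="title" dest="suggest"/>
```

Because a single `<analyzer>` applies at both index and query time, the suggester's internal matching is case-insensitive on both sides, which avoids the "either/or" behaviour described in the quoted messages.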

-Mike

On 12/09/2014 08:58 AM, Clemens Wyss DEV wrote:
> Thanks for all the insightful links.
> I tried http://www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr but that approach returns search results instead of term suggestions.
>
> I have (at the moment) a solution based on http://wiki.apache.org/solr/TermsComponent. But I might want multi-term suggestions (and fuzziness).
> Therefore I'd be very interested in how AnalyzingInfixLookupFactory (or any other suggest component) would allow me to
> a) return case-sensitive suggestions (i.e. as indexed/stored)
> b) do case-insensitive suggestion lookup?
> Is anybody else doing what I'd like to do?
>
> -----Original Message-----
> From: Ahmet Arslan [mailto:iorixxx@yahoo.com.INVALID]
> Sent: Monday, December 8, 2014 19:25
> To: solr-user@lucene.apache.org
> Subject: Re: AW: Keeping capitalization in suggestions?
>
> Hi Clemens,
>
> There are a number of ways to implement autocomplete/suggest. Some of them pull data from indexed terms, so the suggestions will be lowercased if the field's analyzer lowercases. Some pull data from stored values, so capitalisation is preserved.
>
> Here are great resources on this topic.
>
> https://lucidworks.com/blog/auto-suggest-from-popular-queries-using-edgengrams/
> http://blog.trifork.com/2012/02/15/different-ways-to-make-auto-suggestions-with-solr/
> http://www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/
>
> Ahmet
>
>
> On Monday, December 8, 2014 5:43 PM, Clemens Wyss DEV <clemensdev@mysign.ch> wrote:
>
> Although I'm using AnalyzingInfixSuggester, I'm still getting "either/or":
>
> When the lowercase filter is active I always get suggestions, BUT they are lowercased (i.e. "chamäleon").
> When the lowercase filter is not active I only get suggestions when querying "Chamä".
>
> my solrconfig.xml
> ...
>      <requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
>          <lst name="defaults">
>              <str name="echoParams">none</str>
>              <str name="wt">json</str>
>              <str name="indent">false</str>
>              <str name="spellcheck">true</str>
>              <str name="spellcheck.dictionary">suggestDictionary</str>
>              <str name="spellcheck.onlyMorePopular">true</str>
>              <str name="spellcheck.count">5</str>
>              <str name="spellcheck.collate">false</str>
>          </lst>
>          <arr name="components">
>              <str>suggest</str>
>          </arr>
>      </requestHandler>
> ...
>      <searchComponent class="solr.SpellCheckComponent" name="suggest">
>        <lst name="spellchecker">
>          <str name="name">suggestDictionary</str>
>          <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
>          <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.AnalyzingInfixLookupFactory</str>
>          <str name="dictionaryImpl">org.apache.solr.spelling.suggest.DocumentDictionaryFactory</str>
>          <str name="field">suggest</str>
>          <str name="buildOnCommit">true</str>
>          <str name="storeDir">suggester</str>
>          <str name="suggestAnalyzerFieldType">text_suggest</str>
>          <str name="minPrefixChars">4</str>
>        </lst>
>      </searchComponent>
> ...
>
> my schema.xml
> ...
> <field indexed="true" multiValued="true" name="suggest" stored="false" type="text_suggest"/>
> ...
>      <fieldType class="solr.TextField" name="text_suggest" positionIncrementGap="100">
>        <analyzer type="index">
>          <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
>          <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>          <!-- <filter class="solr.LowerCaseFilterFactory"/> -->
>        </analyzer>
>        <analyzer type="query">
>          <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
>          <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>          <!-- <filter class="solr.LowerCaseFilterFactory"/> -->
>        </analyzer>
>      </fieldType>
> ...
>
>
> -----Original Message-----
> From: Michael Sokolov [mailto:msokolov@safaribooksonline.com]
> Sent: Thursday, December 4, 2014 14:05
> To: solr-user@lucene.apache.org
> Subject: Re: Keeping capitalization in suggestions?
>
> Have a look at AnalyzingInfixSuggester - it does what you want.
>
> -Mike
>
> On 12/4/14 3:05 AM, Clemens Wyss DEV wrote:
>> When I index a text such as "Chamäleon" and look for suggestions for "chamä" and/or "Chamä", I'd expect to get "Chamäleon" (capitalized).
>> But what happens is:
>>
>> If LowerCaseFilter (see (1) below) is set:
>> "chamä" returns "chamäleon"
>> "Chamä" does not match
>>
>> If LowerCaseFilter (1) is not set:
>> "Chamä" returns "Chamäleon"
>> "chamä" does not match
>>
>> I guess LowerCaseFilter should not be set/active, but then how do I get matches even if the search term is lowercase?
>>
>> Context:
>> schema.xml
>> ...
>>       <fieldType class="solr.TextField" name="text_de" positionIncrementGap="100">
>>         <analyzer type="index">
>>           <tokenizer class="solr.StandardTokenizerFactory"/>
>>           <filter class="solr.LowerCaseFilterFactory"/>
>>           <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt"/>
>>           <filter class="solr.GermanLightStemFilterFactory"/>
>>         </analyzer>
>>         <analyzer type="query">
>>           <tokenizer class="solr.StandardTokenizerFactory"/>
>>           <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
>>           <filter class="solr.LowerCaseFilterFactory"/>
>>           <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt"/>
>>           <filter class="solr.GermanLightStemFilterFactory"/>
>>         </analyzer>
>>       </fieldType>
>> ...
>>       <fieldType class="solr.TextField" name="text_suggest" positionIncrementGap="100">
>>         <analyzer>
>>           <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
>>           <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>>           <filter class="solr.LowerCaseFilterFactory"/> <!-- (1) -->
>>         </analyzer>
>>       </fieldType>
>>
>> solrconfig.xml
>> -----------------
>> ...
>>       <requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
>>           <lst name="defaults">
>>               <str name="echoParams">none</str>
>>               <str name="wt">json</str>
>>               <str name="indent">false</str>
>>               <str name="spellcheck">true</str>
>>               <str name="spellcheck.dictionary">suggestDictionary</str>
>>               <str name="spellcheck.onlyMorePopular">true</str>
>>               <str name="spellcheck.count">5</str>
>>               <str name="spellcheck.collate">false</str>
>>           </lst>
>>           <arr name="components">
>>               <str>suggest</str>
>>           </arr>
>>       </requestHandler>
>> ...
>>       <searchComponent class="solr.SpellCheckComponent" name="suggest">
>>           <lst name="spellchecker">
>>               <str name="name">suggestDictionary</str>
>>               <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
>>               <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookupFactory</str>
>>               <str name="field">suggest</str>
>>               <float name="threshold">0.</float>
>>               <str name="buildOnCommit">true</str>
>>           </lst>
>>       </searchComponent>
>> ...
>>

