lucene-solr-user mailing list archives

From Bernd Fehling <bernd.fehl...@uni-bielefeld.de>
Subject skipping parts of query analysis for some queries
Date Fri, 30 Sep 2011 12:41:53 GMT
I need to skip some query analysis steps for some queries.
Or more precisely, I want to make them switchable with a query
parameter.

Use case:
<fieldType name="text_spec" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="0" splitOnNumerics="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="0" splitOnNumerics="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="3" outputUnigrams="false"
            outputUnigramsIfNoShingles="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" format="solr"
            tokenizerFactory="solr.KeywordTokenizerFactory"
            ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>

For some queries I want to skip SynonymFilterFactory, with or without ShingleFilterFactory.
First I thought of a second field with a separate fieldType, but why store the same content
twice in the index?
So I had the idea of making things switchable with a query parameter.
E.g. the SynonymFilterFactory class would get two optional attributes:
querycontrol=true/false (default=false)
queryparam=sff.         (default=sff)

With query ...&sff=true&... it will apply SynonymFilterFactory as usual,
with query ...&sff=false&... SynonymFilterFactory will do nothing.
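
To make the intended semantics concrete, here is a minimal sketch of the switch decision in plain Java. It does not use any Solr or Lucene classes: FilterSwitch and isActive are hypothetical names, and querycontrol/queryparam are only the attributes proposed above. It also assumes that a request which omits the parameter leaves the filter active, which is not decided yet.

```java
import java.util.Map;

// Hypothetical helper, not part of Solr: decides whether a switchable
// filter should run for a given request.
final class FilterSwitch {
    private final boolean queryControl; // proposed "querycontrol" attribute, default false
    private final String queryParam;    // proposed "queryparam" attribute, default "sff"

    FilterSwitch(Map<String, String> factoryArgs) {
        this.queryControl = Boolean.parseBoolean(factoryArgs.getOrDefault("querycontrol", "false"));
        this.queryParam = factoryArgs.getOrDefault("queryparam", "sff");
    }

    // Returns true if the filter should be applied for this request.
    // Assumption: a missing request parameter leaves the filter active.
    boolean isActive(Map<String, String> requestParams) {
        if (!queryControl) {
            return true; // switching not enabled: the filter always runs
        }
        String value = requestParams.get(queryParam);
        return value == null || Boolean.parseBoolean(value);
    }
}
```

With querycontrol left at its default of false, the filter behaves exactly as it does today, so existing schemas would not change behaviour.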

This is easy to implement, but it only covers SynonymFilterFactory.
What if I want to switch off other filters with my query?
Should I patch all the FilterFactories?

Next idea: how about modifying the analyzer instead?
<analyzer type="query">
   <charFilter...
   <tokenizer...
   <filter...
   <optional switch="foo">
     <filter...
     <filter...
   </optional>
</analyzer>

Now with query ...&foo=true&... it will use the filters enclosed by the optional tag,
with query ...&foo=false&... they are skipped.
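
A rough sketch of how such an optional group could behave, again in plain Java and without Solr or Lucene classes. Tokens are modelled as a simple list of strings, whereas real TokenFilters operate on a TokenStream. OptionalChain, always, optional, lowerCase and addSynonym are all hypothetical names made up for this illustration, and a missing switch parameter is assumed to leave the optional filters enabled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

// Toy model of a query analyzer chain with switchable steps (hypothetical, not Solr code).
final class OptionalChain {
    // One step: a filter plus the name of the switch guarding it (null = always applied).
    private static final class Step {
        final String switchName;
        final UnaryOperator<List<String>> filter;

        Step(String switchName, UnaryOperator<List<String>> filter) {
            this.switchName = switchName;
            this.filter = filter;
        }
    }

    private final List<Step> steps = new ArrayList<>();

    // Adds a filter that runs for every query.
    OptionalChain always(UnaryOperator<List<String>> filter) {
        steps.add(new Step(null, filter));
        return this;
    }

    // Adds a filter that sits inside <optional switch="switchName">.
    OptionalChain optional(String switchName, UnaryOperator<List<String>> filter) {
        steps.add(new Step(switchName, filter));
        return this;
    }

    // Runs the chain, skipping optional steps whose switch is set to false.
    // Assumption: a switch that is absent from the request counts as enabled.
    List<String> analyze(List<String> tokens, Map<String, String> requestParams) {
        List<String> current = tokens;
        for (Step step : steps) {
            String value = step.switchName == null ? null : requestParams.get(step.switchName);
            boolean enabled = value == null || Boolean.parseBoolean(value);
            if (enabled) {
                current = step.filter.apply(current);
            }
        }
        return current;
    }

    // Toy stand-in for LowerCaseFilterFactory.
    static UnaryOperator<List<String>> lowerCase() {
        return tokens -> {
            List<String> out = new ArrayList<>();
            for (String t : tokens) {
                out.add(t.toLowerCase(Locale.ROOT));
            }
            return out;
        };
    }

    // Toy stand-in for a synonym filter: emits the synonym after each matching term.
    static UnaryOperator<List<String>> addSynonym(String term, String synonym) {
        return tokens -> {
            List<String> out = new ArrayList<>();
            for (String t : tokens) {
                out.add(t);
                if (t.equals(term)) {
                    out.add(synonym);
                }
            }
            return out;
        };
    }
}
```

For example, a chain built with always(lowerCase()) followed by optional("foo", addSynonym("tv", "television")) turns the tokens Big, TV into big, tv, television when foo=true and into big, tv when foo=false. The switch lives in the chain, so the individual filters need no changes.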

Advantages:
- more flexibility
- no need to index the content two or more times when only the query analysis
   differs


Any opinions?

Regards,
Bernd