lucene-solr-user mailing list archives

From Nawab Zada Asad Iqbal <khi...@gmail.com>
Subject Re: CommonGramsFilter
Date Sat, 06 Jan 2018 03:07:07 GMT
Actually, I have found that it is *not* mandatory to use phrase search with
CommonGramsFilter.

PS: I had another (unnecessary) code change that was causing
the above behavior.
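
For anyone reproducing the comparison, here is a minimal sketch of how the two requests from this thread (plain query vs. quoted phrase query) can be built. The host, core name, and parameters are the ones quoted below; the helper name `select_url` is hypothetical and only for illustration.

```python
from urllib.parse import urlencode, quote

# Base URL taken from the requests quoted in this thread.
BASE = "http://localhost:8983/solr/filesearch/select"


def select_url(q, phrase=False):
    """Build a select URL; wrap q in double quotes for a phrase search.

    Hypothetical helper, not part of Solr or any client library.
    """
    if phrase:
        q = '"' + q + '"'
    params = {"q": q, "debugQuery": "true"}
    # quote_via=quote encodes spaces as %20 (not +), matching the thread's URLs.
    return BASE + "?" + urlencode(params, quote_via=quote)


print(select_url("not to or be"))
print(select_url("not to or be", phrase=True))
```

Comparing the `parsedquery` entry in the debug output of both responses is the quickest way to see how each form reaches the field's analysis chain.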

On Thu, Jan 4, 2018 at 6:56 PM, Nawab Zada Asad Iqbal <khichi@gmail.com>
wrote:

> After some debugging, it seems that the search works if the query is a
> phrase search (i.e., enclosed in quotes):
>
> http://localhost:8983/solr/filesearch/select?q=%22not%20to%20or%20be%22&debugQuery=true
>
> This works with both sow=true and sow=false.
>
> Is it mandatory to use phrase search to properly pass the stopwords to the
> CommonGramsFilter?
>
>
>
>
>
> On Thu, Jan 4, 2018 at 6:08 PM, Nawab Zada Asad Iqbal <khichi@gmail.com>
> wrote:
>
>> Hi,
>>
>> I am looking at this documentation and wondering if it would be better to
>> optionally skip indexing of original stopwords.
>>
>> https://lucene.apache.org/solr/guide/6_6/filter-descriptions.html#FilterDescriptions-CommonGramsFilter
>>
>> http://localhost:8983/solr/filesearch/select?q=not%20to%20or%20be&debugQuery=true
>>
>>
>>    - parsedquery: "+(-DisjunctionMaxQuery((commongram_field2:to)~0.01)
>>    DisjunctionMaxQuery((commongram_field2:be)~0.01))~1",
>>
>>
>>
>> Other parameters are:
>>
>>
>>    - params: {
>>       - mm: " 1<-0% ",
>>       - q.alt: "*:*",
>>       - ps: "100",
>>       - echoParams: "all",
>>       - sort: "score desc",
>>       - rows: "35",
>>       - version: "2.2",
>>       - q: "not to or be",
>>       - tie: "0.01",
>>       - defType: "edismax",
>>       - qf: "commongram_field2",
>>       - sow: "false",
>>       - wt: "json",
>>       - debugQuery: "true"
>>       }
>>
>>
>> And it doesn't match my document, which has the following fields:
>>
>>
>>    - id: "9191",
>>    - commongram_field2: "not to or be",
>>
>>
>>
>> The commongram field is defined as:
>>
>>     <field name="commongram_field2" type="commongaram" indexed="true"
>>            stored="true" omitPositions="false"/>
>>
>>     <fieldType name="commongaram" class="solr.TextField"
>>                positionIncrementGap="100">
>>       <analyzer type="index">
>>         <charFilter class="org.apache.lucene.analysis.icu.ICUNormalizer2CharFilterFactory"
>>                     name="nfkc" mode="compose"/>
>>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>         <filter class="solr.WordDelimiterGraphFilterFactory"
>>                 generateWordParts="1" generateNumberParts="1" catenateWords="0"
>>                 catenateNumbers="0" catenateAll="0" preserveOriginal="0"
>>                 splitOnCaseChange="1" splitOnNumerics="1" stemEnglishPossessive="0"/>
>>         <filter class="solr.FlattenGraphFilterFactory"/>
>>         <filter class="solr.PatternReplaceFilterFactory"
>>                 pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$" replacement="$2"/>
>>         <filter class="solr.LowerCaseFilterFactory"/>
>>         <filter class="solr.ASCIIFoldingFilterFactory"/>
>>         <filter class="solr.CommonGramsFilterFactory"
>>                 words="stopwords.txt" ignoreCase="true"/>
>>         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>         <filter class="solr.LimitTokenCountFilterFactory"
>>                 maxTokenCount="10000" consumeAllTokens="false"/>
>>         <filter class="solr.LengthFilterFactory" min="1" max="255"/>
>>       </analyzer>
>>       <analyzer type="query">
>>         <charFilter class="org.apache.lucene.analysis.icu.ICUNormalizer2CharFilterFactory"
>>                     name="nfkc" mode="compose"/>
>>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>         <filter class="solr.WordDelimiterGraphFilterFactory"
>>                 generateWordParts="1" generateNumberParts="1" catenateWords="0"
>>                 catenateNumbers="0" catenateAll="0" preserveOriginal="0"
>>                 splitOnCaseChange="1" splitOnNumerics="1" stemEnglishPossessive="0"/>
>>         <filter class="solr.PatternReplaceFilterFactory"
>>                 pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$" replacement="$2"/>
>>         <filter class="solr.LowerCaseFilterFactory"/>
>>         <filter class="solr.ASCIIFoldingFilterFactory"/>
>>         <filter class="solr.CommonGramsFilterFactory"
>>                 words="stopwords.txt" ignoreCase="true"/>
>>         <filter class="solr.LengthFilterFactory" min="1" max="255"/>
>>       </analyzer>
>>     </fieldType>
>>
>>
>> I am not sure what I am missing. I have also set sow=false so that the
>> whole query string is sent to the field's analysis chain instead of being
>> sent word by word, but that didn't seem to help.
>>
>> Thanks
>> Nawab
>>
>
>
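
As a rough illustration of the CommonGramsFilter step in the chain quoted above, the sketch below mimics the token stream that filter produces for the example document. It assumes stopwords.txt contains "not", "to", "or" and "be" (the thread does not show the file), it ignores every other filter in the chain, and the function name `common_grams` is hypothetical. It is a simplified model, not Lucene's implementation.

```python
def common_grams(tokens, common_words, sep="_"):
    """Simplified model of a common-grams filter.

    Emits every original token, and after it a bigram joined by `sep`
    whenever the token or its successor is a common word. Each result is
    a (term, position_increment) pair; bigrams get increment 0 so they
    share the position of their first word.
    """
    common = {w.lower() for w in common_words}  # models ignoreCase="true"
    out = []
    for i, tok in enumerate(tokens):
        out.append((tok, 1))
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            if tok.lower() in common or nxt.lower() in common:
                out.append((tok + sep + nxt, 0))
    return out


# Assumed stopword list; the real stopwords.txt is not shown in the thread.
stopwords = {"not", "to", "or", "be"}
terms = [term for term, _ in common_grams("not to or be".split(), stopwords)]
print(terms)
# ['not', 'not_to', 'to', 'to_or', 'or', 'or_be', 'be']
```

Under these assumptions the indexed field holds both the single words and the bigrams, so a query only matches on the bigrams if the whole string reaches the analysis chain together rather than word by word.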
