lucene-solr-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Synonyms list breaks solr
Date Fri, 11 Jul 2008 12:21:21 GMT
Are there any errors in your logs?  Have you tried looking at the  
admin analysis page to see how text gets treated on that field?

Are you sure the large synonym file is formatted correctly?
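
One quick way to check the formatting is to scan the file for suspicious lines. The following is only a rough sketch, assuming the usual synonyms.txt conventions (comma-separated terms, "=>" for explicit mappings, "#" for comments). check_synonym_lines is a hypothetical helper, not part of Solr, and it does not handle backslash-escaped commas. Non-ASCII characters are not necessarily wrong, so treat those reports as lines to inspect rather than as errors.

```python
def check_synonym_lines(lines):
    """Return a list of (line_number, problem) tuples for suspicious lines."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        # Blank lines and '#' comments carry no synonyms, so skip them.
        if not line or line.startswith("#"):
            continue
        # Flag non-ASCII characters (often an encoding problem in the file).
        if not line.isascii():
            problems.append((number, "non-ASCII character"))
        # An explicit mapping should contain at most one '=>'.
        sides = line.split("=>")
        if len(sides) > 2:
            problems.append((number, "more than one '=>'"))
            continue
        # Every comma-separated term on either side should be non-empty.
        for side in sides:
            terms = [term.strip() for term in side.split(",")]
            if any(term == "" for term in terms):
                problems.append((number, "empty term"))
                break
    return problems
```

To run it against a real file, open the file with encoding="utf-8" and errors="replace" and pass the file object to the function; undecodable bytes then show up as non-ASCII reports.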

-Grant

On Jul 11, 2008, at 7:23 AM, matt connolly wrote:

>
> I'm setting up Solr to run on a web site I'm working on.
>
> Basically, if I use no synonym file, Solr works really well for
> finding text; the Porter stemmer filter is great.
>
> It also works with a small synonym file, like the one in the example,
> which defines Television,TV.
>
> But when I add a large synonym file (approximately 7000 synonyms),
> everything breaks down. Even queries for exact words don't return any
> results.
>
> Could it be that something in the synonym file (a non-ASCII character,
> for example) is causing the synonym filter to do something weird, like
> not passing any tokens?
>
> Could it be that the synonym filter is now expanding practically
> everything, so that no document is considered relevant enough? (I tried
> setting defaultOperator="OR"; it made no difference.)
>
>
> My text field is defined in the schema as:
>
>    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>      <analyzer type="index">
>        <tokenizer class="solr.HTMLStripStandardTokenizerFactory"/>
>        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>        <filter class="solr.WordDelimiterFilterFactory"
>                generateWordParts="1" generateNumberParts="1" catenateWords="1"
>                catenateNumbers="1" catenateAll="0"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
>        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>      </analyzer>
>      <analyzer type="query">
>        <tokenizer class="solr.HTMLStripStandardTokenizerFactory"/>
>        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>                ignoreCase="true" expand="true"/>
>        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>        <filter class="solr.WordDelimiterFilterFactory"
>                generateWordParts="1" generateNumberParts="1" catenateWords="0"
>                catenateNumbers="0" catenateAll="0"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
>        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>      </analyzer>
>    </fieldType>
>
>
> Thanks for any help,
> Matt
>
>
> -- 
> View this message in context: http://www.nabble.com/Synonyms-list-breaks-solr-tp18401710p18401710.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>

--------------------------
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ