lucene-solr-user mailing list archives

From Shawn Heisey <s...@elyograg.org>
Subject Re: StopWords coming in Top 10 terms despite using StopFilterFactory
Date Fri, 23 Sep 2011 13:23:31 GMT
On 9/23/2011 1:45 AM, Pranav Prakash wrote:
> Maybe I am wrong, but my intention in using both of them is this:
> first, I want to use phrase queries, so I used
> CommonGramsFilterFactory. Second, I don't want those stopwords in my
> index, so I used StopFilterFactory to remove them.

CommonGrams is not necessary for phrase queries.  If you have a 
super-dense index with very large documents, it will reduce the amount 
of memory used by Solr, which can make those queries faster.  It comes 
at a large expense in disk space because your index gets considerably 
larger.  The trade-off in index size vs. memory usage may not be worth 
it.  For an index like the Hathi Trust, the trade-off is worthwhile.

> term   frequency
> to     26164
> and    25804
> the    25566
> of     25022
> a      24918
> in     24590
> for    23646
> n      23588
> with   23055
> is     22510

Is this typical of your production index size, or just a test?  With 
numbers this low, neither CommonGrams nor StopFilter is really 
necessary.  I suspect that these are probably test numbers, though.

>
>>   Did you delete and do a full reindex after you changed your schema?
>>
> Yup I did that a couple of times

I don't know what's going on here, but it sounds like your config might 
not be saying what you think it's saying.  It might be a good idea to 
include your entire schema.xml and the name of the field that you are 
looking at for term frequency.
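
For comparison, a minimal sketch of a field definition that keeps 
stopwords out of the index might look like the following.  The field 
type name "text_stop" and the field name "body" are hypothetical and 
are not taken from your schema, and stopwords.txt is assumed to contain 
the terms you want removed.

```xml
<!-- Hypothetical names: "text_stop" and "body" are illustrative only. -->
<fieldType name="text_stop" class="solr.TextField" positionIncrementGap="100">
  <!-- A single analyzer with no type attribute applies at both index and query time. -->
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Drops every term listed in stopwords.txt (assumed to sit next to schema.xml). -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
</fieldType>

<field name="body" type="text_stop" indexed="true" stored="true"/>
```

Things worth checking against a definition like this: that the field 
you are inspecting for term frequency really uses the field type that 
contains the StopFilterFactory, that the stopwords file actually lists 
the terms showing up in your top 10, and, if you have separate index 
and query analyzers, that the stop filter is present in the index one.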

Thanks,
Shawn

