lucene-solr-user mailing list archives

From MitchK <mitc...@web.de>
Subject Re: Doing Shingle but also keep special single word
Date Sun, 22 Aug 2010 15:48:15 GMT

Hi,

The KeepWordFilter is not a solution for this problem, since one would have
to maintain a word dictionary. As explained, this would take too much
effort.

You can simply add outputUnigrams="true" and check this field in
analysis.jsp. That way you can see how much bigger a single field becomes
with this option.
However, I am quite sure that the difference in index size between using
outputUnigrams="true" and indexing the single words in a separate field is
negligible.
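A minimal field type for the single-field variant, assuming a whitespace
tokenizer and lowercasing (the name text_shingle is hypothetical). As a rough
count that ignores term-dictionary and postings overhead: with
maxShingleSize="2", a field of n words yields n - 1 two-word shingles, and
outputUnigrams="true" adds the n single words, so roughly twice as many
tokens. A separate unigram field would hold the same n extra tokens, which is
why the two approaches end up close in size.

```xml
<!-- Hypothetical field type: two-word shingles plus the single words -->
<fieldType name="text_shingle" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- maxShingleSize="2" builds 2-word shingles;
         outputUnigrams="true" also emits each single word (e.g. "ibm") -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true"/>
  </analyzer>
</fieldType>
```

After reindexing, analysis.jsp should list both the single words (e.g. ibm)
and the shingles (e.g. ibm buys) for an input such as "IBM buys".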

I would suggest going the additional-field way, since it gives you more
flexibility in boosting the different fields.
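One possible layout for the additional-field approach, sketched under
assumptions: a source field named content already exists in the schema, and
the names text_bigram, text_unigram, content_bigram, content_unigram,
shingle_dismax and the boost values are hypothetical. The shingle field is
searched through pf rather than qf because the dismax parser splits the query
on whitespace before analysis, so a bigram-only field in qf would typically
see one word at a time and produce no shingles.

```xml
<!-- schema.xml (hypothetical names) -->
<fieldType name="text_bigram" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- shingles only; the single words live in their own field -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="false"/>
  </analyzer>
</fieldType>

<fieldType name="text_unigram" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- optional, only if a keep-word list is wanted after all:
         <filter class="solr.KeepWordFilterFactory" words="keepwords.txt" ignoreCase="true"/> -->
  </analyzer>
</fieldType>

<field name="content_bigram"  type="text_bigram"  indexed="true" stored="false"/>
<field name="content_unigram" type="text_unigram" indexed="true" stored="false"/>

<!-- "content" is assumed to be the existing field holding the original text -->
<copyField source="content" dest="content_bigram"/>
<copyField source="content" dest="content_unigram"/>

<!-- solrconfig.xml: each field gets its own boost (example weights only) -->
<requestHandler name="shingle_dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">content_unigram^1.0</str>
    <str name="pf">content_bigram^2.0</str>
  </lst>
</requestHandler>
```

If a keep-word list is used, it belongs on the unigram field only: placed
after ShingleFilter in one chain, KeepWordFilter would also drop every
shingle that is not in the list.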

Unfortunately, I did not understand your explanation of the use case. But
it sounds a little like tagging?

Kind regards,
- Mitch


iorixxx wrote:
> 
>> Won't setting outputUnigrams="true" make the index about twice as big
>> as when it's set to false?
> 
> Sure, the index will be bigger. I didn't know that this was a problem for
> you. But if you have a list of special single words that you want to keep,
> KeepWordFilter can eliminate the other tokens. So the index size will be
> okay.
> 
>> 
>> Scott
>> 
>> ----- Original Message -----
>> From: "Ahmet Arslan" <iorixxx@yahoo.com>
>> To: <solr-user@lucene.apache.org>
>> Sent: Saturday, August 21, 2010 1:15 AM
>> Subject: Re: Doing Shingle but also keep special single word
>> 
>> 
>> >> I am building the index with the Shingle filter. We know it's a minimum
>> >> 2-gram, but I also want to keep some special single words, e.g. IBM,
>> >> Microsoft, etc. That is, I want to do a minimum 2-gram but also have
>> >> these single words in my index. Is it possible?
>> > 
>> > Doesn't the outputUnigrams="true" parameter work for you?
>> > 
>> > After that you can add <filter class="solr.KeepWordFilterFactory"
>> > words="keepwords.txt" ignoreCase="true"/>, with keepwords.txt
>> > containing IBM and Microsoft.
-- 
View this message in context: http://lucene.472066.n3.nabble.com/Doing-Shingle-but-also-keep-special-single-word-tp1241204p1276506.html
Sent from the Solr - User mailing list archive at Nabble.com.
