lucene-solr-user mailing list archives

From scott chu (朱炎詹) <scott....@udngroup.com>
Subject Re: Doing Shingle but also keep special single word
Date Mon, 23 Aug 2010 02:23:09 GMT
I think I didn't state my problem very well, so allow me to rephrase my case
here:

1. We have over ten million news articles to build into a Solr index.
2. We copy several fields, such as title, author, body, and the captions of
attached photos, into a new field for default search.
3. We then want to use the shingle filter on this new field.
4. We can't predict which new single-word nouns our users may be interested
in, because it's "news", you know. For example, the word "ECFA" has only
recently become very popular in the news here, so I wish users could type in
'ECFA' to search and Solr would return some relevant news articles.
5. I wish to keep the index as small as possible.
6. I also wish to do the same thing described in 5 when I search by
explicitly specifying the field names of those fields, too.

I don't quite understand the additional-field way. Do you mean making another
field that stores the special words specifically, but without indexing that
field?
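
To make the trade-off between points 3 and 5 concrete, here is a minimal
Python sketch of what a shingle filter emits with and without unigrams. It is
not Solr's ShingleFilter code: the function name shingle, its parameters, and
the sample headline are made up for illustration, and it only counts emitted
tokens, which is at best a rough proxy for index size.

```python
def shingle(tokens, max_size=2, output_unigrams=True):
    """Emit word n-grams of 2..max_size tokens, optionally with the single words.

    Hypothetical helper for illustration only; it imitates the idea of a
    shingle filter, not Solr's actual implementation.
    """
    out = []
    for i in range(len(tokens)):
        if output_unigrams:
            # Keep the single word so a query like 'ECFA' can match on its own.
            out.append(tokens[i])
        for n in range(2, max_size + 1):
            if i + n <= len(tokens):
                out.append(" ".join(tokens[i:i + n]))
    return out


# Made-up sample headline, split on whitespace.
tokens = "ECFA talks resume in Taipei".split()
with_unigrams = shingle(tokens, output_unigrams=True)
without_unigrams = shingle(tokens, output_unigrams=False)

print(with_unigrams)
print(without_unigrams)
print(len(with_unigrams), len(without_unigrams))
```

For n input tokens and 2-gram shingles this yields n - 1 tokens without
unigrams and 2n - 1 with them, so the token stream roughly doubles. How much
the index itself grows depends on the distinct terms and postings that
result, which analysis.jsp or a small test index would show more reliably.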

Scott

----- Original Message ----- 
From: "MitchK" <mitch91@web.de>
To: <solr-user@lucene.apache.org>
Sent: Sunday, August 22, 2010 11:48 PM
Subject: Re: Doing Shingle but also keep special single word


>
> Hi,
>
> keepword-filter is no solution for this problem, since one would then have
> to manage a word dictionary. As explained, this would lead to too much
> effort.
>
> You can easily add outputUnigrams=true and check analysis.jsp for this
> field, so you can see how much bigger a single field will become with this
> option.
> However, I am quite sure that the difference between using
> outputUnigrams=true and indexing in a separate field is not noteworthy.
>
> I would suggest doing it the additional-field way, since this would
> lead to more flexibility in boosting the different fields.
>
> Unfortunately, I haven't understood your explanation of the use case.
> But it sounds a little bit like tagging?
>
> Kind regards,
> - Mitch
>
>
> iorixxx wrote:
>>
>>> Won't setting outputUnigrams="true" make the index about twice as
>>> large as when it's set to false?
>>
>> Sure, the index will be bigger. I didn't know that this is a problem for
>> you. But if you have a list of special single words that you want to
>> keep, KeepWordFilter can eliminate the other tokens, so the index size
>> will be okay.
>>
>>>
>>> Scott
>>>
>>> ----- Original Message ----- From: "Ahmet Arslan" <iorixxx@yahoo.com>
>>> To: <solr-user@lucene.apache.org>
>>> Sent: Saturday, August 21, 2010 1:15 AM
>>> Subject: Re: Doing Shingle but also keep special single
>>> word
>>>
>>>
>>> >> I am building an index with the Shingle filter. We know it's minimum
>>> >> 2-gram but I also want to keep some special single words, e.g. IBM,
>>> >> Microsoft, etc. i.e. I want to do a minimum 2-gram but also want to
>>> >> have these single words in my index. Is it possible?
>>> >
>>> > Doesn't the outputUnigrams="true" parameter work for you?
>>> >
>>> > After that you can apply <filter class="solr.KeepWordFilterFactory"
>>> > words="keepwords.txt" ignoreCase="true"/> with keepwords.txt
>>> > containing IBM, Microsoft.
>>> >
>>> >
>>> >
>>> >
>>>
>>>
>>
>>
>>
>>
>>
> -- 
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Doing-Shingle-but-also-keep-special-single-word-tp1241204p1276506.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>

