lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stefanos Karasavvidis <ste...@msc.gr>
Subject Re: Problems with exact matces on non-tokenized fields...
Date Wed, 13 Nov 2002 19:44:42 GMT
I came accross the same problem and I think that the faq entry you 
(Otis) propose should get a better title so that users can find more 
easily an answer to this problem.

Correct me if I'm wrong (and please forgive any wrong assumptions I may 
have made), put the problem is on "how to query on a non tokenized field?"

Problem explanation:
If a field is not tokenized than it is not passed through the analyzer, 
independently of the used analyzer (that's what I understand by looking 
into DocumentWriter.invertDocument()).
If  you construct a query with a given analyzer  (for example with 
QueryParser.parse(query, field, analyzer))  with this field, the 
queryparser does not know that this field is not tokenized and passes it 
through the analyzer. Ther analyzer may alter the query (for example if 
the analyzer has a stemming algorithm) and the document is not matched 
uppon the query.

The solution:
The solution is to make sure that fields that aren't tokenized during 
indexig, are not passed through the analyzer during searching. This can 
be done in 2 ways, either by making an analyzer that takes care of this 
according to the field,  or by constructing a TermQuery with this field 
and adding it to the rest of the query

Example:
put here the 2 examples from Doug

Stefanos 



Otis Gospodnetic wrote:

>Thanks, it's a FAQ entry now:
>
>How do I write my own Analyzer?
>http://www.jguru.com/faq/view.jsp?EID=1006122
>
>Otis
>
>
>--- Doug Cutting <cutting@lucene.com> wrote:
>  
>
>>karl øie wrote:
>>    
>>
>>>I have a Lucene Document with a field named "element" which is
>>>      
>>>
>>stored 
>>    
>>
>>>and indexed but not tokenized. The value of the field is "POST" 
>>>(uppercase). But the only way i can match the field is by entering 
>>>"element:POST?" or "element:POST*" in the QueryParser class.
>>>      
>>>
>>There are two ways to do this.
>>
>>If this must be entered by users in the query string, then you need
>>to 
>>use a non-lowercasing analyzer for this field.  The way to do this if
>>
>>you're currently using StandardAnalyzer, is to do something like:
>>
>>   public class MyAnalyzer extends Analyzer {
>>     private Analyzer standard = new StandardAnalyzer();
>>     public TokenStream tokenStream(String field, final Reader
>>reader) {
>>       if ("element".equals(field)) {        // don't tokenize
>>         return new CharTokenizer(reader) {
>>           protected boolean isTokenChar(char c) { return true; }
>>         };
>>       } else {                              // use standard analyzer
>>         return standard.tokenStream(field, reader);
>>       }
>>     }
>>   }
>>
>>   Analyzer analyzer = new MyAnalyzer();
>>   Query query = queryParser.parse("... +element:POST", analyzer);
>>
>>Alternately, if this query field is added by a program, then this can
>>be 
>>done by bypassing the analyzer for this class, building this clause 
>>directly instead:
>>
>>   Analyzer analyzer = new StandardAnalyzer();
>>   BooleanQuery query = (BooleanQuery)queryParser.parse("...",
>>analyzer);
>>
>>   // now add the element clause
>>   query.add(new TermQuery(new Term("element", "POST"))), true,
>>false);
>>
>>Perhaps this should become an FAQ...
>>
>>Doug
>>
>>
>>--
>>To unsubscribe, e-mail:  
>><mailto:lucene-user-unsubscribe@jakarta.apache.org>
>>For additional commands, e-mail:
>><mailto:lucene-user-help@jakarta.apache.org>
>>
>>    
>>
>
>
>__________________________________________________
>Do you Yahoo!?
>New DSL Internet Access from SBC & Yahoo!
>http://sbc.yahoo.com
>
>--
>To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org>
>For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
>
>
>  
>


--
To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>


Mime
View raw message