lucene-dev mailing list archives

From Doug Cutting
Subject RE: Re : How does Lucene handle phrases containing words that are not indexed?
Date Thu, 14 Feb 2002 17:33:25 GMT
> From: Halácsy Péter
> I'd like to index documents that are described by keywords. 
> One document can have zero or more keywords and a keyword can 
> be related to one or more documents. Assume two keywords:
> "human computer interaction"
> "computer science"
> If I add these keywords to a document in a single field and someone 
> searches with the query human science, the document will be found, 
> won't it? I could use, say, 16 distinct fields for the maximum of 
> 16 keywords and translate the query keyword:"human science" 
> to keyword1:"human science" or keyword2:"human science" ... 
> keyword16:"human science", but I'd prefer to avoid this solution.

This sounds like a good case for an untokenized field. Each value is indexed
as a single term, so "human science" matches neither of your two keywords.

When you index, use something like:

  Document doc = new Document();
  doc.add(Field.keyword("keyword", "computer science"));
  doc.add(Field.keyword("keyword", "human computer interaction"));

Then you can either add query keywords "manually":

  BooleanQuery query = (BooleanQuery)queryParser.parse("other terms", analyzer);
  // add(query, required, prohibited): the keyword clause must match
  query.add(new TermQuery(new Term("keyword", "computer science")), true, false);

or you can integrate this with the query parser by making an analyzer that
constructs terms for the field named "keyword" using exactly the provided
text:

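  // needs java.io.Reader, org.apache.lucene.analysis.* and
  // org.apache.lucene.analysis.standard.StandardAnalyzer
  public class MyAnalyzer extends Analyzer {
    private Analyzer standard = new StandardAnalyzer();
    public TokenStream tokenStream(String field, final Reader reader) {
      if ("keyword".equals(field)) {
        // every character is a token character, so the whole value is one term
        return new CharTokenizer(reader) {
          protected boolean isTokenChar(char c) { return true; }
        };
      } else {
        return standard.tokenStream(field, reader);
      }
    }
  }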

  Analyzer analyzer = new MyAnalyzer();
  Query query = queryParser.parse("keyword:\"computer science\"", analyzer);

I haven't tested the above code, but I hope you get the idea.
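
To illustrate the idea without Lucene, here is a minimal, self-contained sketch of a toy inverted index. It is not Lucene code: the class KeywordFieldSketch, its methods, and the "text" comparison field are all hypothetical names made up for this example, and the toy search simply requires every analyzed query term to be present (it is not a phrase query). It shows that with per-field analysis the "keyword" field matches only a complete keyword, while a tokenized field also matches words taken from different keywords.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy stand-in for an inverted index; not Lucene code. All names are hypothetical.
public class KeywordFieldSketch {
    // "field:term" -> ids of the documents that contain that term
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Per-field analysis: the "keyword" field keeps the whole value as one
    // term; every other field is lowercased and split on whitespace.
    static List<String> analyze(String field, String text) {
        List<String> terms = new ArrayList<>();
        if ("keyword".equals(field)) {
            terms.add(text);
        } else {
            for (String word : text.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) terms.add(word);
            }
        }
        return terms;
    }

    // Index one field value of a document.
    void add(int docId, String field, String text) {
        for (String term : analyze(field, text)) {
            postings.computeIfAbsent(field + ":" + term, k -> new HashSet<>()).add(docId);
        }
    }

    // Documents containing every term that the query text analyzes to.
    Set<Integer> search(String field, String queryText) {
        Set<Integer> result = null;
        for (String term : analyze(field, queryText)) {
            Set<Integer> docs = postings.getOrDefault(field + ":" + term, new HashSet<>());
            if (result == null) {
                result = new HashSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new HashSet<>() : result;
    }

    public static void main(String[] args) {
        KeywordFieldSketch index = new KeywordFieldSketch();
        for (String k : new String[] {"human computer interaction", "computer science"}) {
            index.add(1, "keyword", k); // untokenized: one term per keyword
            index.add(1, "text", k);    // tokenized, for comparison
        }
        System.out.println(index.search("text", "human science"));       // [1]
        System.out.println(index.search("keyword", "human science"));    // []
        System.out.println(index.search("keyword", "computer science")); // [1]
    }
}
```

The same analyze method is used for indexing and for searching, which mirrors the point of passing one analyzer to both sides: the query term for the "keyword" field has to be built exactly the way the indexed term was.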

