lucene-java-user mailing list archives

From Erick Erickson <>
Subject Re: Beginner Question: Tokenized and full phrase
Date Mon, 02 Sep 2019 13:28:58 GMT
In Lucene you simply have tokens. In the analyzed case (i.e. a text field), the tokens are
whatever pieces the analysis chain you construct splits the incoming stream into. In the
string case, the token is the entire input. That's just the way it works.
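To make the distinction concrete, here is a plain-Java toy model (not Lucene code) of what ends up in the index for the same value under the two field types. The splitting regex and the `textTokens`/`stringTokens` names are hypothetical stand-ins; a real analysis chain such as StandardAnalyzer does more than split and lowercase.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy model only: it mimics the *shape* of Lucene analysis, not its API.
public class TokenDemo {

    // Hypothetical, simplified analysis chain standing in for a text field:
    // split on anything that is not a letter or digit, then lowercase each piece.
    static List<String> textTokens(String input) {
        List<String> tokens = new ArrayList<>();
        for (String piece : input.split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // String-field behaviour: the whole input, unchanged, is a single token.
    static List<String> stringTokens(String input) {
        return List.of(input);
    }

    public static void main(String[] args) {
        String value = "My dog has fleas";
        System.out.println(textTokens(value));   // [my, dog, has, fleas]
        System.out.println(stringTokens(value)); // [My dog has fleas]
    }
}
```

A query for "dog" can only match the first list; a query for the full value can only match the second.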

You have two choices:

1> Use two fields, one text-based and one string-based. Your query then targets whichever
one is appropriate. I'll add that if you want limited analysis, say lowercasing the entire
input string, use a text-based field with something like KeywordTokenizer + LowerCaseFilter
rather than a string field.
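A minimal sketch of option 1, again as a plain-Java toy rather than Lucene API: each value is stored both as a set of lowercased words and as one lowercased whole-value token, and the query goes to whichever representation fits. `Doc`, `index`, `searchWord` and `searchExact` are hypothetical names. In Lucene itself this would roughly correspond to adding two fields to the same `Document`, for example a `TextField` plus a keyword-analyzed field (perhaps configured through `PerFieldAnalyzerWrapper`), though the exact setup depends on your Lucene version.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model of option 1 (not Lucene API): the same value is "indexed" twice.
public class TwoFieldDemo {

    // Hypothetical document holding both representations of one value.
    static final class Doc {
        final String id;
        final Set<String> body;  // like a tokenized text field: one entry per word
        final String bodyExact;  // like KeywordTokenizer + LowerCaseFilter: one lowercased token

        Doc(String id, Set<String> body, String bodyExact) {
            this.id = id;
            this.body = body;
            this.bodyExact = bodyExact;
        }
    }

    static Doc index(String id, String value) {
        Set<String> words = new HashSet<>();
        // Simplified stand-in for an analyzer: split on non-alphanumerics, lowercase.
        for (String piece : value.split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                words.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return new Doc(id, words, value.toLowerCase(Locale.ROOT));
    }

    // Single-word query: run it against the tokenized field.
    static List<String> searchWord(List<Doc> docs, String word) {
        String term = word.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Doc d : docs) {
            if (d.body.contains(term)) hits.add(d.id);
        }
        return hits;
    }

    // Whole-value query: run it against the single-token field.
    static List<String> searchExact(List<Doc> docs, String text) {
        String term = text.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Doc d : docs) {
            if (d.bodyExact.equals(term)) hits.add(d.id);
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(index("1", "My dog has fleas"), index("2", "My dog"));
        System.out.println(searchWord(docs, "dog"));     // [1, 2]
        System.out.println(searchExact(docs, "my dog")); // [2]
    }
}
```

Lowercasing the whole-value token is what lets "MY DOG HAS FLEAS" still find document 1, which a plain string field would not do.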

2> Use a text field and do phrase searching when you want the whole thing to match. The
flaw here is that if the text were “my dog has fleas” and you searched for “my dog”
(as a phrase), you’d get a match. You can get around that by adding another field with the
word count and then searching for something like “my dog” AND word_count:2.
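Option 2 can be sketched the same way, as a toy model assuming a simple split-and-lowercase analysis: a phrase matches when its tokens appear at consecutive positions, and comparing the stored word count with the phrase length turns "contains this phrase" into "is exactly this phrase". In Lucene the count would have to be computed by your own indexing code and stored in its own field (for example an `IntPoint` queried with `IntPoint.newExactQuery`), then combined with a `PhraseQuery` as two required clauses; the names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy model of option 2 (not Lucene API): one tokenized field plus a word count.
public class PhraseCountDemo {

    // Simplified stand-in for an analyzer: split on non-alphanumerics, lowercase.
    static List<String> tokens(String input) {
        List<String> out = new ArrayList<>();
        for (String piece : input.split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) out.add(piece.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // True if the phrase's tokens occur at consecutive positions in the field.
    static boolean phraseMatches(List<String> field, List<String> phrase) {
        if (phrase.isEmpty()) return false;
        for (int start = 0; start + phrase.size() <= field.size(); start++) {
            if (field.subList(start, start + phrase.size()).equals(phrase)) return true;
        }
        return false;
    }

    // Equivalent of: "my dog" AND word_count:2
    static boolean exactMatch(String fieldValue, String query) {
        List<String> field = tokens(fieldValue);
        List<String> phrase = tokens(query);
        int wordCount = field.size(); // would be indexed as its own field
        return phraseMatches(field, phrase) && wordCount == phrase.size();
    }

    public static void main(String[] args) {
        String doc = "my dog has fleas";
        System.out.println(phraseMatches(tokens(doc), tokens("my dog"))); // true
        System.out.println(exactMatch(doc, "my dog"));                    // false
        System.out.println(exactMatch("My dog", "my dog"));               // true
    }
}
```

The count should come from the same tokenization the field uses; if the analyzer drops stop words, for instance, a naively computed word count could disagree with the phrase length.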


> On Sep 2, 2019, at 4:38 AM, Roland Käser <> wrote:
> Hello
> We use Lucene to index POJOs which are stored in the database. 
> The index primarily contains text fields. 
> After some work with Lucene I came across a strange restriction. 
> I can only assign string or text fields to the document to be indexed. 
> One only indexes the whole string, the other only the single words or tokens. 
> This results in the query finding only single words or the whole text, depending on the
> field type used. 
> But we would need both, the search should find the whole text as well as single words.

> Even after a long analysis of the documentation and partly of the source code, 
> I'm not sure how to achieve that in a clean way. 
> Could someone give me a tip on how to do this? 
> Thanks 
> Roland
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
