lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: indexing and searching the document title question
Date Tue, 27 Feb 2007 18:13:45 GMT
You've probably got it right. But I'd add a couple of things....

1> by using the correct analyzer at index and query time, the
casing will be taken care of for you.

2> you don't want UN_TOKENIZED for fields you search on
in general because the value isn't analyzed: the whole string
is indexed as a single term. So if you indexed
"This is a String", searching on "This" or "this" wouldn't match.

3> In your code fragment, you didn't show what Analyzer you
use. This is way more important than you think.

4> get a copy of Luke (google lucene luke). It'll let you examine
your index and save you a world of hurt. There have been some
very nice improvements lately along with 2.1 compatibility.

5> If you want searches and indexing to use different analyzers
on different fields, see PerFieldAnalyzerWrapper.

6> You'll probably find yourself storing the same data multiple
times, once for searching and once for displaying. So you'll search
on the lowercased, indexed field and display the UN_TOKENIZED
version since it'll retain the capitalization.

7> I think your underlying problem is that the syntax of the search
isn't correct. You're really searching on
NAME:color
defaultfield:me
defaultfield:mine

You want something like +NAME:color +NAME:me +NAME:mine
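
A rough, JDK-only sketch of points 2 and 7, in case it helps. This is a toy model, not Lucene code: the class and method names are made up for illustration, the "analyzer" here just splits on whitespace and lowercases (no stop words or punctuation handling), and the real QueryParser does far more than this.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical toy model; none of these names exist in Lucene.
final class QueryClauseSketch {

    // Terms an index would hold for a field value: one lowercased term per
    // word when the field is tokenized, or the whole value verbatim as a
    // single term when it is not.
    static List<String> indexedTerms(String value, boolean tokenized) {
        if (!tokenized) {
            return Collections.singletonList(value);
        }
        List<String> terms = new ArrayList<>();
        for (String word : value.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                terms.add(word.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Splits a query on whitespace and attaches the default field to every
    // term that lacks its own "field:" prefix. Clauses come back as
    // "field:term", keeping a leading '+' if one was given.
    static List<String> clauses(String query, String defaultField) {
        List<String> out = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            boolean required = token.startsWith("+");
            String body = required ? token.substring(1) : token;
            int colon = body.indexOf(':');
            String field = colon > 0 ? body.substring(0, colon) : defaultField;
            String term = colon > 0 ? body.substring(colon + 1) : body;
            out.add((required ? "+" : "") + field + ":" + term);
        }
        return out;
    }

    public static void main(String[] args) {
        // Only the first term is bound to NAME; the rest fall back.
        System.out.println(clauses("NAME:color me mine", "defaultfield"));
        System.out.println(clauses("+NAME:color +NAME:me +NAME:mine", "defaultfield"));
        // Tokenized: three lowercased terms. Untokenized: one exact term.
        System.out.println(indexedTerms("Color Me Mine", true));
        System.out.println(indexedTerms("Color Me Mine", false));
    }
}
```

The point of the first call is that a field prefix applies only to the term it is attached to, which is why "me" and "mine" end up against the default field. The last two calls show why a lowercased query term can match a tokenized field but not an untokenized one.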

Best
Erick

On 2/27/07, Phillip Rhodes <spamsucks@rhoderunner.com> wrote:
>
> Hi,
> According to the FAQ, indexing the title of the document in its own field
> and searching against that shorter field will automatically give matches
> there a higher weight than matches against the document content.  That is what I am trying
> to accomplish with a "NAME" field.  If someone enters a close match of the
> name of a document (example Names: "Color Me Mine" ,"Pittsburgh and Its
> Countryside"), I want that document to get a hit.  The search is user
> entered, so I want it to be case-insensitive.  I also don't want it to have
> to be an exact match.  Search terms such as "Pittsburgh Countryside" should
> match up against a name of "Pittsburgh and Its Countryside".
>
>
> Here I am adding the name field to my document:
> String value= "Color Me Mine";
> document.add(new Field("NAME", value, Field.Store.YES,
>                                 Field.Index.TOKENIZED));
>
> Performing a search:
> NAME:color me mine ->returns no results
> NAME:color -> returns the document
>
> I tried indexing the document without the value tokenized:
> document.add(new Field("NAME", value, Field.Store.YES,
>                                 Field.Index.UN_TOKENIZED));
>
> This caused the search to be case sensitive.
>
> I am about to modify my indexing/searching code to use a secondary field,
> "name_lowercase". This field would of course contain the name of the object
> in lowercase, and I would lowercase my search terms when I construct my
> TermQuery for this field.
>
> Is this a valid approach, or am I missing something?
>
> Thanks!
