lucene-java-user mailing list archives

From Chris Hostetter
Subject Re: MS-Word docs.
Date Mon, 28 Nov 2005 01:28:09 GMT

: I dump the doc files into a text file with the same variable I use in
: the Lucene doc.add(Field.UnStored("content", textStr)); and they look
: fine in the file. However searches return nothing.

if i'm reading that sentence correctly, then you are saying that you've
tried isolating your MS-Word text extraction from your Lucene indexing and
confirmed that the MS-Word text extraction is working.

This is a very good first step.

Now if you want to be certain that your indexing is working correctly, i
suggest you try using a tool like Luke to check exactly what terms are
being stored in your index.  I'm guessing that the problems you are having
are a result of "analysis paralysis" as i've seen it called many times...
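
To make the "check what terms are in your index" idea concrete, here is a
toy sketch in plain Java. It is NOT Lucene's API: the class ToyIndexDemo and
its methods analyze, addDocument and search are hypothetical names invented
for illustration, and the "analyzer" simply splits on non-alphanumeric
characters and lowercases. It only models the general point that the index
holds whatever terms analysis produced, so a query term that did not go
through the same analysis may match nothing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: a stand-in for an inverted index, not Lucene code.
class ToyIndexDemo {

    // Hypothetical "analyzer": split on non-letter/non-digit runs, then lowercase.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Index time: only the analyzed terms are stored, each mapped to doc ids.
    static void addDocument(Map<String, Set<Integer>> index, int docId, String text) {
        for (String term : analyze(text)) {
            index.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Query time: look up one word, with or without the index-time analysis.
    static Set<Integer> search(Map<String, Set<Integer>> index, String word,
                               boolean analyzeQuery) {
        String term = word;
        if (analyzeQuery) {
            List<String> analyzed = analyze(word);
            if (analyzed.isEmpty()) {
                return new TreeSet<>();
            }
            term = analyzed.get(0);
        }
        return index.getOrDefault(term, new TreeSet<>());
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> index = new HashMap<>();
        addDocument(index, 0, "Quarterly Report, final DRAFT");

        // Dump the stored terms, which is the kind of check Luke lets you do.
        System.out.println("terms in index:    " + new TreeSet<>(index.keySet()));
        System.out.println("raw 'Report':      " + search(index, "Report", false));
        System.out.println("analyzed 'Report': " + search(index, "Report", true));
    }
}
```

In this toy the stored terms are all lowercase, so the raw lookup of
"Report" finds nothing while the analyzed lookup finds document 0. If the
terms you see in your real index don't look like the terms your query
produces, that mismatch is the first thing to chase.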

This wiki covers a lot of things you should check...

