lucene-java-user mailing list archives

From Scott Smith <>
Subject RE: MoreLikeThis Interface changes
Date Mon, 26 Sep 2011 18:06:51 GMT
So, I thought your response meant that I could eliminate my code:

        String[] fields = new String[1];
        fields[0] = "EVERYTHING";         // use the single "big" field in the index

But, if I comment out that code, my unit test fails.  If I include it, it passes.

I'm using MLT as follows:

            _query = new BooleanClause( InputStreamReader(is), "EVERYTHING"),

"is" is the input stream.  Did I miss something in your response?


-----Original Message-----
From: Robert Muir [] 
Sent: Wednesday, September 21, 2011 6:59 PM
Subject: Re: MoreLikeThis Interface changes

On Wed, Sep 21, 2011 at 5:17 PM, Scott Smith <> wrote:
> I'm updating my lucene code from 3.0 to 3.4.  There's a change in the MLT interface
> I'm confused about.  I used the like(InputStream) method.  It now appears I should change
> to the like(Reader, fieldname) method.  Easy enough to create an InputStreamReader
> from an InputStream.

Yes, requiring a reader is to ensure that MLT is using the encoding you want.
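
To illustrate the encoding point, here is a minimal sketch using only java.io (no Lucene classes). The class and method names are hypothetical, and StandardCharsets assumes Java 7 or later. It decodes the same bytes with two different charsets, which is the choice an InputStreamReader makes explicit:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Lucene.
final class ReaderEncodingSketch {
    // Read every character from the stream, decoding with the given charset.
    static String decode(InputStream is, Charset cs) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(is, cs)) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        // Same bytes, two charsets: only the first round-trips the text.
        System.out.println(decode(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8));
        System.out.println(decode(new ByteArrayInputStream(utf8), StandardCharsets.ISO_8859_1));
    }
}
```

With a bare InputStream the library has to pick a charset for you; with a Reader the caller has already made that decision.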

> So, my question is regarding the addition of the fieldname parameter.  There's also
> a method called MLT.setFieldNames(String[]).  This would seem to be redundant, except that setFieldNames()
> allows you to specify multiple fields and like() doesn't.  Am I allowed to specify null as
> the fieldname in like()?  (The documentation doesn't say you can.)  It seems like you shouldn't
> need to do both.  But there's a difference in functionality between the two (since one allows
> multiple fields and the other doesn't).

A Reader has no fields :)
The fieldName is only for passing to the Analyzer (@param fieldName
field passed to the analyzer to use when analyzing the content)
This is because some Analyzers (e.g. PerFieldAnalyzerWrapper) analyze
content differently according to different fields.
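
As a rough illustration of why the field name matters to an analyzer, here is a plain-Java sketch. It is not Lucene's PerFieldAnalyzerWrapper; the class, the analyze() method, and the "id" field are all hypothetical stand-ins. The same text yields different tokens depending on the field name passed in:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for an analyzer that branches on the field name.
final class PerFieldSketch {
    static List<String> analyze(String fieldName, String text) {
        if ("id".equals(fieldName)) {
            // Assumed rule: an "id" field keeps the whole value as one token.
            List<String> single = new ArrayList<String>();
            single.add(text);
            return single;
        }
        // Assumed default rule: lowercase, then split on whitespace.
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    public static void main(String[] args) {
        System.out.println(analyze("id", "Quick Fox"));
        System.out.println(analyze("EVERYTHING", "Quick Fox"));
    }
}
```

A Reader carries only characters, so the caller has to say which of these rules should apply.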

Previously, MoreLikeThis would use what was in the setFieldNames
parameter, iteratively like this:
for (String field : fieldNames) {
  analyzer.analyze(field, reader);
}

However, MoreLikeThis also had a bug where it would never close() the
reader. As you can see, this logic was completely bogus, as you can only
consume the reader once.

Effectively the reader would be analyzed by fieldNames[0], then MLT
would analyze an exhausted reader with fieldNames[1]...fieldNames[n].
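
The exhausted-reader effect can be reproduced with plain java.io. This is a sketch of the general behaviour, not MLT's actual code; the class and method names are hypothetical, and drain() stands in for whatever the analyzer does when it consumes the reader:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo, not part of Lucene.
final class ExhaustedReaderSketch {
    // Consume the reader to the end and report how many chars were seen.
    static int drain(Reader r) throws IOException {
        int count = 0;
        while (r.read() != -1) {
            count++;
        }
        return count;
    }

    // Reuse ONE reader for every field name, as the old loop did.
    static Map<String, Integer> charsSeenPerField(String[] fieldNames, String content)
            throws IOException {
        Reader reader = new StringReader(content);
        Map<String, Integer> seen = new LinkedHashMap<String, Integer>();
        for (String field : fieldNames) {
            seen.put(field, drain(reader));
        }
        return seen;
    }

    public static void main(String[] args) throws IOException {
        // Only the first field sees any content; the rest see an empty stream.
        System.out.println(charsSeenPerField(new String[] {"title", "body"}, "hello world"));
    }
}
```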

When we fixed MLT to close its resources correctly (around 3.2), it
exposed this second bug: if you tried to pass a reader with multiple
values in fieldNames, you would get an IOException because it tried to
re-consume a closed reader.
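
The closed-reader failure is ordinary java.io behaviour and can be seen without Lucene. In this sketch (hypothetical class and method names), a StringReader is consumed, closed, and then read again, which is roughly what a second pass over fieldNames amounted to:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical demo, not part of Lucene.
final class ClosedReaderSketch {
    // Returns true if reading after close() raises an IOException.
    static boolean secondPassFails(String content) {
        Reader reader = new StringReader(content);
        try {
            // First pass: consume everything, then release the resource.
            while (reader.read() != -1) {
                // discard
            }
            reader.close();
            // Second pass: StringReader rejects reads once it is closed.
            reader.read();
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(secondPassFails("hello world"));
    }
}
```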

Now, when supplying a reader, you should instead pass in this
fieldName explicitly so that it analyzes the content the way you want.
For backwards compatibility, the deprecated method uses
fieldNames[0] only.

