lucene-java-user mailing list archives

From Grant Ingersoll
Subject Re: TrecDocMaker
Date Mon, 28 Apr 2008 13:11:14 GMT
Yeah, these classes are a bit weird in that they are configured via
properties rather than setters.  They were really designed to run inside
the benchmarker, and not much attention was paid to using them elsewhere.

However, you can co-opt them for what you are doing with something like:
     TrecDocMaker docMaker = new TrecDocMaker();
     Properties properties = new Properties();
     // "false" so the maker stops at the end of the input instead of reading forever
     properties.setProperty("doc.maker.forever", "false");
     docMaker.setConfig(new Config(properties));

(Note: I was using the EnWikiDocMaker in the above example, but it
should work for Trec, too.)

I often also do something like:

while ((doc = docMaker.makeDocument()) != null && i < numDocs) {

where numDocs is the maximum number of docs I want.
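
To make the capped loop concrete, here is a minimal, self-contained sketch. It does not use the Lucene classes: DocSource is a hypothetical stand-in for the doc maker, and it assumes, as the loop above does, that makeDocument() returns null once the input is exhausted. BoundedDocLoop and collect are likewise illustrative names. The sketch tests the cap before calling makeDocument(), so no extra document is read once the limit is reached.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class BoundedDocLoop {

    /** Hypothetical stand-in for a doc maker; assumed to return null when the input runs out. */
    interface DocSource {
        String makeDocument() throws Exception;
    }

    /** Collects documents until the source is exhausted or numDocs have been read. */
    static List<String> collect(DocSource source, int numDocs) throws Exception {
        List<String> docs = new ArrayList<>();
        String doc;
        // Test the cap first so makeDocument() is not called once the limit is hit.
        while (docs.size() < numDocs && (doc = source.makeDocument()) != null) {
            docs.add(doc);
        }
        return docs;
    }

    public static void main(String[] args) throws Exception {
        // Three fake "documents"; the source yields null after the last one.
        Iterator<String> it = Arrays.asList("a", "b", "c").iterator();
        DocSource source = () -> it.hasNext() ? it.next() : null;
        System.out.println(collect(source, 2)); // prints [a, b]
    }
}
```

With a real doc maker, the body of the loop would be where each Document is handed to the IndexWriter.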


On Apr 27, 2008, at 2:31 PM, DanaWhite wrote:

> Greetings,
> I am trying to use TrecDocMaker so I can index and evaluate Lucene on a
> TREC collection.
> It seems like I would just repeatedly call makeDocument() until all the
> Documents have been created, but makeDocument() appears to just read
> forever.
> In general, TrecDocMaker seems like an odd class and I just can't figure
> out how to use it right.  I have been changing the class so it works with
> an uncompressed collection and trying to modify it so makeDocument()
> doesn't read endlessly, but no matter what I have done, it just causes a
> different error.  Clearly I am trying too hard.
> In short, what I want to know is how I am supposed to use TrecDocMaker to
> parse my collection, because the current Lucene implementation doesn't
> seem to work right, or I am using it wrong.
> Thanks
> Dana

Grant Ingersoll