nutch-dev mailing list archives

From "Alan Tanaman" <alan.tana...@idna-solutions.com>
Subject RE: Creating Lucene Compound Index
Date Tue, 02 Jan 2007 14:12:53 GMT
Thanks for your feedback, we'll get to work on a patch in a day or two.
The config comment will be clear in stating the tradeoff.

Best regards,
Alan
_________________________
Alan Tanaman
iDNA Solutions

-----Original Message-----
From: Andrzej Bialecki [mailto:ab@getopt.org] 
Sent: 02 January 2007 14:06
To: nutch-dev@lucene.apache.org
Subject: Re: Creating Lucene Compound Index

Alan Tanaman wrote:
> Agreed about the performance degradation (estimated at 5-10% by 
> Gospodnetic and Hatcher), which affects only indexing time, not 
> search time, but we would state this as a clear caveat in the conf file.
>   

Note: this is just for the time-related degradation. Temporary space usage
is 200% higher for compound indexes ...

> We'd rather the incremental index process be a little slower (our big 
> performance problem is on parsing anyway), but that the file system 
> work be a little more manageable.
>
> Are there any objections?
>   

I don't object to the idea of having this as an option, defaulting to
non-compound index, with a clear comment in the config file about this
tradeoff.
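
A minimal sketch of how such an option might be read, defaulting to a
non-compound index. The property name "indexer.useCompoundFile" and the class
name are hypothetical, chosen only for illustration; the one real Lucene call
referenced (in a comment) is IndexWriter.setUseCompoundFile(boolean).

```java
import java.util.Properties;

// Hypothetical helper: decides whether the indexer should write compound files.
class CompoundIndexOption {
    // Assumed property name, not an existing Nutch setting.
    static final String KEY = "indexer.useCompoundFile";

    // Returns false (non-compound) unless the property is explicitly "true".
    // Tradeoff to document next to the property: compound files mean fewer
    // files on disk, at the cost of slower indexing and more temporary space.
    static boolean useCompoundFile(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(KEY, "false"));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        boolean compound = useCompoundFile(conf);
        // In the indexer this value would be handed to Lucene, e.g.:
        //   writer.setUseCompoundFile(compound);
        System.out.println("useCompoundFile = " + compound);
    }
}
```

Parsing with a "false" default keeps existing setups unchanged unless someone
opts in explicitly.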

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
