lucene-java-user mailing list archives

From "MOYSE Gilles (Cetelem)" <>
Subject Compound expression extraction
Date Tue, 21 Oct 2003 08:00:41 GMT

I'm trying to extract expressions from term position information, i.e.,
if two words frequently appear side by side, we can consider the two
words to be a single one. For instance, 'Object' and 'Oriented' appear
side by side 9 times out of 10, which allows us to define a new expression,
'Object Oriented'. Does anyone know a statistical method to detect such expressions?
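A minimal sketch of the counting idea above, in plain Java rather than any Lucene API. It assumes the tokens of a document have already been extracted in position order (for example from an analyzer or from term positions); the class and method names (CollocationSketch, followRatio, pmi) are hypothetical. followRatio is the "9 times out of 10" figure from the message, and pmi is pointwise mutual information, a standard collocation score swapped in here for comparison, not something the message proposes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: scores adjacent word pairs from a position-ordered token list.
public class CollocationSketch {

    // Number of positions where `word` occurs.
    static int count(List<String> tokens, String word) {
        int n = 0;
        for (String t : tokens) {
            if (t.equals(word)) n++;
        }
        return n;
    }

    // Number of positions i where tokens[i] == a and tokens[i + 1] == b.
    static int pairCount(List<String> tokens, String a, String b) {
        int n = 0;
        for (int i = 0; i + 1 < tokens.size(); i++) {
            if (tokens.get(i).equals(a) && tokens.get(i + 1).equals(b)) n++;
        }
        return n;
    }

    // Fraction of occurrences of `a` that are immediately followed by `b`.
    static double followRatio(List<String> tokens, String a, String b) {
        int ca = count(tokens, a);
        return ca == 0 ? 0.0 : (double) pairCount(tokens, a, b) / ca;
    }

    // Pointwise mutual information, log2(P(a,b) / (P(a) * P(b))).
    // Positive values mean the pair is adjacent more often than chance predicts.
    static double pmi(List<String> tokens, String a, String b) {
        int n = tokens.size();
        int ab = pairCount(tokens, a, b);
        if (n < 2 || ab == 0) return Double.NEGATIVE_INFINITY;
        double pab = (double) ab / (n - 1);          // adjacent-pair probability
        double pa = (double) count(tokens, a) / n;   // single-token probabilities
        double pb = (double) count(tokens, b) / n;
        return Math.log(pab / (pa * pb)) / Math.log(2);
    }

    public static void main(String[] args) {
        // Toy corpus: "object" occurs 10 times, 9 of them followed by "oriented".
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < 9; i++) {
            tokens.addAll(Arrays.asList("object", "oriented", "design"));
        }
        tokens.addAll(Arrays.asList("object", "model", "design"));

        System.out.println(followRatio(tokens, "object", "oriented")); // 0.9
        System.out.println(pmi(tokens, "object", "oriented"));
        System.out.println(pmi(tokens, "object", "model"));
    }
}
```

On this toy data, 'object model' (seen once) gets the same PMI as 'object oriented' (seen nine times), a known weakness of PMI on rare pairs. In practice one would usually also require a minimum pair count or use a significance test such as Dunning's log-likelihood ratio.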


Gilles Moyse

-----Original Message-----
From: Eric Jain []
Sent: Tuesday, 21 October 2003 09:24
To: Lucene Users List
Subject: Re: Lucene on Windows

> The CVS version of Lucene has a patch that allows one to use a
> 'Compound Index' instead of the traditional one.  This reduces the
> number of open files.  For more info, see/make the Javadocs for
> IndexWriter.

Interesting option. Do you have a rough idea of what the performance
impact of using this setting is?

Eric Jain

