lucene-dev mailing list archives
From "Mark Harwood (JIRA)" <>
Subject [jira] Updated: (LUCENE-474) High Frequency Terms/Phrases at the Index level
Date Mon, 28 Nov 2005 20:00:37 GMT

Mark Harwood updated LUCENE-474:


Here's some code that I've used before to find phrases in an index - see
If your index has term vector support enabled, you can run it to mine the collocated terms.
This is typically a long-running operation that you don't want to run too often.
The CollocationIndexer can be used to store the mined collocations in an index.
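
To give a rough idea of what mining collocations involves, here is a minimal sketch in plain Java. It is not the attached code and does not use the Lucene API: it assumes each document's terms are already available as an ordered list (roughly what a term vector with positions would give you), and it assumes a collocation simply means two adjacent terms. The class name CollocationSketch, the methods countBigrams and frequentPairs, and the minFreq threshold are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the code attached to LUCENE-474.
class CollocationSketch {

    /** Counts every pair of adjacent terms across all documents. */
    static Map<String, Integer> countBigrams(List<List<String>> docs) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> terms : docs) {
            for (int i = 0; i + 1 < terms.size(); i++) {
                String pair = terms.get(i) + " " + terms.get(i + 1);
                counts.merge(pair, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Returns the pairs seen at least minFreq times, sorted alphabetically. */
    static List<String> frequentPairs(Map<String, Integer> counts, int minFreq) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= minFreq) {
                result.add(e.getKey());
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for per-document term lists read from term vectors.
        List<List<String>> docs = List.of(
            List.of("new", "york", "city"),
            List.of("visit", "new", "york"),
            List.of("new", "ideas"));
        System.out.println(frequentPairs(countBigrams(docs), 2)); // prints [new york]
    }
}
```

The full pass over every document is what makes this slow on a real index, which is why storing the result once is attractive.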

Possible uses for collocations are:
* automatically identifying candidate terms in a query that can be turned into a phrase query
* better spelling correction, using all the terms in the query as context to pick the most
likely spelling variation
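
As a rough illustration of the first use, the sketch below checks each pair of adjacent query terms against a set of previously mined collocations. It is plain Java under stated assumptions, not Lucene API and not the attached code: the class PhraseCandidateSketch, the method phraseCandidates, and the representation of collocations as space-joined strings in a Set are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not the code attached to LUCENE-474.
class PhraseCandidateSketch {

    /** Returns adjacent query term pairs that appear in the mined collocations. */
    static List<String> phraseCandidates(List<String> queryTerms, Set<String> collocations) {
        List<String> candidates = new ArrayList<>();
        for (int i = 0; i + 1 < queryTerms.size(); i++) {
            String pair = queryTerms.get(i) + " " + queryTerms.get(i + 1);
            if (collocations.contains(pair)) {
                candidates.add(pair);
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        // Assumed to have been mined earlier and loaded from wherever they are stored.
        Set<String> collocations = Set.of("new york", "hot dog");
        List<String> query = List.of("cheap", "new", "york", "hotels");
        System.out.println(phraseCandidates(query, collocations)); // prints [new york]
    }
}
```

Each returned pair could then be rewritten as a phrase query instead of two independent terms.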

Haven't done too much with this code, but I've added it here because it sounds like it could
be relevant.


> High Frequency Terms/Phrases at the Index level
> -----------------------------------------------
>          Key: LUCENE-474
>          URL:
>      Project: Lucene - Java
>         Type: New Feature
>     Versions: 1.4
>     Reporter: Suri Babu B
>  Attachments:
> We should be able to find all the high-frequency terms/phrases (where frequency
> is the search criterion / benchmark)
