lucene-java-user mailing list archives

From Erick Erickson <>
Subject Re: Indexing documents with pre-calculated term frequencies
Date Wed, 11 Feb 2015 13:01:51 GMT
You could consider payloads, but why do you want to do this?
What's the use case here? This sounds a little like an XY problem:
you're asking us how to do something without explaining why, and there
may be other ways to accomplish your task.

For instance, there's the "termfreq" function, which can be returned
as a field in the doc, see:


On Wed, Feb 11, 2015 at 4:54 AM, Stephen Fenech <> wrote:
> Hi,
> I would like to index documents which contain term frequencies instead of
> the actual text. For example, instead of getting "The big wolf ate the big
> sheep" I would get "the|2 big|2 wolf|1 ate|1 sheep|1". An easy way to index
> this would be to convert the frequencies back into text, so into something
> like "the the big big wolf ate sheep", but it does not look that elegant
> since I would be expanding the text, just to have Lucene "compress" it
> again.
> Any ideas? Or directions I should look into?
> I am considering:
> - Custom Analyzer (so I expand it while generating the TokenStream from the
> compressed text)
> Thanks in Advance,
> Stephen
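
For reference, the expansion step Stephen describes can be sketched in plain Java with no Lucene dependency. This is only an illustration under stated assumptions: the class name TermFreqExpander and its expand method are hypothetical, and the input is assumed to be whitespace-separated "term|count" pairs as in his example. A custom Analyzer or TokenFilter would have to do the equivalent work while producing the TokenStream.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; it is not part of Lucene.
final class TermFreqExpander {

    // Expands whitespace-separated "term|count" pairs into a flat token list,
    // repeating each term "count" times in the order the pairs appear.
    static List<String> expand(String compressed) {
        List<String> tokens = new ArrayList<>();
        String trimmed = compressed.trim();
        if (trimmed.isEmpty()) {
            return tokens;
        }
        for (String pair : trimmed.split("\\s+")) {
            // Split on the last '|' so that terms containing '|' still parse.
            int sep = pair.lastIndexOf('|');
            if (sep <= 0 || sep == pair.length() - 1) {
                throw new IllegalArgumentException("Expected term|count, got: " + pair);
            }
            String term = pair.substring(0, sep);
            int count = Integer.parseInt(pair.substring(sep + 1));
            if (count < 1) {
                throw new IllegalArgumentException("Count must be positive: " + pair);
            }
            for (int i = 0; i < count; i++) {
                tokens.add(term);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints the expanded form of the example from the message above.
        System.out.println(String.join(" ", expand("the|2 big|2 wolf|1 ate|1 sheep|1")));
    }
}
```

Note that the expanded tokens no longer follow the original word order, so term frequencies are preserved but positional queries (phrases, spans) against the original sentence would not behave as they would on the real text.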
