lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Indexing documents with pre-calculated term frequencies
Date Wed, 11 Feb 2015 13:01:51 GMT
You could consider payloads, but why do you want to do this?
What's the use case here? This sounds a little like an XY problem:
you're asking us how to do something without explaining why, and
there may be other ways to accomplish your task.

For instance, Solr has the "termfreq" function, which can be returned
as a field in the doc, see:
https://cwiki.apache.org/confluence/display/solr/Function+Queries
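
As a rough illustration of the termfreq suggestion, here is a small sketch in
plain Java that builds a Solr select URL asking for termfreq as a pseudo-field
in the "fl" parameter. The class name, the core URL, and the "id" and "text"
field names are all assumptions made up for the example; it only builds the
request string and does not contact Solr.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Solr or Lucene.
final class TermFreqQuery {

    // Value for Solr's "fl" parameter: the (assumed) id field plus
    // termfreq(field,'term'). The term is not escaped here, so it must not
    // contain a single quote.
    static String fieldList(String field, String term) {
        return "id,termfreq(" + field + ",'" + term + "')";
    }

    // Full select URL matching all documents, with the field list URL-encoded.
    static String selectUrl(String base, String field, String term) {
        String fl = URLEncoder.encode(fieldList(field, term), StandardCharsets.UTF_8);
        return base + "/select?q=*:*&fl=" + fl;
    }

    public static void main(String[] args) {
        // Assumed local core name "mycore" and field name "text".
        System.out.println(selectUrl("http://localhost:8983/solr/mycore", "text", "wolf"));
    }
}
```

Each returned document would then carry the raw frequency of "wolf" in the
"text" field alongside its id, provided such a field is indexed.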

Best,
Erick

On Wed, Feb 11, 2015 at 4:54 AM, Stephen Fenech <luvscout@gmail.com> wrote:
> Hi,
>
> I would like to index documents which contain term frequencies instead of
> the actual text. For example, instead of getting "The big wolf ate the big
> sheep" I would get "the|2 big|2 wolf|1 ate|1 sheep|1". An easy way to index
> this would be to convert the frequencies back into text, so into something
> like "the the big big wolf ate sheep", but it does not look that elegant
> since I would be expanding the text, just to have Lucene "compress" it
> again.
>
> Any ideas? Or directions I should look into?
>
> I am considering:
> - Custom Analyzer (so I expand on while generating the TokenStream from the
> compressed text)
>
> Thanks in Advance,
>
> Stephen
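
For reference, the straightforward expansion described in the quoted message
can be sketched in plain Java as below. This is only an illustration under
stated assumptions: the input is whitespace-separated "term|freq" pairs, the
class and method names are hypothetical, and no Lucene APIs are used. In a
real setup the same parsing could presumably sit inside a custom
Analyzer/TokenStream rather than producing an expanded string.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of Lucene.
final class TermFreqExpander {

    // Parse whitespace-separated "term|freq" pairs, keeping first-seen order.
    // A term that appears more than once has its frequencies summed.
    static Map<String, Integer> parse(String encoded) {
        Map<String, Integer> freqs = new LinkedHashMap<>();
        for (String pair : encoded.trim().split("\\s+")) {
            if (pair.isEmpty()) {
                continue; // empty input yields a single empty token
            }
            int bar = pair.lastIndexOf('|');
            if (bar <= 0 || bar == pair.length() - 1) {
                throw new IllegalArgumentException("Expected term|freq, got: " + pair);
            }
            int freq = Integer.parseInt(pair.substring(bar + 1));
            if (freq < 1) {
                throw new IllegalArgumentException("Frequency must be >= 1: " + pair);
            }
            freqs.merge(pair.substring(0, bar), freq, Integer::sum);
        }
        return freqs;
    }

    // Repeat each term freq times, separated by single spaces.
    static String expand(String encoded) {
        StringJoiner out = new StringJoiner(" ");
        for (Map.Entry<String, Integer> e : parse(encoded).entrySet()) {
            for (int i = 0; i < e.getValue(); i++) {
                out.add(e.getKey());
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints "the the big big wolf ate sheep"
        System.out.println(expand("the|2 big|2 wolf|1 ate|1 sheep|1"));
    }
}
```

The expanded string preserves term counts but not the original word order or
positions, so phrase and proximity queries would not behave as they do on the
original text.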
