lucene-java-user mailing list archives

Subject Use of Lucene to store data from RSS feeds
Date Thu, 14 Oct 2010 14:17:43 GMT

I would like to store data retrieved hourly from RSS feeds in a database or in
Lucene so that the text can be easily indexed for word frequencies.

I need to get the text from the title and description elements of RSS items.

Ideally, for each hourly retrieval from a given feed, I would add a row to a
table in a dataset made up of the following columns:

feed_url, title_element_text, description_element_text, polling_date_time

From this, I can look up any element in a feed and calculate keyword
frequencies over whatever period of time is required.

This can be done with a database table, using hashmaps to calculate the word
frequencies. But can I do this in Lucene to this degree of granularity at all?
If so, would each feed form a Lucene document, or would each 'row' from the
database table form one?
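
To make the table-plus-hashmap approach concrete, here is a minimal sketch in
plain Java. It is only an illustration under assumptions: the names
FeedWordFrequencies, FeedRow and wordFrequencies are hypothetical, the rows are
held in memory rather than in a real database, and the tokenizer is a naive
split on non-alphanumeric characters rather than a Lucene analyzer.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class FeedWordFrequencies {

    /** One row per RSS item per hourly poll (hypothetical class; fields mirror the columns above). */
    public static final class FeedRow {
        final String feedUrl;
        final String titleText;
        final String descriptionText;
        final Instant polledAt;

        public FeedRow(String feedUrl, String titleText, String descriptionText, Instant polledAt) {
            this.feedUrl = feedUrl;
            this.titleText = titleText;
            this.descriptionText = descriptionText;
            this.polledAt = polledAt;
        }
    }

    /**
     * Counts words in the title and description text of every row that belongs
     * to the given feed and was polled in the half-open window [from, to).
     */
    public static Map<String, Integer> wordFrequencies(
            List<FeedRow> rows, String feedUrl, Instant from, Instant to) {
        Map<String, Integer> counts = new HashMap<>();
        for (FeedRow row : rows) {
            boolean inWindow = !row.polledAt.isBefore(from) && row.polledAt.isBefore(to);
            if (!row.feedUrl.equals(feedUrl) || !inWindow) {
                continue;
            }
            String text = row.titleText + " " + row.descriptionText;
            // Naive tokenizer (an assumption): lower-case, then split on
            // anything that is not a letter or a digit.
            for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Example data only; the URL and texts are made up for illustration.
        List<FeedRow> rows = new ArrayList<>();
        rows.add(new FeedRow("http://example.com/rss", "Lucene news",
                "Lucene index released", Instant.parse("2010-10-14T14:00:00Z")));
        rows.add(new FeedRow("http://example.com/rss", "Older item",
                "Polled the day before", Instant.parse("2010-10-13T14:00:00Z")));

        Map<String, Integer> counts = wordFrequencies(rows, "http://example.com/rss",
                Instant.parse("2010-10-14T00:00:00Z"), Instant.parse("2010-10-15T00:00:00Z"));
        System.out.println(counts);
    }
}
```

Because each row keeps its own polling_date_time, the window can be narrowed to
a single hour or widened to several days without changing how the data is
stored.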

Can anyone advise?


Martin O'Shea.
