nutch-dev mailing list archives

From "Jeff Maki" <crimesagainstlo...@gmail.com>
Subject Meta Tags and Indexing
Date Thu, 06 Sep 2007 14:45:14 GMT
Hello everyone,

I'm working on a project that is essentially a searchable database of
citations (academic ones). Nutch, naturally, was the searching tool I
decided to use because of its full-featuredness. And cost.

Anyway, we had the requirement to be able to sort results by year (for
instance) and restrict results based on type (journal article, book,
etc.). I couldn't find a way to "label" the results (to use a Google
term), so I ended up writing a plugin to do so.

Based heavily on the example plugin for Nutch, this plugin adds data
found in HTML page meta fields to the index, and allows one to
optionally use them in querying (or sorting). Not as full-featured as
Google's labeling solution, perhaps, but it's a start.
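
To give a rough idea of the approach, here is a minimal standalone sketch in
plain Java. It is not the plugin's actual code and uses no Nutch classes; the
class name MetaTagSketch, the meta names "year" and "type", and the simplified
regex (which expects name before content) are all assumptions made for
illustration. It pulls meta name/content pairs out of a page, then filters a
list of such records by type and sorts them by year.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: not the plugin's code and not a Nutch API.
class MetaTagSketch {

    // Simplified pattern: assumes the name attribute comes before content
    // and that both values are quoted. Real HTML needs a proper parser.
    private static final Pattern META = Pattern.compile(
        "<meta\\s+name=[\"']([^\"']+)[\"']\\s+content=[\"']([^\"']*)[\"'][^>]*>",
        Pattern.CASE_INSENSITIVE);

    // Collect meta name/content pairs; names are lower-cased so that
    // "Year" and "year" end up in the same field.
    static Map<String, String> extractMeta(String html) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = META.matcher(html);
        while (m.find()) {
            fields.put(m.group(1).toLowerCase(), m.group(2).trim());
        }
        return fields;
    }

    // Parse the assumed "year" field; records with a missing or
    // non-numeric year sort last.
    static int yearOf(Map<String, String> doc) {
        try {
            return Integer.parseInt(doc.get("year"));
        } catch (NumberFormatException e) {
            return Integer.MAX_VALUE;
        }
    }

    // Keep only records whose assumed "type" field matches, oldest first.
    static List<Map<String, String>> byTypeSortedByYear(
            List<Map<String, String>> docs, String type) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> doc : docs) {
            if (type.equalsIgnoreCase(doc.get("type"))) {
                out.add(doc);
            }
        }
        out.sort(Comparator.comparingInt(MetaTagSketch::yearOf));
        return out;
    }
}
```

In the real plugin the extracted values are written into the Nutch index so
that the searcher can query or sort on them; the in-memory list above only
stands in for that step.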

Just thought I'd post it in case it saves anybody else some time...

Link: http://upclose.lrdc.pitt.edu/people/maki_assets/metatag.tar.gz

Thanks!

-Jeff
