nutch-dev mailing list archives

From Siddhartha Reddy <s...@grok.in>
Subject Nutch/Solr: storing the page cache in Solr
Date Wed, 13 May 2009 13:36:56 GMT
I'm trying to patch Nutch to allow the page cache to be added to the Solr
index when using the SolrIndexer tool. Is there any reason this is not done
by default? The Solr schema even has the "cache" field but it is left empty.

I'm enclosing a patch of the changes I have made. I have done some testing
and this seems to work fine. Can someone please take a look at it and let me
know if I'm doing anything wrong? I'm especially not sure about the
character encoding to assume when converting the Content (which is stored as
byte[]) to a String; I'm getting the encoding from Metadata (using the key
Metadata.ORIGINAL_CHAR_ENCODING) but it is always null.
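
To make the encoding question concrete, here is a minimal sketch of the byte[] to String step with a fallback for the case where the metadata lookup returns null. It is plain Java and not part of the patch: the class name CacheDecoder, the method names, and the choice of UTF-8 as the fallback are all assumptions made for illustration, and it relies on java.nio.charset.StandardCharsets (Java 7 or later).

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical helper, not a Nutch class: decodes raw page bytes into a String.
final class CacheDecoder {

    // Assumption: UTF-8 is used when no usable encoding name is available.
    static final Charset FALLBACK = StandardCharsets.UTF_8;

    // Turn an encoding name (possibly null, empty, or unknown) into a Charset.
    static Charset resolve(String encodingName) {
        if (encodingName == null || encodingName.trim().isEmpty()) {
            return FALLBACK;
        }
        try {
            return Charset.forName(encodingName.trim());
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            // The name is malformed or not supported by this JVM.
            return FALLBACK;
        }
    }

    // Decode the raw content; a null array yields an empty string.
    static String decode(byte[] raw, String encodingName) {
        if (raw == null) {
            return "";
        }
        return new String(raw, resolve(encodingName));
    }
}
```

With this shape, the value read from the metadata can be passed straight to decode() whether or not it is null. Whether UTF-8 is the right default for a given crawl is a separate question; bytes in another encoding will decode without an exception but may contain replacement characters.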

Thanks,
Siddhartha
