lucene-solr-user mailing list archives

From eShard <zim...@yahoo.com>
Subject how to get solr (tika?) to capture more metadata from RSS feed?
Date Fri, 01 Mar 2013 15:35:34 GMT
Hi,
I have a lot of non-standard IBM RSS feeds that need to be crawled (via
ManifoldCF v1.1.1) and put into Solr 4.0 final.
The problem is that we need to get the additional non-standard metadata into
Solr.
I've confirmed via Fiddler that ManifoldCF is indeed sending all the
appropriate metadata, but something in Solr is removing all of it. It's
either Tika, ROME or something else in Solr.
See this link for more details (Tika post):
<http://lucene.472066.n3.nabble.com/how-to-add-more-metadata-to-tika-extraction-td4043417.html#a4043456>
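To make concrete what "non-standard metadata" means in an RSS feed, here is a minimal sketch that lists, for each <item>, the child elements that are not part of core RSS 2.0 (for example vendor-namespaced extensions). It assumes a hypothetical feed: the "ibm" prefix, its namespace URI and the owner/status field names are invented for illustration, and the sketch does not reproduce what Tika or ROME actually do internally. It only shows the kind of fields that have to survive the trip into Solr.

```python
import xml.etree.ElementTree as ET

# Core RSS 2.0 <item> child elements (un-namespaced). Anything else,
# e.g. namespaced vendor extensions, is treated as "non-standard" here.
STANDARD_ITEM_TAGS = {
    "title", "link", "description", "author", "category",
    "comments", "enclosure", "guid", "pubDate", "source",
}


def nonstandard_item_fields(rss_xml: str) -> list[dict[str, str]]:
    """Return, per <item>, a dict of child elements outside core RSS 2.0."""
    root = ET.fromstring(rss_xml)
    results = []
    for item in root.iter("item"):
        extra = {}
        for child in item:
            # ElementTree reports namespaced tags as "{uri}localname".
            if child.tag not in STANDARD_ITEM_TAGS:
                extra[child.tag] = (child.text or "").strip()
        results.append(extra)
    return results


# Hypothetical feed: the namespace URI and field names are made up.
SAMPLE = """<rss version="2.0" xmlns:ibm="http://example.com/ibm-ext">
  <channel>
    <title>Example</title>
    <item>
      <title>Doc 1</title>
      <link>http://example.com/1</link>
      <ibm:owner>jdoe</ibm:owner>
      <ibm:status>published</ibm:status>
    </item>
  </channel>
</rss>"""

if __name__ == "__main__":
    for fields in nonstandard_item_fields(SAMPLE):
        print(fields)
```

Running this against a real feed would show which extra fields are present in the source; comparing that list with what ends up in the Solr document narrows down where they get dropped.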
 

So, is there a way to configure Tika (or ROME, which handles the RSS parsing) to
capture the additional metadata?
I read that the Tika config file is deprecated or obsolete. Is that true?

Thanks,

--
View this message in context: http://lucene.472066.n3.nabble.com/how-to-get-solr-tika-to-capture-more-metadata-from-RSS-feed-tp4044015.html
Sent from the Solr - User mailing list archive at Nabble.com.
