lucene-solr-user mailing list archives

From Chris Hostetter <>
Subject Re: WELCOME to
Date Tue, 08 Dec 2009 19:32:21 GMT

(FYI: in the future please start a new thread with an appropriate subject
line when you ask questions -- you probably would have gotten a lot more
responses from people interested in Tika and SolrCell if they could tell
that this email was about SolrCell)

: I found that Tika read the html and extract metadata like <meta name="id"
: content="12"> from my htmls but my documents has an already an id setted by
: I tried to map the id from Tika by but it ignore also my

Hmmmm, yeah: that seems like an odd order of operations, but it's
documented on the wiki so evidently it's intentional...

my best suggestions:

 * use the capture param to restrict what gets extracted (it's probably
possible to write an XPath query that selects everything *except* 
 * change the name of your uniqueKey field to be something other than "id"
so it's less likely to collide with a value from the document.
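
To make the second suggestion concrete, here is a rough, untested sketch of
what the extract request might look like once the uniqueKey is no longer
"id". It assumes the literal.<fieldname> and fmap.<source> request
parameters behave as described on the SolrCell wiki page; the host, the
field names "doc_key" and "tika_meta_id", and the helper build_extract_url
are all made up for illustration. It only builds the URL and sends nothing.

```python
from urllib.parse import urlencode


def build_extract_url(base_url, unique_key_field, unique_key_value, tika_id_target):
    """Build a SolrCell extract URL that keeps a Tika "id" away from the uniqueKey.

    Hypothetical helper: the field names passed in are assumptions and must
    match whatever the schema actually declares.
    """
    params = {
        # Supply the uniqueKey as a literal, under a field name that the
        # document's own metadata is unlikely to use.
        "literal." + unique_key_field: unique_key_value,
        # Route any "id" metadata that Tika extracts into a separate field.
        "fmap.id": tika_id_target,
    }
    return base_url.rstrip("/") + "/update/extract?" + urlencode(params)


if __name__ == "__main__":
    # Assumed local Solr instance; adjust host and core path as needed.
    print(build_extract_url("http://localhost:8983/solr", "doc_key", "doc-42", "tika_meta_id"))
```

Because the literal is no longer named "id", the fmap.id rule should only
touch the value coming from the document's <meta name="id"> tag.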

I also opened two Jira issues that you may want to post comments in...

