lucene-solr-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: WELCOME to solr-user@lucene.apache.org
Date Tue, 08 Dec 2009 19:32:21 GMT

(FYI: in the future please start a new thread with an appropriate subject 
line when you ask questions -- you probably would have gotten a lot more 
responses from people interested in Tika and SolrCell if they could tell 
that this email was about SolrCell)

: I found that Tika reads the HTML and extracts metadata like <meta name="id"
: content="12"> from my HTML files, but my documents already have an id set by
: literal.id=10.
: 
: I tried to map the id from Tika with fmap.id=ignored_ but it also ignores my
: literal.id

Hmmmm, yeah: that seems like an odd order of operations, but it's 
documented on the wiki so evidently it's intentional...

http://wiki.apache.org/solr/ExtractingRequestHandler#Order_of_field_operations

My best suggestions:

 * use the capture param to restrict what gets extracted (it's probably
possible to write an XPath query that selects everything *except* 
metadata[id])
 * change the name of your uniqueKey field to be something other than "id" 
so it's less likely to collide with a value from the document.
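
To make the second suggestion concrete, here is a rough sketch in Python that 
only builds the two request URLs so the parameters can be compared side by 
side. The host, port, and /update/extract handler path are assumptions about 
a typical setup, and "doc_id" is a hypothetical replacement name for the 
uniqueKey field; none of them come from the original question. It also 
assumes the schema has an ignored_* dynamic field and that the uniqueKey has 
actually been renamed there.

```python
from urllib.parse import urlencode

# Assumed values, not from the original question: host, port and handler path.
SOLR_EXTRACT_URL = "http://localhost:8983/solr/update/extract"


def build_extract_url(unique_key_field, unique_key_value, extra_params=None):
    """Return an extract URL that sets the uniqueKey via literal.<field>."""
    params = {"literal." + unique_key_field: unique_key_value}
    if extra_params:
        params.update(extra_params)
    return SOLR_EXTRACT_URL + "?" + urlencode(params)


# The request from the question: the literal and the Tika metadata share the
# name "id", so the fmap.id rule ends up applying to both of them.
colliding = build_extract_url("id", "10", {"fmap.id": "ignored_"})

# With the uniqueKey renamed (hypothetical name "doc_id"), fmap.id should only
# touch the "id" value that Tika pulled out of the <meta> tag.
renamed = build_extract_url("doc_id", "10", {"fmap.id": "ignored_"})

print(colliding)
print(renamed)
```

The only difference between the two URLs is the name after "literal.", which 
is the point: once the uniqueKey no longer shares a name with the extracted 
metadata, the mapping rule has nothing of yours left to collide with.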

I also opened two Jira issues that you may want to post comments in...

https://issues.apache.org/jira/browse/SOLR-1633
https://issues.apache.org/jira/browse/SOLR-1634


-Hoss

