lucene-solr-user mailing list archives

From Chris Hostetter
Subject Re: AW: What is the best way to index xml data preserving the mark up?
Date Thu, 08 Nov 2007 18:19:40 GMT

: Thanks -- C-Data might be useful -- and I was looking into dynamic 
: fields as solution as well -- I think a combination of the two might 
: work.

I must admit i haven't been following this thread that closely, so i'm not 
sure how much of the "structure" of the XML you want to preserve for the 
purposes of querying, or if it's just an issue of wanting to store the raw 
XML, but on the broader topic of indexing/searching arbitrary XML, i'd 
like to throw out a few misc ideas i've had in the past that you might 
want to run with...

1) there's a Jira issue i opened a while back with a rough patch for 
applying user-specified XSLTs on the server to transform arbitrary XML 
into the Solr XML update format (i don't have the issue number handy, and 
my browser is in the throes of death at the moment).  this might solve the 
"i want to send solr XML in my own schema, and i want to be able to tell 
it how to pull out various pieces to use as field values" problem.
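
To make the XSLT idea in (1) concrete, here is a minimal client-side sketch using only the JDK's javax.xml.transform API. It is not the patch from the Jira issue: the <book> input schema, the stylesheet, the class name and the field names "id" and "title" are all made up for illustration. Only the <add><doc><field name="..."> shape comes from the Solr XML update format.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltToSolrSketch {
    // Hypothetical stylesheet: maps an invented <book> schema onto
    // Solr's <add><doc><field name="..."> update format.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/book'>"
      + "<add><doc>"
      + "<field name='id'><xsl:value-of select='@isbn'/></field>"
      + "<field name='title'><xsl:value-of select='title'/></field>"
      + "</doc></add>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to arbitrary input XML and returns the
    // transformed document as a string.
    static String toSolrXml(String inputXml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        transformer.transform(
            new StreamSource(new StringReader(inputXml)),
            new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String input = "<book isbn='123'><title>Lucene in Action</title></book>";
        System.out.println(toSolrXml(input));
    }
}
```

In the server-side version described above, the stylesheet would be chosen per request and the transform would run inside Solr before the update handler sees the document. That wiring is not shown here.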

2) I was once toying with the idea of an XPathTokenizer.  it would parse 
the fieldValues as XML, then apply arbitrary configured XPath expressions 
against the DOM and use the resulting NodeList to produce the TokenStream.
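
The core of the XPathTokenizer idea in (2) can be sketched with the JDK's DOM and XPath APIs alone. This is only an illustration of the parse / evaluate / collect steps: the class name, the method and the sample expressions are hypothetical, and the result is a plain list of strings rather than a real Lucene TokenStream, so offsets, position increments and the Tokenizer plumbing are left out.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathTokenSketch {
    // Parses the field value as XML, evaluates each configured XPath
    // expression against the DOM, and returns the text of every matched
    // node as one "token", in expression order then document order.
    static List<String> tokens(String fieldValue, List<String> xpaths)
            throws Exception {
        Document dom = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(fieldValue)));
        XPath xpath = XPathFactory.newInstance().newXPath();

        List<String> out = new ArrayList<>();
        for (String expr : xpaths) {
            NodeList nodes =
                (NodeList) xpath.evaluate(expr, dom, XPathConstants.NODESET);
            for (int i = 0; i < nodes.getLength(); i++) {
                String text = nodes.item(i).getTextContent().trim();
                if (!text.isEmpty()) {
                    out.add(text); // skip nodes with no text content
                }
            }
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc><author>Ann</author><kw>xml</kw><kw>solr</kw></doc>";
        System.out.println(tokens(xml, Arrays.asList("//author", "//kw")));
    }
}
```

A real tokenizer would probably compile the configured expressions once and reuse them across documents, since the same expressions are applied to every field value.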
