lucene-solr-user mailing list archives

From Stan Lee <sleed...@gmail.com>
Subject Re: What's the best practices for indexing XML Content with dynamic XML Elements (SOLR 6.1) ?
Date Tue, 16 Aug 2016 19:05:57 GMT
Sorry for not being specific. I believe the Lux plugin for Solr may fit my
scenario (querying without knowing the tags in advance):
http://luxdb.org/README.html

On Tue, Aug 16, 2016 at 12:18 PM, Erick Erickson <erickerickson@gmail.com>
wrote:

> You haven't really described the scenario you want
> to implement. I get that you have raw XML of an
> unknown structure. What do you want to _do_ with that?
>
> 1> If all you want to do is index the data (i.e. strip the tags),
> try HtmlStripCharFilterFactory.
> 2> If you want to intelligently take content of the XML
> and ingest it into specific Solr fields, I don't think you'll be
> able to do that without writing some specific code to
> parse the XML, explore it and "do the right thing" with it
> which will probably involve SolrJ, an XML parser and
> some programming.
>
> Best,
> Erick
>
> On Tue, Aug 16, 2016 at 6:15 AM, Stan Lee <sleedata@gmail.com> wrote:
> > We currently have a Microsoft SQL table with an XML datatype column. We
> > use DIH to import the XML content as is, that is, without using the
> > XPathEntityProcessor. If the elements of the XML content are known,
> > XPathEntityProcessor makes sense. Could someone kindly suggest the right
> > way of handling such a scenario without impacting search performance?
> > Which tokenizer should we be using?
> >
> >
> > Thanks.
>
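
For readers following the thread, Erick's option 2 (parse the XML, explore it,
and map its content onto Solr fields) could be sketched roughly as below. This
is only an illustration, not what either poster used: it is written in Python
with the standard library rather than SolrJ, the function name
xml_to_solr_doc is hypothetical, and the "_txt" suffix assumes the schema
defines a matching multi-valued dynamic text field.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def xml_to_solr_doc(xml_text, doc_id, suffix="_txt"):
    """Flatten XML of unknown structure into a dict of multi-valued fields.

    Each element that carries text becomes a field named <tag><suffix>.
    The suffix is an assumption: it must match a dynamic field in the schema.
    """
    root = ET.fromstring(xml_text)
    fields = defaultdict(list)
    for elem in root.iter():  # visits the root and every descendant
        text = (elem.text or "").strip()
        if not text:
            continue  # skip container elements that hold no text of their own
        # Drop a namespace prefix such as "{http://example.org}title".
        tag = elem.tag.rsplit("}", 1)[-1]
        fields[tag + suffix].append(text)
    doc = {"id": doc_id}
    doc.update(fields)
    return doc


if __name__ == "__main__":
    sample = "<book><title>Solr</title><author>A</author><author>B</author></book>"
    print(xml_to_solr_doc(sample, "1"))
```

The resulting dict could then be serialized as JSON and sent to Solr's update
handler. Because tags become field names, a query can target a tag without it
having been declared ahead of time, at the cost of one dynamic field per
distinct tag.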
