lucene-solr-user mailing list archives

From Shawn Heisey <apa...@elyograg.org>
Subject Re: update/extract override ExtractTyp
Date Wed, 04 Jan 2017 16:10:19 GMT
On 1/4/2017 8:12 AM, sn00py@ulysses-erp.com wrote:
> Is it possible to override the ExtractClass for a specific document?
> I would like to upload an XML document, but this XML is not well-formed.
>
> I need this XML because it is part of a project where a corrupt XML is
> needed, for testing purposes.
>
>
> The update/extract process fails every time with a 500 error.
>
> I tried to override the Content-Type with "text/plain" but still get
> the XML parse error.

If you send something to the /update handler, and don't tell Solr that
it is another format that it knows like CSV, JSON, or Javabin, then Solr
assumes that it is XML -- and that it is the *specific* XML format that
Solr uses.  "text/plain" is not one of the formats that the update
handler knows how to handle, so it will assume XML.

If you send some other arbitrary XML content, even if that XML is
otherwise correctly formed (which apparently yours isn't), Solr will
throw an error, because it is not the type of XML that Solr is looking
for.  On this page are some examples of what Solr is expecting when you
send XML:

https://wiki.apache.org/solr/UpdateXmlMessages
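
As a rough illustration of the XML shape described on that page, here is a minimal Python sketch that builds an <add><doc> message. The field names ("id", "raw_xml_s") and the helper name build_add_xml are hypothetical and depend on your schema. Because ElementTree escapes the field text, the broken XML ends up as plain character data inside a well-formed update message.

```python
import xml.etree.ElementTree as ET


def build_add_xml(fields):
    """Build a Solr-style <add><doc>...</doc></add> message.

    `fields` maps field names to string values. The names used by the
    caller are assumptions; they must exist in your own schema.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", {"name": name})
        # ElementTree escapes <, > and & when serializing, so malformed
        # XML in `value` is carried as ordinary text.
        field.text = value
    return ET.tostring(add, encoding="unicode")


# Deliberately malformed XML used as a field value (hypothetical data).
broken = "<root><unclosed></root>"
message = build_add_xml({"id": "doc1", "raw_xml_s": broken})
print(message)
```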

If you want to parse arbitrary XML into fields, you probably need to
index it using the DataImportHandler (DIH) and its XPathEntityProcessor,
which requires XML that can actually be parsed.  If you want the XML to
go into a field completely as-is, then you need to encode the XML into
one of the update formats that Solr knows (XML, JSON, etc) and set it as
the value of one of the fields.
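
The "as-is" option could look something like the following Python sketch, which wraps the raw XML as a string value in a JSON update. Everything specific here is an assumption for illustration: the host, the core name "mycore", the field name "raw_xml_s", and the helper names. Only the standard library is used.

```python
import json
import urllib.request


def build_json_update(doc_id, raw_xml, field_name="raw_xml_s"):
    """Return a JSON update body carrying the XML as a plain string.

    The JSON encoder handles quoting, so the XML does not need to be
    well-formed. `field_name` is hypothetical and must match your schema.
    """
    docs = [{"id": doc_id, field_name: raw_xml}]
    return json.dumps(docs).encode("utf-8")


def post_update(body, url="http://localhost:8983/solr/mycore/update?commit=true"):
    """POST the body to the update handler (URL is an assumed example)."""
    request = urllib.request.Request(
        url,
        data=body,
        # Tell Solr the body is JSON so it is not treated as Solr XML.
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# Build the payload only; call post_update(body) against a running Solr.
body = build_json_update("doc1", "<root><unclosed></root>")
print(body.decode("utf-8"))
```

Sending JSON with an explicit Content-Type avoids the default assumption that the body is Solr's own XML format.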

Thanks,
Shawn

