lucene-solr-user mailing list archives

From sn00py <sn0...@ulysses-erp.com>
Subject AW: Re: update/extract override ExtractTyp
Date Thu, 05 Jan 2017 16:34:26 GMT

    
I am using the extract URL and renamed the file to test.txt, but it is still parsed with the
XML parser. Can I force the txt parser for all .txt files?


Sent from my Samsung device.

-------- Original message --------
From: Shawn Heisey <apache@elyograg.org> 
Date: 04.01.17  17:10  (GMT+01:00) 
To: solr-user@lucene.apache.org 
Subject: Re: update/extract override ExtractTyp 

On 1/4/2017 8:12 AM, sn00py@ulysses-erp.com wrote:
> Is it possible to override the ExtractClass for a specific document?
> I would like to upload an XML document, but this XML is not well-formed.
>
> I need this XML because it is part of a project where a corrupt XML
> document is needed for testing purposes.
>
>
> The update/extract process fails every time with a 500 error.
>
> I tried to override the Content-Type with "text/plain" but still get
> the XML parse error.

If you send something to the /update handler, and don't tell Solr that
it is another format that it knows like CSV, JSON, or Javabin, then Solr
assumes that it is XML -- and that it is the *specific* XML format that
Solr uses.  "text/plain" is not one of the formats that the update
handler knows how to handle, so it will assume XML.

If you send some other arbitrary XML content, even if that XML is
otherwise correctly formed (which apparently yours isn't), Solr will
throw an error, because it is not the type of XML that Solr is looking
for.  On this page are some examples of what Solr is expecting when you
send XML:

https://wiki.apache.org/solr/UpdateXmlMessages
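
For illustration, here is a minimal sketch of the general shape of such an update message, built with Python's standard library. The field names "id" and "title" and the helper name build_add_message are assumptions for the example; the fields would have to match whatever your schema defines.

```python
import xml.etree.ElementTree as ET


def build_add_message(fields):
    """Build an <add><doc>...</doc></add> update message from a dict.

    `fields` maps field names to string values. The names used below
    are hypothetical and must exist in the target schema.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = value
    # encoding="unicode" returns a str without an XML declaration
    return ET.tostring(add, encoding="unicode")


message = build_add_message({"id": "doc1", "title": "Example document"})
print(message)
```

Anything that does not follow this add/doc/field layout is not something the XML loader of the update handler will accept, even if it is well-formed XML.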

If you want to parse arbitrary XML into fields, you probably need to
send it using DIH and the XPathEntityProcessor.  If you want the XML to
go into a field completely as-is, then you need to encode the XML into
one of the update formats that Solr knows (XML, JSON, etc) and set it as
the value of one of the fields.
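
As a rough sketch of that second option, the snippet below wraps a deliberately broken XML string as the value of one field in a JSON update body. The field name "raw_xml_s", the document id, and the helper name build_json_update are all assumptions for the example, not something from your setup; the result would be posted to the update handler with a Content-Type of application/json.

```python
import json


def build_json_update(doc_id, raw_xml, field_name="raw_xml_s"):
    """Return a JSON update body holding one document.

    The raw XML is stored unchanged as a plain string value, so it is
    never parsed as XML. `field_name` is a hypothetical field and must
    exist in the target schema.
    """
    doc = {"id": doc_id, field_name: raw_xml}
    # A JSON array of documents; json.dumps handles all escaping.
    return json.dumps([doc])


# Deliberately malformed XML: <item> is never closed.
broken_xml = "<root><item>unclosed</root>"
payload = build_json_update("test-1", broken_xml)
print(payload)
```

Because the JSON encoder takes care of the escaping, the corrupt XML survives the round trip byte for byte, which matters if the point is to test with broken input.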

Thanks,
Shawn
