tika-dev mailing list archives

From "Slava G (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (TIKA-2727) Parsing and detect mime type of XML file stuck in infinite loop
Date Mon, 17 Sep 2018 17:24:00 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617833#comment-16617833 ]

Slava G edited comment on TIKA-2727 at 9/17/18 5:23 PM:
--------------------------------------------------------

I'm using Tika directly in my code.

Does version 1.19 handle this issue more gracefully?

Also, we're still on 1.17, because when we switched to 1.18, parsing of many customers' data
failed due to a very strange error: https://issues.apache.org/jira/browse/TIKA-2676

I'm afraid that 1.19 will bring the same issue back to us.

 


was (Author: slavago):
I'm using Tika directly in my code.

Does version 1.19 handle this issue more gracefully?

Also, we're still on 1.17, because when we switched to 1.18, parsing of many customers' data
failed due to a very strange error that was not discovered by our QA. So I'm afraid that
1.19 will bring the same issue back to us.

 

> Parsing and detect mime type of XML file stuck in infinite loop
> ---------------------------------------------------------------
>
>                 Key: TIKA-2727
>                 URL: https://issues.apache.org/jira/browse/TIKA-2727
>             Project: Tika
>          Issue Type: Bug
>          Components: detector, parser
>    Affects Versions: 1.17
>            Reporter: Slava G
>            Assignee: Tim Allison
>            Priority: Major
>             Fix For: 1.19, 2.0.0
>
>         Attachments: 1_e3e13f0e-7085-4000-a558-5d255ed7a944.xml
>
>
> Hi,
> I'm trying to parse (or even just detect the MIME type of) an XML file that is not large
> but is somewhat tricky, and my process hangs at:
> XMLStringBuffer.append(char[], int, int) line: not available
> XMLStringBuffer.append(XMLString) line: not available
> XMLNSDocumentScannerImpl(XMLScanner).scanAttributeValue(XMLString, XMLString, String, boolean, String) line: not available
> XMLNSDocumentScannerImpl.scanAttribute(XMLAttributesImpl) line: not available
> XMLNSDocumentScannerImpl.scanStartElement() line: not available
> XMLNSDocumentScannerImpl$NSContentDispatcher.scanRootElementHook() line: not available
> XMLNSDocumentScannerImpl$NSContentDispatcher(XMLDocumentFragmentScannerImpl$FragmentContentDispatcher).dispatch(boolean) line: not available
> XMLNSDocumentScannerImpl(XMLDocumentFragmentScannerImpl).scanDocument(boolean) line: not available
> XIncludeAwareParserConfiguration(XML11Configuration).parse(boolean) line: not available
> XIncludeAwareParserConfiguration(XML11Configuration).parse(XMLInputSource) line: not available
> SAXParserImpl$JAXPSAXParser(XMLParser).parse(XMLInputSource) line: not available
> SAXParserImpl$JAXPSAXParser(AbstractSAXParser).parse(InputSource) line: not available
> SAXParserImpl$JAXPSAXParser.parse(InputSource) line: not available
> SAXParserImpl.parse(InputSource, DefaultHandler) line: not available
> SAXParserImpl(SAXParser).parse(InputStream, DefaultHandler) line: 195
> XmlRootExtractor.extractRootElement(InputStream) line: 62
> XmlRootExtractor.extractRootElement(byte[]) line: 42
> MimeTypes.getMimeType(byte[]) line: 212
> MimeTypes.detect(InputStream, Metadata) line: 494
> DefaultDetector(CompositeDetector).detect(InputStream, Metadata) line: 84
>  
> Please see attached XML file.
> Please advise.
> Thanks
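
For readers following the quoted stack trace: MIME detection reaches a SAX parse through XmlRootExtractor.extractRootElement, and the thread is inside scanAttributeValue while the root start tag is still being scanned. The sketch below is not Tika's code. It is a minimal JDK-only illustration of root-element extraction with SAX, assuming the handler aborts the parse as soon as the first element is reported. The class name RootElementSketch and its nested helper classes are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.xml.namespace.QName;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical illustration only; not the Tika implementation. */
public final class RootElementSketch {

    private RootElementSketch() {}

    /** Thrown by the handler to stop parsing once the root element is known. */
    private static final class StopParsing extends SAXException {
        StopParsing() {
            super("root element received");
        }
    }

    /** Records the first element reported by the parser, then aborts. */
    private static final class RootHandler extends DefaultHandler {
        QName root;

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes attributes) throws SAXException {
            // The parser calls this only after it has scanned the whole start tag,
            // including every attribute value on the root element.
            root = new QName(uri, localName);
            throw new StopParsing();
        }
    }

    /** Returns the qualified name of the root element, or null if none was found. */
    public static QName extractRootElement(byte[] data)
            throws IOException, ParserConfigurationException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        RootHandler handler = new RootHandler();
        try {
            SAXParser parser = factory.newSAXParser();
            parser.parse(new ByteArrayInputStream(data), handler);
        } catch (SAXException e) {
            // Either our own StopParsing signal or a genuine parse error;
            // in both cases handler.root holds whatever was seen so far.
        }
        return handler.root;
    }
}
```

Because startElement receives the complete attribute list, aborting in the handler cannot take effect until the parser has finished reading the root start tag. That would be consistent with the trace showing time spent in scanAttributeValue, although the trace alone does not establish the cause of the hang.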



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
