tika-dev mailing list archives

From Bob Paulin <...@bobpaulin.com>
Subject Re: trunk build failing in bundle --, cxf class not found for GrobidRESTParser?
Date Wed, 02 Mar 2016 13:46:59 GMT
I saw it on the 2.x branch, but now that you mention it's also happening 
in trunk, I think I see the issue.  The change to the PDFParser adds 
dependencies on the javax.xml.stream package.  The tika-bundle 
currently has that package marked optional:

javax.xml.stream;version="[1.0,2)";resolution:=optional,

This means that the bundle will start even if that package is not wired.  
However, it's now required for the PDFParser to work, so my guess is that 
the PDFParser is not instantiating correctly and parsing is dropping into 
the JournalParser, which is also coded to handle PDFs.  The JournalParser 
suffers a similar fate because org.apache.cxf.jaxrs.ext.multipart is 
also an optional import, and it is needed by the GrobidRESTParser, which 
gets instantiated in the parse method.

So I tried removing:
javax.xml.stream;version="[1.0,2)";resolution:=optional,
javax.xml.stream.events;version="[1.0,2)";resolution:=optional,
javax.xml.stream.util;version="[1.0,2)";resolution:=optional,
from the tika-bundle/pom.xml and it worked!  So, seeing that 
javax.xml.stream is provided by the JDK, I'm a bit curious what those 
statements were doing there to begin with.  Anyone know?
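
For anyone who wants to poke at this, here is a minimal sketch of a visibility check, assuming nothing beyond the JDK.  The class name ClassVisibilityCheck and its methods are hypothetical helpers, not part of Tika or Felix.  It only asks a given class loader whether a class can be loaded; inside an OSGi container you would pass the bundle's own class loader, which is where an unwired optional import would show up as "NOT visible".

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical diagnostic helper (not part of Tika): reports which classes
// a given class loader can see.
final class ClassVisibilityCheck {

    /** Returns true if the named class can be loaded through the given loader. */
    static boolean isVisible(String className, ClassLoader loader) {
        try {
            // initialize=false: only test visibility, do not run static initializers
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Checks each class name in order and records whether it is visible. */
    static Map<String, Boolean> check(List<String> classNames, ClassLoader loader) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String name : classNames) {
            result.put(name, isVisible(name, loader));
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumption: outside OSGi this is the application class loader;
        // inside a bundle it would be that bundle's class loader.
        ClassLoader loader = ClassVisibilityCheck.class.getClassLoader();
        List<String> names = Arrays.asList(
                "javax.xml.stream.XMLInputFactory",
                "org.apache.cxf.jaxrs.ext.multipart.ContentDisposition");
        check(names, loader).forEach((name, ok) ->
                System.out.println(name + " -> " + (ok ? "visible" : "NOT visible")));
    }
}
```

Run on a plain JVM, this only tells you what is on the classpath; the interesting result is the one obtained from inside the bundle, since that reflects how the optional imports were actually wired.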

- Bob

On 3/2/2016 6:26 AM, Allison, Timothy B. wrote:
> Anyone have an idea why trunk is now failing?  I couldn't find any changes between the last successful build and last night's failures that would explain this.
>
>
> Test set: org.apache.tika.bundle.BundleIT
> -------------------------------------------------------------------------------
> Tests run: 9, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 21.997 sec <<< FAILURE!
> testTikaBundle(org.apache.tika.bundle.BundleIT)  Time elapsed: 2.374 sec  <<< ERROR!
> java.lang.ClassNotFoundException: org.apache.cxf.jaxrs.ext.multipart.ContentDisposition not found by org.apache.tika.bundle [17]
> 	at org.apache.felix.framework.BundleWiringImpl.findClassOrResourceByDelegation(BundleWiringImpl.java:1558)
> 	at org.apache.felix.framework.BundleWiringImpl.access$400(BundleWiringImpl.java:79)
> 	at org.apache.felix.framework.BundleWiringImpl$BundleClassLoader.loadClass(BundleWiringImpl.java:1998)
> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> 	at org.apache.tika.parser.journal.GrobidRESTParser.parse(GrobidRESTParser.java:69)
> 	at org.apache.tika.parser.journal.JournalParser.parse(JournalParser.java:60)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>
>
> -----Original Message-----
> From: Hudson (JIRA) [mailto:jira@apache.org]
> Sent: Tuesday, March 01, 2016 9:59 PM
> To: dev@tika.apache.org
> Subject: [jira] [Commented] (TIKA-1857) Enhance PDFParser to extract text from XFA forms
>
>
>      [ https://issues.apache.org/jira/browse/TIKA-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174937#comment-15174937 ]
>
> Hudson commented on TIKA-1857:
> ------------------------------
>
> UNSTABLE: Integrated in tika-trunk-jdk1.7 #916 (See [https://builds.apache.org/job/tika-trunk-jdk1.7/916/])
> TIKA-1857: add basic XFA extraction support via Pascal Essiembre. (tallison: rev dbefe9830b26d05f9ce53503565a069bcc63d7c1)
> * tika-parsers/src/test/resources/test-documents/testPDF_XFA_govdocs1_258578.pdf
> * tika-parsers/src/test/java/org/apache/tika/parser/pdf/PDFParserTest.java
> * tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDFParserConfig.java
> * tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDFParser.java
> * tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDF2XHTML.java
> * tika-parsers/src/main/resources/org/apache/tika/parser/pdf/PDFParser.properties
> * tika-parsers/src/main/java/org/apache/tika/parser/pdf/XFAExtractor.java
> TIKA-1857: add basic XFA extraction support via Pascal Essiembre. (tallison: rev 7c245fa87507cf0887838001c54c65b79b7e7cbc)
> * CHANGES.txt
>
>
>> Enhance PDFParser to extract text from XFA forms
>> ------------------------------------------------
>>
>>                  Key: TIKA-1857
>>                  URL: https://issues.apache.org/jira/browse/TIKA-1857
>>              Project: Tika
>>           Issue Type: Improvement
>>           Components: parser
>>             Reporter: Pascal Essiembre
>>               Labels: patch
>>              Fix For: 1.13
>>
>>          Attachments: 041617_filled_out.pdf, govdocs1_xfas.zip, xfa_in_govdocs1.txt
>>
>>
>> Extract text from PDF Forms (XFA).  Information about XFA: https://en.wikipedia.org/wiki/XFA
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)

