tika-dev mailing list archives

From "Tadeu Alves (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-1457) NullPointerException in tika-app, parsing PDF content
Date Mon, 27 Oct 2014 20:11:34 GMT

    [ https://issues.apache.org/jira/browse/TIKA-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185726#comment-14185726 ]

Tadeu Alves commented on TIKA-1457:
-----------------------------------

Thanks again, Tim, for your help.

This is a homologation environment, which is why I'm testing like this.

I want to see if this will fix all of my indexing problems until Solr 5.0 comes out. I'm monitoring
my Solr server to see whether it shows memory leaks or CPU stress.

Nothing has gone wrong so far; I'll post the final result tomorrow.

> NullPointerException in tika-app, parsing PDF content
> -----------------------------------------------------
>
>                 Key: TIKA-1457
>                 URL: https://issues.apache.org/jira/browse/TIKA-1457
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.5
>         Environment: OS - Linux Centos 6.5
> Web APP - Tomcat6
> Using Solr 4.10
> Tika Jar
>           * tika-core-1.5.jar
>           * tika-parsers-1.5.jar
>           * tika-xmp-1.5.jar
>           * pdfbox-1.8.4.jar
>            Reporter: Tadeu Alves
>              Labels: bug, parser, solr, tika,text-extraction
>             Fix For: 1.6
>
>
> When I try to extract text from some PDF files with the Tika app 1.5, I get the following exception:
> null:org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.PDFParser@52cfcf01
> 	at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:225)
> 	at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> 	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> 	at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:246)
> 	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)
> 	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
> 	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
> 	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
> 	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> 	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> 	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> 	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> 	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
> 	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> 	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> 	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
> 	at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
> 	at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
> 	at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.PDFParser@52cfcf01
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> 	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> 	at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
> 	... 19 more
> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: 0
> 	at java.lang.String.charAt(String.java:658)
> 	at org.apache.pdfbox.util.DateConverter.parseDate(DateConverter.java:680)
> 	at org.apache.pdfbox.util.DateConverter.toCalendar(DateConverter.java:808)
> 	at org.apache.pdfbox.util.DateConverter.toCalendar(DateConverter.java:780)
> 	at org.apache.pdfbox.util.DateConverter.toCalendar(DateConverter.java:754)
> 	at org.apache.pdfbox.cos.COSDictionary.getDate(COSDictionary.java:797)
> 	at org.apache.pdfbox.pdmodel.PDDocumentInformation.getModificationDate(PDDocumentInformation.java:232)
> 	at org.apache.tika.parser.pdf.PDFParser.extractMetadata(PDFParser.java:176)
> 	at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:142)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> 	... 22 more
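
For readers following the trace: the innermost cause is String.charAt failing with "String index out of range: 0" inside DateConverter.parseDate, which is what charAt(0) does on an empty string. The sketch below only illustrates that failure mode and one possible guard. It is not the PDFBox or Tika code and not the actual fix for this issue; the class EmptyDateGuard and the helper firstChar are hypothetical names made up for the example, and the assumption that the PDF's modification date string was empty is an inference from the trace.

```java
import java.util.Optional;

public class EmptyDateGuard {

    // Hypothetical helper (not part of PDFBox or Tika): returns the first
    // non-blank character of a raw date string, or empty if there is none.
    public static Optional<Character> firstChar(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return Optional.empty(); // treat a blank date as "no date"
        }
        return Optional.of(raw.trim().charAt(0));
    }

    public static void main(String[] args) {
        // Unguarded access: charAt(0) on an empty string throws.
        try {
            "".charAt(0);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("unguarded: " + e.getClass().getSimpleName());
        }

        // Guarded access: the blank value is reported as absent instead.
        System.out.println("guarded (empty): " + firstChar("").isPresent());
        System.out.println("guarded (date): " + firstChar("D:20141027201134Z").isPresent());
    }
}
```

A caller that gets an empty result could skip the date field and continue extracting text, rather than letting one malformed metadata value abort the whole parse.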



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
