lucene-java-user mailing list archives

From Clemens Wyss DEV
Subject [tika] ForkParser, Lost connection to a forked server process
Date Wed, 18 Feb 2015 07:32:05 GMT
Sorry for cross-posting, but the Tika mailing list does not seem to be too "lively":
I am trying to use the ForkParser. Unfortunately, I get "Lost connection to
a forked server process" for an (encrypted) PDF that I can extract in-process. Extracting
the document in-process takes approx. 40 s (!). Extracting other (smaller) documents works
with the ForkParser.

Memory should be no problem:
forkParser.setJavaCommand("java -Xmx2048m -Xdebug");

When I run the unit test with the ForkParser, the test stops after 10 seconds. The console
output looks like this:
SLF4J: Found binding in [tika-in-memory://localhost/3]
SLF4J: See for an explanation.
SLF4J: Actual binding is of type [ch.qos.logback.classic.util.ContextSelectorStaticBinder]
07:28:01.909 [main] INFO  o.apache.pdfbox.pdfparser.PDFParser - Document is encrypted
07:28:02.239 [main] DEBUG o.a.p.p.PDFObjectStreamParser - parsed=COSObject{706, 0}
07:28:02.239 [main] DEBUG o.a.p.p.PDFObjectStreamParser - parsed=COSObject{707, 0}
07:28:02.239 [main] DEBUG o.a.p.p.PDFObjectStreamParser - parsed=COSObject{708, 0} ...
07:28:02.249 [main] DEBUG o.a.p.p.PDFObjectStreamParser - parsed=COSObject{752, 0}
07:28:02.249 [main] DEBUG o.a.p.p.PDFObjectStreamParser - parsed=COSObject{753, 0}
07:28:02.249 [main] DEBUG o.a.p.p.PDFObjectStreamParser - parsed=COSObject{754, 0}
07:28:11.465 [main] ERROR - failed to extract text from input stream
org.apache.tika.exception.TikaException: Failed to communicate with a forked parser process.
The process has most likely crashed due to some error like running out of memory. A new process
will be started for the next parsing request.
	at org.apache.tika.fork.ForkParser.parse( ~[tika-core.jar:1.7]
	at [target/:na]
	at [target/:na]
	at [target/:na]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_25]
	at sun.reflect.NativeMethodAccessorImpl.invoke( ~[na:1.8.0_25]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke( ~[na:1.8.0_25]
	...
	at [selenium-server-standalone.jar:na]
	at [.cp/:na]
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main( [.cp/:na]
Caused by: Lost connection to a forked server process
	at org.apache.tika.fork.ForkClient.waitForResponse( ~[tika-core.jar:1.7]
	at ~[tika-core.jar:1.7]
	at org.apache.tika.fork.ForkParser.parse( ~[tika-core.jar:1.7]
	... 38 common frames omitted

Am I running into a timeout? What else can I investigate?
