tika-dev mailing list archives

From Christian Reuschling <reuschl...@dfki.uni-kl.de>
Subject possible bug: double invocation of handler.endDocument() in PDFParser for one file (new behaviour in Tika 1.5)
Date Mon, 16 Jun 2014 13:46:22 GMT

I am currently migrating to Tika 1.5 and ran into this behaviour, which leads to duplicate
entries in my database for a single PDF file, since I work directly with the handler.

Here are the two calls:

The first call is in PDF2XHTML, line 197: handler.endDocument();
This is part of the PDF2XHTML.process(pdfDocument, handler, context, metadata, localConfig)
invocation from PDFParser, line 143.


The second call is then made directly in PDFParser, line 151: handler.endDocument();
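
A possible workaround on the caller side, sketched under assumptions: wrap the handler in a small guard that forwards endDocument() to the real handler only once per document. The class name EndDocumentOnceHandler and its getEndDocumentCalls() method are hypothetical and not part of Tika; the sketch uses only the JDK's org.xml.sax types and forwards only the most common SAX events.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical guard (not part of Tika): forwards SAX events to a delegate,
 * but passes endDocument() on only once per startDocument().
 */
class EndDocumentOnceHandler extends DefaultHandler {
    private final ContentHandler delegate;
    private boolean ended;
    private int endDocumentCalls;

    EndDocumentOnceHandler(ContentHandler delegate) {
        this.delegate = delegate;
    }

    @Override
    public void startDocument() throws SAXException {
        // Reset the guard so the wrapper can be reused for the next document.
        ended = false;
        endDocumentCalls = 0;
        delegate.startDocument();
    }

    @Override
    public void endDocument() throws SAXException {
        endDocumentCalls++;
        if (!ended) {
            // Only the first call reaches the real handler.
            ended = true;
            delegate.endDocument();
        }
    }

    @Override
    public void startPrefixMapping(String prefix, String uri) throws SAXException {
        delegate.startPrefixMapping(prefix, uri);
    }

    @Override
    public void endPrefixMapping(String prefix) throws SAXException {
        delegate.endPrefixMapping(prefix);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        delegate.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        delegate.endElement(uri, localName, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        delegate.characters(ch, start, length);
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException {
        delegate.ignorableWhitespace(ch, start, length);
    }

    /** Number of endDocument() calls received since the last startDocument(). */
    int getEndDocumentCalls() {
        return endDocumentCalls;
    }
}
```

Passing new EndDocumentOnceHandler(myHandler) to the parser instead of myHandler should hide the second call from the wrapped handler, and getEndDocumentCalls() can be used to confirm that the double invocation actually happens for a given file.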


I will stay on Tika 1.4 for now. Thanks all the same for the good work!


Christian


