tika-dev mailing list archives

From "nutch.buddy@gmail.com" <nutch.bu...@gmail.com>
Subject Parsing large xlsx file takes much longer (and usually crashes) with tika than directly with POI
Date Wed, 11 Apr 2012 11:36:08 GMT
Hi,
I'm trying to use tika-parsers to parse a 100 MB xlsx file.
The parse runs for a very long time (maybe an hour or two) and rarely
completes.
Usually I get a "GC overhead limit exceeded" error.

When I parse the same file with a few lines of code using the POI library,
the file is parsed successfully and relatively fast.

Any input on this?

I use tika-core-0.10 and tika-parsers-0.10 for the Tika run, and
poi-3.8-beta3 for the POI run.



--
View this message in context: http://lucene.472066.n3.nabble.com/Parsing-large-xlsx-file-takes-much-longer-and-usually-crashes-with-tika-than-directly-with-POI-tp3902267p3902267.html
Sent from the Apache Tika - Development mailing list archive at Nabble.com.
