tika-dev mailing list archives

From "K, Baraneetharan" <baraneethara...@hp.com>
Subject partial file parsing
Date Tue, 05 Jun 2012 07:18:18 GMT
Hi Tika-dev community,

I'm new to Tika. We are using AutoDetectParser (from Tika 0.9) to parse files and send
the parsed contents to Solr. We are facing severe performance issues when some large
.xlsx, .docx and .pptx files are parsed. We have therefore decided to parse files only partially,
for example the first 10 paragraphs of a document, the first 1000 words, or the first 2 MB of content.

Please let me know whether there is any way to tell Tika to parse only part of a file.

