poi-dev mailing list archives

From Nick Burch <nick.bu...@alfresco.com>
Subject Initial Word 6/95 support
Date Fri, 02 Jul 2010 21:04:35 GMT
Hi All

As you might have seen from my commits over the last few days, I've added 
some initial support to HWPF for Word 6 and Word 95 files. I've only been 
working with a view to text extraction (so I can ditch the text mining 
library from a work project). With lots of trial and error, some offset 
tips from WV's FIB parsing code, and some refactoring, we can now get text 
and paragraphs out of Word 6 and Word 95 files!

To play with this, you'll want HWPFOldDocument / Word6Extractor (catch 
OldWordFileFormatException and switch to the old one as needed).
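For anyone wiring this up, here is a minimal sketch of the catch-and-switch approach, assuming the poi and poi-scratchpad jars (where HWPF lives) are on the classpath. The class name WordTextDump and the method extractText are hypothetical, just for illustration. The OLE2 container is parsed once into a POIFSFileSystem so that both extractors can share it, rather than re-reading a stream the first attempt has already consumed:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.poi.hwpf.OldWordFileFormatException;
import org.apache.poi.hwpf.extractor.Word6Extractor;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

// Hypothetical helper class: not part of POI itself.
public class WordTextDump {

    /**
     * Extracts plain text from a binary .doc file, trying the regular
     * (Word 97+) extractor first and falling back to Word6Extractor.
     */
    public static String extractText(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            // Parse the OLE2 container once so both extractors can use it.
            POIFSFileSystem fs = new POIFSFileSystem(in);
            try {
                // Word 97 and later: the normal HWPF text extractor.
                return new WordExtractor(fs).getText();
            } catch (OldWordFileFormatException e) {
                // Word 6 / Word 95 file: switch to the old-format extractor,
                // which is built on HWPFOldDocument.
                return new Word6Extractor(fs).getText();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Print the text of each .doc file named on the command line.
        for (String name : args) {
            System.out.println(extractText(Paths.get(name)));
        }
    }
}
```

Word6Extractor also has getParagraphText() if you want one String per paragraph rather than a single blob of text.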

I've got this working with various sample files produced by doing a save-as 
from newer software. This means that it's not impossible that real Word 6 
/ Word 95 files will break it, especially if they've been quick-saved (I 
didn't have any examples of those).

As usual, please upload files that don't work to new bugzilla entries, or, 
even better, upload the broken file and the patch that fixes it :)


