poi-dev mailing list archives

From bugzi...@apache.org
Subject [Bug 55733] New: NullPointerException when attempting to parse a Word document with no headers
Date Fri, 01 Nov 2013 17:11:02 GMT
https://issues.apache.org/bugzilla/show_bug.cgi?id=55733

            Bug ID: 55733
           Summary: NullPointerException when attempting to parse a Word
                    document with no headers
           Product: POI
           Version: 3.9
          Hardware: PC
            Status: NEW
          Severity: normal
          Priority: P2
         Component: XWPF
          Assignee: dev@poi.apache.org
          Reporter: david.patrone@jhuapl.edu

Created attachment 30990
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=30990&action=edit
Two Word test files without headers - one throws NullPointerException, one
doesn't

I was given a programmatically generated Word document that does not contain
any headers. MS Word is able to open it, but XWPFWordExtractor.getText() throws
a NullPointerException. Specifically:

java.lang.NullPointerException
    at org.apache.poi.xwpf.extractor.XWPFWordExtractor.extractHeaders(XWPFWordExtractor.java:162)
    at org.apache.poi.xwpf.extractor.XWPFWordExtractor.getText(XWPFWordExtractor.java:87)
    at Test.testPrintDoc(Test.java:16)
    at Test.main(Test.java:26)

From the code, it appears that XWPFWordExtractor.getText() passes a null
hfPolicy to XWPFWordExtractor.extractHeaders():

public String getText() {
    StringBuffer text = new StringBuffer();
    XWPFHeaderFooterPolicy hfPolicy = document.getHeaderFooterPolicy();

    // Start out with all headers
    extractHeaders(text, hfPolicy);

This suggests that the document's headerFooterPolicy (returned by
XWPFDocument.getHeaderFooterPolicy()) is never set for this file, and that this
null is what propagates into extractHeaders() and causes the error.
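
To illustrate the failure mode and one possible guard, here is a minimal,
self-contained sketch. It uses hypothetical stand-in classes (Doc and
HeaderPolicy) rather than the real POI classes, and the null check shown is
only an assumption about how the extractor could tolerate a missing policy,
not the fix adopted by the project.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT the POI classes.
class NullPolicySketch {

    /** Stand-in for a header/footer policy; it only holds header text. */
    static class HeaderPolicy {
        private final List<String> headers = new ArrayList<String>();

        HeaderPolicy add(String header) {
            headers.add(header);
            return this;
        }

        List<String> getHeaders() {
            return headers;
        }
    }

    /** Stand-in for a document whose policy may be null, as in the report. */
    static class Doc {
        private final HeaderPolicy policy; // null when no policy was ever set
        private final String body;

        Doc(HeaderPolicy policy, String body) {
            this.policy = policy;
            this.body = body;
        }

        HeaderPolicy getHeaderFooterPolicy() {
            return policy;
        }

        String getBody() {
            return body;
        }
    }

    /** Appends header text, skipping the step when there is no policy. */
    static void extractHeaders(StringBuilder text, HeaderPolicy hfPolicy) {
        if (hfPolicy == null) {
            return; // nothing to extract; avoids dereferencing null
        }
        for (String header : hfPolicy.getHeaders()) {
            text.append(header).append('\n');
        }
    }

    /** Mirrors the shape of getText(): headers first, then the body. */
    static String getText(Doc document) {
        StringBuilder text = new StringBuilder();
        extractHeaders(text, document.getHeaderFooterPolicy());
        text.append(document.getBody()).append('\n');
        return text.toString();
    }
}
```

With the guard in place, a Doc built with a null policy yields only its body
text, while a Doc with a populated policy yields the headers followed by the
body.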

I'd chalk it up to an invalid Word document, except that MS Word can open the
file. If you open it in Word and re-save it without making any changes, Word
still reports that it has no headers, but the new file can be read by
XWPFWordExtractor.getText() without the NullPointerException.

Example Word documents without headers, one that throws the error and one that
doesn't, are attached. Here's the test code I used to print out the contents of
each file.

import java.io.FileInputStream;

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class Test {

    // Extracts and prints all text from the given .docx file.
    public static void testPrintDoc(String file) throws Exception {
        FileInputStream fis = new FileInputStream(file);
        System.err.println("Reading " + file);
        try {
            XWPFDocument doc = new XWPFDocument(fis);
            XWPFWordExtractor textExtractor = new XWPFWordExtractor(doc);
            // getText() is the call that throws for noHeaders.docx
            System.err.println(textExtractor.getText());
        } finally {
            fis.close();
        }
    }

    public static void main(String[] args) {

        try {
            Test.testPrintDoc("noHeaders.docx");
        } catch (Exception e) {
            e.printStackTrace();
        }
        try {
            Test.testPrintDoc("noHeaders_resaved.docx");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}


