tika-dev mailing list archives

From "Dmitry Kulakov (Jira)" <j...@apache.org>
Subject [jira] [Created] (TIKA-2927) XSSFExcelExtractorDecorator emits non-existent empty rows.
Date Tue, 20 Aug 2019 13:35:00 GMT
Dmitry Kulakov created TIKA-2927:

             Summary: XSSFExcelExtractorDecorator emits non-existent empty rows.
                 Key: TIKA-2927
                 URL: https://issues.apache.org/jira/browse/TIKA-2927
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.22, 1.21, 1.20
            Reporter: Dmitry Kulakov

Parsing xlsx files with _includeMissingRows_ set to true in the _OfficeParserConfig_ causes
the _XSSFExcelExtractorDecorator_ to emit extra empty rows for each row; the number of extra
rows equals the current row number minus 1. The cause is that _lastSeenRow_ is never updated,
so every new row is treated as the first non-empty row. The fix is simple: update
_lastSeenRow_ after the start of every new row. I will add the fix along with the relevant
unit test in a pull request.
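
The behaviour described above can be illustrated with a minimal, self-contained sketch. This is not the actual Tika or POI code: the class _MissingRowsSketch_, the method _emitRows_, the _updateLastSeenRow_ flag and the zero-based row numbering are all assumptions made for illustration. Only the name _lastSeenRow_ comes from the report. With zero-based numbering, the padding emitted in the buggy case is one less than the 1-based row number, which matches the description.

```java
import java.util.ArrayList;
import java.util.List;

public class MissingRowsSketch {

    /**
     * Hypothetical stand-in for the extractor's row handling (not Tika's code).
     * rowNums holds the zero-based numbers of the rows present in the sheet.
     * When updateLastSeenRow is false, lastSeenRow keeps its initial value,
     * which reproduces the reported bug.
     */
    public static List<String> emitRows(int[] rowNums, boolean updateLastSeenRow) {
        List<String> out = new ArrayList<>();
        int lastSeenRow = -1; // no row seen yet
        for (int rowNum : rowNums) {
            // Emit one empty row for every row skipped since lastSeenRow.
            for (int i = lastSeenRow + 1; i < rowNum; i++) {
                out.add("<empty>");
            }
            out.add("row " + rowNum);
            if (updateLastSeenRow) {
                lastSeenRow = rowNum; // the fix described in the report
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] rows = {0, 1, 2, 4}; // row 3 is genuinely missing from the sheet
        System.out.println("buggy: " + emitRows(rows, false));
        System.out.println("fixed: " + emitRows(rows, true));
    }
}
```

In this sketch the fixed variant emits a single empty row (for the missing row 3), while the buggy variant pads every row as if it were the first one seen.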

This message was sent by Atlassian Jira
