tika-dev mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2479) Handle empty cells in tables uniformly
Date Fri, 18 May 2018 16:12:00 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16480852#comment-16480852 ]

Hudson commented on TIKA-2479:
------------------------------

SUCCESS: Integrated in Jenkins build Tika-trunk #1488 (See [https://builds.apache.org/job/Tika-trunk/1488/])
TIKA-2479 Option to request missing rows where possible in Excel-like (nick: [https://github.com/apache/tika/commit/a1e42a0659ba33e90cb1bba0a0a10eeb97d4fac7])
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/OfficeParserConfig.java
TIKA-2479 Output missing left/mid cells in XLSX and XLSB, and optionally (nick: [https://github.com/apache/tika/commit/b1b035e6bbcff0db24e133b682ac79916f92f599])
* (edit) tika-parsers/src/test/java/org/apache/tika/parser/TabularFormatsTest.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml/XSSFExcelExtractorDecorator.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml/XSSFBExcelExtractorDecorator.java
TIKA-2479 Update XLS missing cell/row handling to match XLSX and XLSB, (nick: [https://github.com/apache/tika/commit/348b87e7f41b79ff115e17d9c91d2dad63a57c15])
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ExcelExtractor.java
* (edit) tika-parsers/src/test/java/org/apache/tika/parser/microsoft/ExcelParserTest.java
* (edit) tika-parsers/src/test/java/org/apache/tika/parser/TabularFormatsTest.java


> Handle empty cells in tables uniformly
> --------------------------------------
>
>                 Key: TIKA-2479
>                 URL: https://issues.apache.org/jira/browse/TIKA-2479
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Tim Allison
>            Priority: Minor
>             Fix For: 2.0, 1.19
>
>         Attachments: patch.diff
>
>
> It looks like we output a <td/> for empty cells in xls, and tables in doc, docx
> and pptx.  However, we don't retain empty cells in xlsx or tables in ppt.  We should make
> this handling uniform.
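
For illustration only, here is a minimal sketch of the behaviour the issue asks for: emitting a <td/> for every missing left/mid cell so that the columns line up across formats. It is not the code from the commits above. The class name EmptyCellSketch and the method renderRow are hypothetical, no Tika API is used, and the sparse row is modelled as an assumed map of column index to cell text.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not Tika code: pads gaps in a sparse row with <td/>.
public class EmptyCellSketch {

    // Renders one row from a sparse map of zero-based column index -> cell text.
    // Cell text is written as-is; a real handler would escape it.
    public static String renderRow(Map<Integer, String> sparseCells) {
        // Sort by column index so gaps can be detected left to right.
        TreeMap<Integer, String> cells = new TreeMap<>(sparseCells);
        StringBuilder sb = new StringBuilder("<tr>");
        int nextCol = 0;
        for (Map.Entry<Integer, String> e : cells.entrySet()) {
            // Emit an empty cell for each column skipped before this one.
            for (; nextCol < e.getKey(); nextCol++) {
                sb.append("<td/>");
            }
            sb.append("<td>").append(e.getValue()).append("</td>");
            nextCol = e.getKey() + 1;
        }
        // Trailing cells after the last populated column are not padded.
        return sb.append("</tr>").toString();
    }

    public static void main(String[] args) {
        Map<Integer, String> row = new TreeMap<>();
        row.put(1, "B"); // column A is missing
        row.put(3, "D"); // column C is missing
        System.out.println(renderRow(row));
    }
}
```

With columns A and C absent, the row still renders four cells, so a consumer can rely on the cell position to recover the column.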



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
