tika-dev mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2641) Unit test for consistency between tabular/columnar formats
Date Thu, 03 May 2018 21:57:00 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16463118#comment-16463118 ]

Hudson commented on TIKA-2641:
------------------------------

UNSTABLE: Integrated in Jenkins build Tika-trunk #1481 (See [https://builds.apache.org/job/Tika-trunk/1481/])
Stub a unit test for TIKA-2641 (nick: [https://github.com/apache/tika/commit/d4719f63ffb381dbbfc53e667379389cb26593c1])
* (add) tika-parsers/src/test/java/org/apache/tika/parser/TabularFormatsTest.java


> Unit test for consistency between tabular/columnar formats
> ----------------------------------------------------------
>
>                 Key: TIKA-2641
>                 URL: https://issues.apache.org/jira/browse/TIKA-2641
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 2.0, 1.18
>            Reporter: Nick Burch
>            Priority: Minor
>
> We now have a number of parsers which deal with file formats which are either wholly
> or optionally "table-based" formats with consistency in the data types held in a given column.
> This includes multi-table formats like sqlite, single-table formats like sas7bdat, and
> anything-goes table formats like csv or xlsx.
> We should firstly try to create a simple-ish, small but rich file for each of these formats,
> similar to what we do for archive formats with the {{test-documents}} archives. Then, we should
> add unit tests that verify that, as far as the formats permit, you get basically the same XHTML
> out for the "same" input. Oh, and fix up any obvious inconsistencies...
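
For illustration only, here is a minimal sketch of the comparison step the description asks for: reduce each XHTML output to its rows of cell text, then check that two formats yield the same rows. It uses only the JDK's XML parser and does not call Tika. The class name, method names and the two sample XHTML strings are hypothetical, and this is not the content of the stubbed TabularFormatsTest.java. In a real test the XHTML would come from the parsers themselves.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Tika: compares the table cells of two XHTML outputs.
public class TabularConsistencySketch {

    /** Returns one list per <tr>, holding the trimmed text of its direct <th>/<td> children. */
    public static List<List<String>> tableCells(String xhtml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
            List<List<String>> rows = new ArrayList<>();
            NodeList trs = doc.getElementsByTagName("tr");
            for (int i = 0; i < trs.getLength(); i++) {
                List<String> row = new ArrayList<>();
                NodeList children = trs.item(i).getChildNodes();
                for (int j = 0; j < children.getLength(); j++) {
                    Node child = children.item(j);
                    if (child instanceof Element) {
                        String name = ((Element) child).getTagName();
                        if (name.equals("td") || name.equals("th")) {
                            row.add(child.getTextContent().trim());
                        }
                    }
                }
                rows.add(row);
            }
            return rows;
        } catch (Exception e) {
            // Assumption: the input is well-formed XML; anything else is reported as a bad argument.
            throw new IllegalArgumentException("Could not parse XHTML", e);
        }
    }

    /** True when both outputs contain the same cell text in the same row and column order. */
    public static boolean sameCells(String xhtmlA, String xhtmlB) {
        return tableCells(xhtmlA).equals(tableCells(xhtmlB));
    }

    public static void main(String[] args) {
        // Made-up stand-ins for what two parsers might emit for the "same" data.
        String fromCsv = "<html><body><table>"
                + "<tr><th>id</th><th>name</th></tr>"
                + "<tr><td>1</td><td>alice</td></tr>"
                + "</table></body></html>";
        String fromXlsx = "<html><body><div class=\"page\"><table>"
                + "<tr> <th>id</th> <th>name</th> </tr>"
                + "<tr> <td> 1 </td> <td>alice</td> </tr>"
                + "</table></div></body></html>";
        System.out.println("same cells: " + sameCells(fromCsv, fromXlsx));
    }
}
```

Comparing cell text rather than raw markup is a deliberate choice in this sketch: it ignores wrapper elements, attributes and whitespace, which are the differences a format would be expected to have, while still flagging differences in the data itself.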



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
