tika-dev mailing list archives

From "Nick Burch (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2462) Add a parser for sas7bdat
Date Tue, 02 Jan 2018 17:52:00 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16308429#comment-16308429 ]

Nick Burch commented on TIKA-2462:
----------------------------------

While we wait for the re-license to go through, I've had a look at writing a parser. Outputting
as CSV is very easy, as they've got a great class to do all the work. Emitting SAX events for an HTML
table will be trickier, as the logic to format a raw value in a given column into "a string
of how it looks in SAS" is currently in a private method. I've raised [#24|https://github.com/epam/parso/issues/24]
to see if that can be refactored out, to avoid us needing to duplicate lots of their code.
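
For illustration only, here is a rough sketch of what "SAX events of an HTML table" could look like. It uses only the JDK's own SAX and TrAX classes, not parso or Tika APIs. The class name SasTableSaxSketch and the formatCell method are hypothetical; formatCell is just a placeholder for the column-aware formatting that currently lives in the private parso method, and its two-decimal rule is an assumption, not SAS behaviour.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

public class SasTableSaxSketch {

    // Hypothetical placeholder for the per-column formatting logic.
    // The real rules (SAS formats, dates, etc.) are NOT reproduced here.
    static String formatCell(Object raw) {
        if (raw == null) {
            return "";
        }
        if (raw instanceof Double) {
            return String.format(Locale.ROOT, "%.2f", (Double) raw);
        }
        return raw.toString();
    }

    // Emit one element containing only text.
    static void textElement(ContentHandler h, String name, String text) throws SAXException {
        h.startElement("", name, name, new AttributesImpl());
        char[] chars = text.toCharArray();
        h.characters(chars, 0, chars.length);
        h.endElement("", name, name);
    }

    // Emit a header row of <th> cells, then one <tr> of <td> cells per data row.
    static void emitTable(ContentHandler h, List<String> columns, List<List<Object>> rows)
            throws SAXException {
        AttributesImpl noAttrs = new AttributesImpl();
        h.startElement("", "table", "table", noAttrs);
        h.startElement("", "tr", "tr", noAttrs);
        for (String column : columns) {
            textElement(h, "th", column);
        }
        h.endElement("", "tr", "tr");
        for (List<Object> row : rows) {
            h.startElement("", "tr", "tr", noAttrs);
            for (Object raw : row) {
                textElement(h, "td", formatCell(raw));
            }
            h.endElement("", "tr", "tr");
        }
        h.endElement("", "table", "table");
    }

    // Serialize the SAX events to a string so the result can be inspected.
    static String render(List<String> columns, List<List<Object>> rows) throws Exception {
        SAXTransformerFactory factory = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        handler.setResult(new StreamResult(out));
        handler.startDocument();
        emitTable(handler, columns, rows);
        handler.endDocument();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        List<String> columns = Arrays.asList("name", "price");
        List<List<Object>> rows = Arrays.asList(
                Arrays.<Object>asList("apple", 1.5),
                Arrays.<Object>asList("pear", 2.0));
        System.out.println(render(columns, rows));
    }
}
```

In a real parser the ContentHandler would be whatever handler Tika passes in, and the rows would come from the SAS file reader rather than an in-memory list; the sketch only shows where a public formatting method would slot in.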

The Tika-side questions on column metadata, test files, etc. still remain for us, though!

> Add a parser for sas7bdat
> -------------------------
>
>                 Key: TIKA-2462
>                 URL: https://issues.apache.org/jira/browse/TIKA-2462
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>
> EPAM recently agreed to migrate to Apache 2.0 so that we can incorporate parso into Tika for sas7bdat files: https://github.com/epam/parso/issues/19 !!!



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
