tika-dev mailing list archives

From "Tim Allison (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (TIKA-1513) Add mime detection and parsing for dbf files
Date Wed, 13 Apr 2016 10:53:25 GMT

    [ https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15239062#comment-15239062 ]

Tim Allison edited comment on TIKA-1513 at 4/13/16 10:52 AM:
-------------------------------------------------------------

[~gagravarr], would you mind taking a look at the detector?  Is there a way that we can convert
this to a mime definition?  Or should we add a DBFDetector?

[~nicholasc], it looks great to me.  I agree that we'll probably want to relax some of the
length checks (just make sure they're > 0 or something reasonable)...we wouldn't want this
to fail on truncated dbfs, and as you've pointed out, there can be extra bytes at the end
of the file.  If there's any way to avoid adding the dependency, that'd be great...although,
I very much appreciate the concern for overflow!
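
For illustration only, here is a minimal sketch of what relaxed length checks might look like. It is not the detector attached to this issue. The class and method names are hypothetical, and the "header length must exceed 32" rule is just one reading of "something reasonable". It assumes the usual dBASE layout: record count as a little-endian unsigned 32-bit value at offset 4, header length and record length as unsigned 16-bit values at offsets 8 and 10. Doing the size arithmetic in a long avoids overflow without pulling in a dependency.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch; not the detector proposed on TIKA-1513.
public class LenientDbfHeaderCheck {

    // Size in bytes of the fixed part of a DBF header.
    static final int FIXED_HEADER_SIZE = 32;

    // Returns true if the leading bytes carry plausible DBF length fields.
    // The actual file length is deliberately NOT compared against the
    // declared sizes, so truncated files and files with trailing bytes
    // are not rejected.
    static boolean hasPlausibleLengths(byte[] prefix) {
        if (prefix == null || prefix.length < FIXED_HEADER_SIZE) {
            return false;
        }
        ByteBuffer buf = ByteBuffer.wrap(prefix).order(ByteOrder.LITTLE_ENDIAN);
        int headerLength = buf.getShort(8) & 0xFFFF;   // unsigned 16-bit
        int recordLength = buf.getShort(10) & 0xFFFF;  // unsigned 16-bit
        // Assumption: a usable header is longer than the fixed 32 bytes,
        // and each record occupies at least one byte.
        return headerLength > FIXED_HEADER_SIZE && recordLength > 0;
    }

    // Declared end of the record data, computed in a long. The largest
    // possible value (65535 + 4294967295 * 65535) fits easily in 64 bits,
    // so no checked-arithmetic helper is needed.
    static long declaredDataEnd(byte[] prefix) {
        ByteBuffer buf = ByteBuffer.wrap(prefix).order(ByteOrder.LITTLE_ENDIAN);
        long numRecords = buf.getInt(4) & 0xFFFFFFFFL; // unsigned 32-bit
        long headerLength = buf.getShort(8) & 0xFFFF;
        long recordLength = buf.getShort(10) & 0xFFFF;
        return headerLength + numRecords * recordLength;
    }
}
```

A caller could use declaredDataEnd only as a hint (for example, to decide how many records to attempt to read) rather than as a hard validity test.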

In your experience, do we need to validate the fieldentry or can we stop sooner?  If we do,
then I suspect there's no way to convert to a mime definition, but I suspect much of the earlier
stuff could easily be translated.

Oh, and please make sure to add an Apache license header...unless Nick B can easily translate
this to a mime definition. :)

Thank you!


was (Author: tallison@mitre.org):
[~gagravarr], would you mind taking a look at the detector?  Is there a way that we can convert
this to a mime definition?  Or should we add a DBFDetector?

[~nicholasc], it looks great to me.  I agree that we'll probably want to relax some of the
length checks (just make sure they're > 0 or something reasonable)...we wouldn't want this
to fail on truncated dbfs, and as you've pointed out, there can be extra bytes at the end
of the file.  If there's any way to avoid adding the dependency, that'd be great...although,
I very much appreciate the concern for overflow!

In your experience, do we need to validate the fieldentry or can we stop sooner?  If we do,
then I suspect there's no way to convert to a mime definition, but I suspect much of the earlier
stuff could easily be translated.

Thank you!

> Add mime detection and parsing for dbf files
> --------------------------------------------
>
>                 Key: TIKA-1513
>                 URL: https://issues.apache.org/jira/browse/TIKA-1513
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Minor
>             Fix For: 1.13
>
>
> I just came across an Apache licensed dbf parser that is available on [maven|https://repo1.maven.org/maven2/org/jamel/dbf/dbf-reader/0.1.0/dbf-reader-0.1.0.pom].
> Let's add dbf parsing to Tika.
> Any other recommendations for alternate parsers?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
