tika-dev mailing list archives

From "Bob Paulin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-1824) Tika 2.0 - Create Initial Parser Modules
Date Sat, 09 Jan 2016 00:54:39 GMT

    [ https://issues.apache.org/jira/browse/TIKA-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15090285#comment-15090285 ]

Bob Paulin commented on TIKA-1824:

* Perhaps rename artifact names in parser sub-components to include "Parser(s?)", e.g. Apache
Tika Parser Advanced Module so that the names sort more clearly (at least in the Maven window
in IntelliJ)?

I think I felt it was redundant, but in a Maven repo it could be helpful, so I can make that
change.

* Perhaps add "parser(s?)" to the artifactId, e.g. tika-parser-cad-module

Same as above.

* Perhaps lowercase names in parser-subcomponents so that they're in line with legacy: "Apache
Tika parser advanced module"

I think I'm missing where this convention is coming from.

* Pkcs7Parser ... should that be under advanced...or somewhere else ...own crypto package?

I don't feel strongly that it needs to be under advanced, but I do want to be careful not
to overdo the number of modules.  Do you feel crypto has room for growth, or is this just
going to forever be a one-parser project?

* iwork ...should we move that to office?

I think it could fit there too.  No issues moving.

* tika-test-resources...should we move TikaTest into that and change the name to tika-test?
I have a vague memory of wanting to carve out a separate test package earlier and adding TikaTest
and something else...

I think it could work in tika-core or tika-test.  I don't think I feel strongly either way.

* OutlookPSTParser...move that to office?

I'd like to keep this class with all the other mbox classes.  Maybe move mbox to office?

* Does MBox belong in web? Not sure where to put it?

Move to office?

* Move CommonsDigester to core if we're willing to add a dependency on commons-codec into

I'm fine with this.

* Move Activator to tika-bundle?

I believe tika-bundle already has an activator.  Could just remove this.

* Move pot to multimedia or add tika-parsers-multimedia-advanced-module?

Not sure I understand POT in multimedia.  Can you elaborate?

* Move geo.topic to "advanced"...perhaps we rename "advanced" to ner?

Is ner only applied to geo?  My understanding of this domain is limited.

* Move ctakes to "advanced/ner"?

Again, my understanding of the domain is too limited to say what ctakes fits with.

* Collapse web and text?

Not sure I like that since a number of modules depend on text but not web.  Seems like we'd
be adding a lot of needless dependencies.

> Tika 2.0 -  Create Initial Parser Modules
> -----------------------------------------
>                 Key: TIKA-1824
>                 URL: https://issues.apache.org/jira/browse/TIKA-1824
>             Project: Tika
>          Issue Type: Improvement
>    Affects Versions: 2.0
>            Reporter: Bob Paulin
>            Assignee: Bob Paulin
> Create initial break down of parser modules.

This message was sent by Atlassian JIRA
