tika-dev mailing list archives

From Bob Paulin <...@bobpaulin.com>
Subject Re: Tika 2.0 Modules first pass.
Date Wed, 06 Jan 2016 05:00:04 GMT
Thanks Chris!

- Bob

On 1/5/2016 10:32 PM, Mattmann, Chris A (3980) wrote:
> Thanks Bob, took care of 6 for ya:
>
> https://wiki.apache.org/tika/ContributorsGroup
>
> I should be able to review this, but I won't be able to do a complete
> review for a few weeks. Thanks for your great work.
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattmann@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
>
>
>
> -----Original Message-----
> From: Bob Paulin <bob@apache.org>
> Reply-To: "dev@tika.apache.org" <dev@tika.apache.org>
> Date: Tuesday, January 5, 2016 at 7:54 PM
> To: "dev@tika.apache.org" <dev@tika.apache.org>
> Subject: Tika 2.0 Modules first pass.
>
>> All,
>>
>> I took a stab at the initial module structure based on Tim and my email
>> [1].  If a package didn't seem to fit with anything else I created an
>> individual project for it.  If any of the groupings don't make sense or
>> folks think there are better ways to organize I'm happy to move stuff
>> around.  Patches are welcome :).  I have a JIRA created [2].  Committed
>> with rev 1723223.
>>
>> There's still a good amount of outstanding work:
>> 1) All of this could use more testing, especially with the external parsers.
>> 2) As Tim has already raised, there is the issue of maintaining two
>> branches.  There are likely some fixes in trunk that are not currently
>> applied to the 2.0 branch.
>> 3) The tika-parser project is currently using the maven shade plugin and
>> that is causing issues creating the OSGi Manifest.MF file.  I should be
>> able to find a way around this.
>> 4) Still need to recreate the OSGi uber jar with all dependencies
>> packaged with the tika code.
>> 5) There are still some classes in the tika-parser project.  Should
>> these all be moved to core? A common project?...
>> 6) Documentation.  I could use some Wiki access.  Username: BobPaulin.
>> 7) There are some dependencies in the tika-parser project that were not
>> needed to compile any of the individual modules or run tests. Are they
>> still needed?
>> 8) Where does the
>> org.apache.tika.parser.external.CompositeExternalParser ServiceLoader
>> (META-INF/services/org.apache.tika.parser.Parser) config belong?  I
>> moved it to tika-core since that is where the class lives.
>> 9) Subcomponent licenses.  I moved them to the modules they belong in
>> but I need to figure out a way to make them bubble up to the uber jars.
>> Or perhaps they need to be dual maintained.
>> 10) Anything I may be forgetting....;)
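On item 8, a minimal sketch of how java.util.ServiceLoader locates a provider-configuration file may help frame the question. The loader reads META-INF/services/<service name> as a classpath resource, so the file does not have to sit in the same classpath entry as the class it names. Everything named below (DemoParser, DemoExternalParser, the temp directory) is a hypothetical stand-in, not Tika code, and this does not settle where the real config should live.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

class ServiceConfigSketch {

    /** Hypothetical stand-in for org.apache.tika.parser.Parser. */
    public interface DemoParser {
        String name();
    }

    /** Hypothetical stand-in for a provider such as CompositeExternalParser. */
    public static class DemoExternalParser implements DemoParser {
        public DemoExternalParser() { }

        @Override
        public String name() {
            return "demo-external";
        }
    }

    /**
     * Writes a provider-configuration file into a separate directory (playing
     * the role of "another module") and lets ServiceLoader discover the
     * provider through it.
     */
    static List<String> discover() {
        try {
            // The config file is named after the service interface's binary name
            // and lists one provider class name per line.
            Path root = Files.createTempDirectory("services-demo");
            Path services = Files.createDirectories(
                    root.resolve("META-INF").resolve("services"));
            Files.write(services.resolve(DemoParser.class.getName()),
                    (DemoExternalParser.class.getName() + "\n")
                            .getBytes(StandardCharsets.UTF_8));

            // The temp directory only contributes the config file; the provider
            // class itself is loaded via the parent class loader.
            try (URLClassLoader loader = new URLClassLoader(
                    new URL[] { root.toUri().toURL() },
                    ServiceConfigSketch.class.getClassLoader())) {
                List<String> names = new ArrayList<>();
                for (DemoParser parser : ServiceLoader.load(DemoParser.class, loader)) {
                    names.add(parser.name());
                }
                return names;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(discover());
    }
}
```

In this sketch the config file and the provider class come from different classpath entries, which is the situation that arises if the services file is kept in one module while the class lives in another.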
>>
>> For the most part the changes just organize the existing
>> packages.  There are a handful of changes to the test suite in order to
>> break some cyclical dependencies.  Here's an overview of how the
>> projects interrelate at the moment:
>>
>> tika-parser-modules
>>   - /tika-advanced-module
>>   - /tika-cad-module
>>             -> tika-text-module [test]
>>   - /tika-code-module
>>             -> tika-text-module [test]
>>   - /tika-database-module
>>             -> tika-office-module [test]
>>   - /tika-ebook-module
>>             -> tika-text-module
>>   - /tika-journal-module
>>             -> tika-pdf-module
>>   - /tika-multimedia-module
>>             -> tika-web-module [test]
>>             -> tika-office-module [test]
>>             -> tika-pdf-module [test]
>>   - /tika-office-module
>>             -> tika-web-module [test]
>>             -> tika-package-module [test]
>>             -> tika-text-module [test]
>>   - /tika-package-module
>>   - /tika-pdf-module
>>             -> tika-text-module [test]
>>             -> tika-package-module [test]
>>             -> tika-office-module [test]
>>   - /tika-scientific-module
>>             -> tika-text-module [test]
>>   - /tika-text-module
>>   - /tika-web-module
>>             -> tika-text-module [test]
>>             -> tika-package-module [test]
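The overview above can be checked mechanically for cycles. The following is a rough sketch, not part of the commit: it encodes the listed arrows as a map (module names shortened, [test] and regular arrows treated alike, which is a simplifying assumption) and runs Kahn's topological sort over them. The class and method names are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ModuleGraphSketch {

    /** Module -> modules it depends on, transcribed from the overview above. */
    static Map<String, List<String>> listedDependencies() {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("advanced", Collections.<String>emptyList());
        deps.put("cad", Arrays.asList("text"));
        deps.put("code", Arrays.asList("text"));
        deps.put("database", Arrays.asList("office"));
        deps.put("ebook", Arrays.asList("text"));
        deps.put("journal", Arrays.asList("pdf"));
        deps.put("multimedia", Arrays.asList("web", "office", "pdf"));
        deps.put("office", Arrays.asList("web", "package", "text"));
        deps.put("package", Collections.<String>emptyList());
        deps.put("pdf", Arrays.asList("text", "package", "office"));
        deps.put("scientific", Arrays.asList("text"));
        deps.put("text", Collections.<String>emptyList());
        deps.put("web", Arrays.asList("text", "package"));
        return deps;
    }

    /**
     * Kahn's algorithm: returns an order in which every module appears after
     * its dependencies, or throws if the graph contains a cycle.
     * Assumes every dependency is itself a key of the map.
     */
    static List<String> buildOrder(Map<String, List<String>> deps) {
        Map<String, Integer> unmet = new LinkedHashMap<>();
        Map<String, List<String>> dependents = new LinkedHashMap<>();
        for (String module : deps.keySet()) {
            unmet.put(module, 0);
            dependents.put(module, new ArrayList<String>());
        }
        for (Map.Entry<String, List<String>> entry : deps.entrySet()) {
            for (String dependency : entry.getValue()) {
                unmet.merge(entry.getKey(), 1, Integer::sum);
                dependents.get(dependency).add(entry.getKey());
            }
        }

        // Start with modules that have no dependencies at all.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> entry : unmet.entrySet()) {
            if (entry.getValue() == 0) {
                ready.add(entry.getKey());
            }
        }

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String module = ready.remove();
            order.add(module);
            for (String dependent : dependents.get(module)) {
                if (unmet.merge(dependent, -1, Integer::sum) == 0) {
                    ready.add(dependent);
                }
            }
        }
        if (order.size() != deps.size()) {
            throw new IllegalStateException("cycle detected among modules");
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(buildOrder(listedDependencies()));
    }
}
```

If a later change reintroduces a cyclical dependency, buildOrder throws instead of returning an order, so a check like this could serve as a quick sanity test on the module layout.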
>>
>> Very interested in feedback since we have been talking about this for a
>> bit, but I'm sure actually seeing it will create more discussion. Looking
>> at how much simpler the individual pom files are does seem to demonstrate
>> that this will be a good thing for the project.
>>
>> Cheers,
>>
>> - Bob
>>
>> [1]
>> http://mail-archives.apache.org/mod_mbox/tika-dev/201508.mbox/%3C55CF4C19.6050503%40bobpaulin.com%3E
>> [2] https://issues.apache.org/jira/browse/TIKA-1824

