tika-dev mailing list archives

From Yaniv Kunda <yaniv.ku...@answers.com>
Subject RE: [jira] [Updated] (TIKA-1706) Bring back commons-io to tika-core
Date Thu, 26 Nov 2015 11:49:01 GMT
It's been almost two months since I provided my patches for this.

Can a committer please review and commit them?

*From:* Yaniv Kunda [mailto:yaniv.kunda@answers.com]
*Sent:* Monday, October 12, 2015 23:08
*To:* dev@tika.apache.org
*Subject:* Re: [jira] [Updated] (TIKA-1706) Bring back commons-io to

Is this solution acceptable?
I have some improvements waiting for this.

On Oct 1, 2015 5:57 PM, "Yaniv Kunda (JIRA)" <jira@apache.org> wrote:


Yaniv Kunda updated TIKA-1706:
    Attachment: TIKA-1706-2.patch

A proposed patch, following [~grossws]'s suggestion on the dev mailing list.
The first patch contains the following:
- creation of the secondary jar using maven-shade-plugin:
-- used the *uber* classifier via <shadedClassifierName> (alternatives: shaded, nodep, all, etc.; which one is best?)
-- commons-io shaded under {{shaded.commons-io.$\{commons.io.version\}.org.apache.commons.io}} to avoid potential conflicts with other dependencies that shade commons-io, e.g. as in
-- automatic removal of unused classes using <minimizeJar>
- deprecated all classes that were copied from commons-io and modified them to extend their new counterparts
- deprecated all constructors
- removed all identical or functionally identical methods
- modified all remaining methods to call existing JDK/commons-io alternatives, deprecated them, and referred to the alternatives used
_*Note:* this was done only in IOUtils, where many methods that have the same signature as those in commons-io were modified along the way to use UTF-8 instead of the platform default._
- everything should remain backward-compatible, except for one thing:
org.apache.tika.io.TaggedIOException(IOException, Object) will now throw a ClassCastException if the Object is not Serializable
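The compatibility approach of the first patch (deprecated tika-core classes reduced to thin subclasses of their commons-io counterparts, with one constructor narrowing an Object tag to Serializable) can be sketched with plain JDK classes. This is a minimal illustration, not the patch itself: NewTaggedIOException and LegacyTaggedIOException are hypothetical stand-ins for org.apache.commons.io.TaggedIOException and org.apache.tika.io.TaggedIOException, and narrowing the tag with a cast is an assumed mechanism that produces the ClassCastException described above.

```java
import java.io.IOException;
import java.io.Serializable;

// Hypothetical stand-ins only; these are not the real Tika or commons-io classes.
public class ShimSketch {

    // Stand-in for the commons-io counterpart: the tag must be Serializable.
    static class NewTaggedIOException extends IOException {
        private final Serializable tag;

        NewTaggedIOException(IOException original, Serializable tag) {
            super(original.getMessage(), original);
            this.tag = tag;
        }

        Serializable getTag() {
            return tag;
        }
    }

    // Stand-in for the legacy class: deprecated and reduced to a thin subclass,
    // so existing callers still compile and link against the old name.
    @Deprecated
    static class LegacyTaggedIOException extends NewTaggedIOException {
        // The old signature accepted any Object; the cast narrows it to Serializable,
        // which is what turns a non-Serializable tag into a ClassCastException.
        @Deprecated
        LegacyTaggedIOException(IOException original, Object tag) {
            super(original, (Serializable) tag);
        }
    }

    public static void main(String[] args) {
        IOException cause = new IOException("read failed");

        // A Serializable tag (a String) behaves as before.
        LegacyTaggedIOException ok = new LegacyTaggedIOException(cause, "stream-1");
        System.out.println(ok.getTag());                         // stream-1
        System.out.println(ok instanceof NewTaggedIOException);  // true

        // A non-Serializable tag now fails at construction time.
        try {
            new LegacyTaggedIOException(cause, new Object());
        } catch (ClassCastException e) {
            System.out.println("ClassCastException");
        }
    }
}
```

Callers that already pass Serializable tags, such as strings, see no change; only a non-Serializable tag changes behaviour, which matches the single backward-compatibility exception listed in the patch description.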

The second patch contains trivial import changes in tika-core, switching from org.apache.tika.io to org.apache.commons.io.

> Bring back commons-io to tika-core
> ----------------------------------
>                 Key: TIKA-1706
>                 URL: https://issues.apache.org/jira/browse/TIKA-1706
>             Project: Tika
>          Issue Type: Improvement
>          Components: core
>            Reporter: Yaniv Kunda
>            Priority: Minor
>             Fix For: 1.11
>         Attachments: TIKA-1706-1.patch, TIKA-1706-2.patch
> TIKA-249 inlined select commons-io classes in order to simplify the dependency tree and save some space.
> I believe these arguments are weaker nowadays, for the following reasons:
> - Most of the non-core modules already use commons-io, and since tika-core is usually not used by itself, commons-io is already included with it
> - Since some modules use both tika-core and commons-io, it's not clear which code should be used
> - Having the inlined classes causes more maintenance and/or technical debt (which in turn causes more maintenance)
> - Newer commons-io code utilizes newer platform code, e.g. using Charset objects instead of encoding names, being able to use StringBuilder instead of StringBuffer, and so on.
> I'll be happy to provide a patch to replace usages of the inlined classes with commons-io classes if this is accepted.
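The Charset point in the quoted description, together with the patch note that tika-core's IOUtils uses UTF-8 rather than the platform default, can be illustrated with a JDK-only sketch. The class and method names (CharsetSketch, decodeByName, decode, readAll) are hypothetical and are not the tika-core or commons-io API; the sketch only shows why an explicit Charset object is safer than an encoding name or an implicit platform default.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper names; this is not the tika-core or commons-io API.
public class CharsetSketch {

    // Older style: the encoding is a String, so a typo is only caught at run time
    // and every caller must handle the checked UnsupportedEncodingException.
    static String decodeByName(byte[] bytes, String encoding)
            throws UnsupportedEncodingException {
        return new String(bytes, encoding);
    }

    // Newer style: a Charset object is passed in, so there is no checked exception.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // Reads a whole stream and decodes it with an explicit charset,
    // never falling back to the platform default.
    static String readAll(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), charset);
    }

    public static void main(String[] args) throws IOException {
        String text = "na\u00efve";                          // 5 characters
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);  // 6 bytes

        System.out.println(decodeByName(utf8, "UTF-8").equals(text));          // true
        System.out.println(decode(utf8, StandardCharsets.UTF_8).equals(text)); // true
        System.out.println(readAll(new ByteArrayInputStream(utf8),
                StandardCharsets.UTF_8).equals(text));                        // true

        // Decoding with the wrong charset silently yields different text.
        System.out.println(decode(utf8, StandardCharsets.ISO_8859_1).length()); // 6
    }
}
```

Decoding the same bytes as ISO-8859-1 yields six characters instead of five, the kind of silent mismatch that a platform-dependent default can introduce on machines whose default is not UTF-8.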



