nutch-dev mailing list archives

From "Andrzej Bialecki (JIRA)" <j...@apache.org>
Subject [jira] Commented: (NUTCH-608) Upgrade nutch to use released apache-tika-0.1-incubating
Date Mon, 11 Feb 2008 17:06:08 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567706#action_12567706 ]

Andrzej Bialecki  commented on NUTCH-608:
-----------------------------------------

One additional comment, now that the important changes are visible ;) Since you are adding a
util-type class anyway (MimeUtils), why not encapsulate all interactions with Tika inside this
class? This way we can protect the rest of the Nutch code from future changes in the Tika API,
and we can avoid adding Tika imports to various classes ...

The class could be patterned after many other similar classes, with a constructor like
MimeUtils(Configuration conf). It can then wrap all the initialization code, string splitting
and the fallback strategies without exposing any Tika classes.
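
To make the suggestion concrete, here is a rough sketch of what such a facade might look like.
It is not the code from the patch. To keep it self-contained, java.util.Properties stands in for
Nutch's Configuration, and a small hypothetical Detector interface stands in for Tika; the
property name "mime.detect.enabled", the method names and the fallback order (header first, then
name-based detection, then a default) are all assumptions made for illustration.

```java
import java.util.Locale;
import java.util.Properties;

/** Sketch of a facade that keeps the detection library out of calling code. */
class MimeUtils {

    /** Hypothetical stand-in for the wrapped library (Tika in this thread). */
    interface Detector {
        /** Returns a MIME type for the resource name, or null if unknown. */
        String detect(String resourceName);
    }

    private static final String DEFAULT_TYPE = "application/octet-stream";

    private final Detector detector;
    private final boolean detectionEnabled;

    /** Mirrors the MimeUtils(Configuration conf) idea; Properties is a stand-in. */
    public MimeUtils(Properties conf) {
        this(conf, MimeUtils::detectByExtension);
    }

    /** Package-private so tests can plug in a different detector. */
    MimeUtils(Properties conf, Detector detector) {
        // "mime.detect.enabled" is a made-up key, not a real Nutch property.
        this.detectionEnabled =
            Boolean.parseBoolean(conf.getProperty("mime.detect.enabled", "true"));
        this.detector = detector;
    }

    /** The string splitting: "Text/HTML; charset=UTF-8" becomes "text/html". */
    public String cleanMimeType(String header) {
        if (header == null) {
            return null;
        }
        int semi = header.indexOf(';');
        String base = (semi >= 0 ? header.substring(0, semi) : header).trim();
        return base.isEmpty() ? null : base.toLowerCase(Locale.ROOT);
    }

    /** The fallback strategy: header type, then detection, then a default. */
    public String resolve(String contentTypeHeader, String url) {
        String cleaned = cleanMimeType(contentTypeHeader);
        if (cleaned != null) {
            return cleaned;
        }
        if (detectionEnabled) {
            String detected = detector.detect(url);
            if (detected != null) {
                return detected;
            }
        }
        return DEFAULT_TYPE;
    }

    /** Toy name-based detector used only so the sketch runs on its own. */
    private static String detectByExtension(String name) {
        if (name == null) {
            return null;
        }
        String lower = name.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".pdf")) {
            return "application/pdf";
        }
        if (lower.endsWith(".html") || lower.endsWith(".htm")) {
            return "text/html";
        }
        return null;
    }
}
```

Callers would only ever see strings and the MimeUtils type, so a later change in the wrapped
library's API would be confined to this one class.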

> Upgrade nutch to use released apache-tika-0.1-incubating
> --------------------------------------------------------
>
>                 Key: NUTCH-608
>                 URL: https://issues.apache.org/jira/browse/NUTCH-608
>             Project: Nutch
>          Issue Type: Improvement
>          Components: mime_type_detector
>            Reporter: Chris A. Mattmann
>            Assignee: Chris A. Mattmann
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-608.Mattmann.021008.patch.txt, NUTCH-608.Mattmann.021108.patch.txt,
>                      tika-0.1-incubating.jar
>
>
> This patch will upgrade Nutch to use the released tika-0.1-incubating jar containing
> stable APIs and code, as opposed to the -dev version of the jar file that's currently in place
> in SVN.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

