nutch-dev mailing list archives

From "Julien Nioche (JIRA)" <j...@apache.org>
Subject [jira] Commented: (NUTCH-781) Update Tika to v0.6 for the MimeType detection
Date Tue, 02 Feb 2010 09:30:19 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828548#action_12828548 ]

Julien Nioche commented on NUTCH-781:
-------------------------------------

> did you forget to update conf/tika-mimetypes.xml ?
indeed - well spotted, thanks

> Related question: do we actually need our own version of the Tika config anymore? I saw
there were some old issues that were fixed in the custom version, but I would guess those changes,
if important, have already made their way into Tika?
The version we had was the same as the one provided by Tika 0.4, so I suppose we could safely
rely on the Tika defaults. MimeUtil currently requires tika-mimetypes.xml to be available on
the classpath, but we could modify it so that it uses the default version from the Tika jar
if nothing can be found in conf. Let's put that in a separate JIRA issue if we really want it;
in the meantime I'll commit v0.6 of tika-mimetypes.xml.
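
A minimal sketch of that fallback, for illustration only: this is not the current MimeUtil
code. The class name MimeConfigLocator and both resource paths are assumptions (in particular,
the location of the default file inside the Tika jar should be checked against the jar
actually in use). It tries the copy from conf first and only then the one bundled with Tika:

```java
import java.net.URL;
import java.util.Optional;

/** Illustrative only: a hypothetical helper, not the actual Nutch MimeUtil code. */
final class MimeConfigLocator {

    // Assumption: name under which a site-specific copy in conf/ is visible on the classpath.
    static final String CONF_OVERRIDE = "tika-mimetypes.xml";

    // Assumption: location of the default definitions inside the Tika jar.
    static final String TIKA_DEFAULT = "org/apache/tika/mime/tika-mimetypes.xml";

    /** Returns the URL of the first candidate found on the classpath, if any. */
    static Optional<URL> locate(ClassLoader loader, String... candidates) {
        for (String name : candidates) {
            URL url = loader.getResource(name);
            if (url != null) {
                return Optional.of(url);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        ClassLoader loader = MimeConfigLocator.class.getClassLoader();
        // Order matters: the conf/ override wins, the bundled default is the fallback.
        Optional<URL> source = locate(loader, CONF_OVERRIDE, TIKA_DEFAULT);
        System.out.println(
            source.map(URL::toString).orElse("no tika-mimetypes.xml found on the classpath"));
    }
}
```

With this ordering, removing tika-mimetypes.xml from conf would no longer be an error as long
as the Tika jar is on the classpath.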

J.


> Update Tika to v0.6  for the MimeType detection
> -----------------------------------------------
>
>                 Key: NUTCH-781
>                 URL: https://issues.apache.org/jira/browse/NUTCH-781
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Julien Nioche
>            Assignee: Julien Nioche
>             Fix For: 1.1
>
>
> [from announcement]
> Apache Tika, a subproject of Apache Lucene, is a toolkit for detecting and
> extracting metadata and structured text content from various documents using
> existing parser libraries.
> Apache Tika 0.6 contains a number of improvements and bug fixes. Details can
> be found in the changes file:
> http://www.apache.org/dist/lucene/tika/CHANGES-0.6.txt

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

