nutch-dev mailing list archives

From "Hudson (JIRA)" <>
Subject [jira] Commented: (NUTCH-562) Port mime type framework to use Tika mime detection framework
Date Tue, 09 Oct 2007 04:30:51 GMT


Hudson commented on NUTCH-562:

Integrated in Nutch-Nightly #231 (See [])

> Port mime type framework to use Tika mime detection framework
> -------------------------------------------------------------
>                 Key: NUTCH-562
>                 URL:
>             Project: Nutch
>          Issue Type: Improvement
>          Components: mime_type_detector
>    Affects Versions: 1.0.0
>         Environment: Mac Book Pro, Intel Core Duo 2.0 GHz, 2.0 GB RAM, Mac OS X 10.4
>                      (although the improvement is independent of the environment)
>            Reporter: Chris A. Mattmann
>            Assignee: Chris A. Mattmann
>            Priority: Minor
>         Attachments: NUTCH-562.Mattmann.patch.txt, tika-0.1-dev.jar
> With Tika nearing a stable 0.1 release candidate, I think it would be a good time to
> patch Nutch to use Tika's mime detection system (an improvement over the existing Nutch
> one written primarily by Jerome). Tika's mime system is based on the mime system from
> and includes several improvements over the existing Nutch mime system, such as:
> 1. reliable XML-based content detection (a clear issue plaguing Nutch for some time now),
>    including the ability to distinguish between RSS, XML, Atom, etc.
> 2. mime magic pattern matching, including support for multiple patterns
> 3. glob pattern matching (with support for more than one pattern)
> I'll get together a patch and then attach it to the list once it's relatively stable.
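
The three improvements listed in the issue description can be illustrated with a small, self-contained sketch. This is not Tika's or Nutch's actual API: the class name MimeSketch, the method detect, the registered types and the detection order (magic bytes first, then XML root element, then file-name globs) are all assumptions made for illustration only.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of mime detection; not the Tika or Nutch API. */
final class MimeSketch {
    // File-name globs per type; a type may register more than one glob.
    private static final Map<String, String[]> GLOBS = new LinkedHashMap<>();
    // Magic byte prefixes per type; a type may register more than one pattern.
    private static final Map<String, byte[][]> MAGIC = new LinkedHashMap<>();
    // XML root element names mapped to more specific types.
    private static final Map<String, String> XML_ROOTS = new LinkedHashMap<>();
    // First element start tag, with an optional namespace prefix in group 1.
    private static final Pattern ROOT =
        Pattern.compile("<([A-Za-z_][\\w.-]*:)?([A-Za-z_][\\w.-]*)");

    static {
        GLOBS.put("text/html", new String[] {"*.html", "*.htm"});
        GLOBS.put("application/pdf", new String[] {"*.pdf"});
        MAGIC.put("image/gif", new byte[][] {bytes("GIF87a"), bytes("GIF89a")});
        MAGIC.put("application/pdf", new byte[][] {bytes("%PDF-")});
        MAGIC.put("application/xml", new byte[][] {bytes("<?xml")});
        XML_ROOTS.put("rss", "application/rss+xml");
        XML_ROOTS.put("feed", "application/atom+xml");
    }

    private static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    /** Returns a mime type for the given file name and leading content bytes. */
    static String detect(String name, byte[] head) {
        String type = matchMagic(head);
        if ("application/xml".equals(type)) {
            return refineXml(head); // tell RSS and Atom apart from generic XML
        }
        if (type != null) {
            return type;
        }
        type = matchGlob(name);
        return type != null ? type : "application/octet-stream";
    }

    private static String matchMagic(byte[] head) {
        for (Map.Entry<String, byte[][]> e : MAGIC.entrySet()) {
            for (byte[] pattern : e.getValue()) {
                if (startsWith(head, pattern)) {
                    return e.getKey();
                }
            }
        }
        return null;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    private static String refineXml(byte[] head) {
        String text = new String(head, StandardCharsets.UTF_8);
        Matcher m = ROOT.matcher(text);
        if (m.find()) {
            String specific = XML_ROOTS.get(m.group(2));
            if (specific != null) {
                return specific;
            }
        }
        return "application/xml";
    }

    private static String matchGlob(String name) {
        for (Map.Entry<String, String[]> e : GLOBS.entrySet()) {
            for (String glob : e.getValue()) {
                if (globMatches(glob, name)) {
                    return e.getKey();
                }
            }
        }
        return null;
    }

    // Supports only the '*' wildcard; every other character is literal.
    private static boolean globMatches(String glob, String name) {
        StringBuilder re = new StringBuilder();
        String[] parts = glob.split("\\*", -1);
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                re.append(".*");
            }
            if (!parts[i].isEmpty()) {
                re.append(Pattern.quote(parts[i]));
            }
        }
        return Pattern.matches(re.toString(), name);
    }
}
```

In this sketch a call such as MimeSketch.detect("feed.xml", head) looks past the XML declaration to the root element, so an RSS feed and an Atom feed get different types even when both use a .xml file name. The root-element regex is deliberately naive (it does not skip comments or DOCTYPE declarations), which a real detector would need to handle.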

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
