nutch-dev mailing list archives

From "Jerome Charron (JIRA)" <>
Subject [jira] Resolved: (NUTCH-135) http header meta data are case insensitive in the real world (e.g. Content-Type or content-type)
Date Sun, 11 Dec 2005 00:45:08 GMT
Jerome Charron resolved NUTCH-135:

    Fix Version:     (was: 0.7.2-dev)
     Resolution: Fixed

Committed to trunk (to be merged into branch 0.7?).
Thanks Stefan.

I have performed unit and functional tests, but I don't have the resources for wide and intensive testing.
If someone can perform such testing, it would be greatly appreciated.

Note: During my tests, I noticed some strange content types returned by and all related files. The content type that the protocol layer passes to the Content constructor
is always text/plain, but when fetching these sites with wget, the Content-Type header
is text/html. Sorry, I don't have time to investigate further.

> http header meta data are case insensitive in the real world (e.g. Content-Type or content-type)
> ------------------------------------------------------------------------------------------------
>          Key: NUTCH-135
>          URL:
>      Project: Nutch
>         Type: Bug
>   Components: fetcher
>     Versions: 0.7, 0.7.1
>     Reporter: Stefan Groschupf
>     Priority: Critical
>      Fix For: 0.8-dev
>  Attachments: cached.jsp.patch, contentProperties_patch.txt, contentProperties_patch_WithContentProperties.txt
> As described in issue NUTCH-133, some web servers return HTTP header metadata whose names do not use
> the standard capitalization, since in practice header names are treated case-insensitively.
> This causes many negative side effects. For example, querying the content type from the
> metadata returns null even when the web server does return a content type, because the key is
> not in the standard form (e.g. it is lower case). It also affects the PDF parser, which queries
> the content length, etc.
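
The lookup failure described in the issue can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not the committed patch (the attachments such as contentProperties_patch.txt are not reproduced here): the class name CaseInsensitiveHeaders and its set/get methods are hypothetical, and backing the store with a TreeMap ordered by String.CASE_INSENSITIVE_ORDER is just one way to make key lookups ignore case, which matches RFC 2616 (section 4.2), where HTTP field names are case-insensitive.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration, not the NUTCH-135 patch itself:
// a header store whose key lookups ignore case.
class CaseInsensitiveHeaders {
    // Keys that differ only in case compare as equal, so "Content-Type"
    // and "content-type" address the same entry. A later header that
    // differs only in case replaces the earlier value.
    private final Map<String, String> headers =
        new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    public void set(String name, String value) {
        headers.put(name, value);
    }

    public String get(String name) {
        return headers.get(name);
    }

    public static void main(String[] args) {
        // The reported symptom: a plain HashMap stores the server's
        // lower-case key verbatim, so the standard-case lookup misses.
        Map<String, String> plain = new HashMap<>();
        plain.put("content-type", "text/html");
        System.out.println(plain.get("Content-Type")); // null

        // The case-insensitive store finds the same header.
        CaseInsensitiveHeaders h = new CaseInsensitiveHeaders();
        h.set("content-type", "text/html");
        h.set("content-length", "1024");
        System.out.println(h.get("Content-Type"));   // text/html
        System.out.println(h.get("Content-Length")); // 1024
    }
}
```

A TreeMap keeps the spelling of the first key inserted and costs O(log n) per lookup; normalizing keys (for example, lower-casing them on insert) into a HashMap would also work, at the price of losing the server's original spelling.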

This message is automatically generated by JIRA.
