nutch-dev mailing list archives

From "Andrzej Bialecki (JIRA)" <j...@apache.org>
Subject [jira] Commented: (NUTCH-135) http header meta data are case insensitive in the real world (e.g. Content-Type or content-type)
Date Fri, 09 Dec 2005 21:59:08 GMT
    [ http://issues.apache.org/jira/browse/NUTCH-135?page=comments#action_12359961 ] 

Andrzej Bialecki  commented on NUTCH-135:
-----------------------------------------

Since you are already working on this issue, could you also take a look at NUTCH-3 and see
whether you can solve it too? The problem described there is that when several headers share
the same name, only the last value is preserved. In some cases multiple headers with the same
name make sense: see any of the existing Java models for handling HTTP or RFC822 mail messages,
all of which handle multiple values per key.
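
To illustrate the behaviour asked for above, here is a minimal sketch of a header store that keeps every
value added under a name instead of overwriting it. The class MultiValueHeaders and its methods
are hypothetical names chosen for this example; they are not taken from Nutch or from any patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Nutch: keeps all values seen for a header name.
class MultiValueHeaders {
    // Insertion-ordered map from header name to every value added under it.
    private final Map<String, List<String>> values = new LinkedHashMap<>();

    // Appends a value; earlier values for the same name are kept.
    void add(String name, String value) {
        values.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    // Returns all values for the name in insertion order, or an empty list.
    List<String> getAll(String name) {
        List<String> found = values.get(name);
        return found == null
                ? Collections.emptyList()
                : Collections.unmodifiableList(found);
    }

    // Returns the first value for the name, or null if the name is absent.
    String getFirst(String name) {
        List<String> found = values.get(name);
        return found == null ? null : found.get(0);
    }
}
```

A caller that only needs one value can keep using getFirst, while code that cares about
repeated headers (Set-Cookie is a common case) can read them all through getAll.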

> http header meta data are case insensitive in the real world (e.g. Content-Type or content-type)
> ------------------------------------------------------------------------------------------------
>
>          Key: NUTCH-135
>          URL: http://issues.apache.org/jira/browse/NUTCH-135
>      Project: Nutch
>         Type: Bug
>   Components: fetcher
>     Versions: 0.7.1, 0.7
>     Reporter: Stefan Groschupf
>     Priority: Critical
>      Fix For: 0.8-dev, 0.7.2-dev
>  Attachments: contentProperties_patch.txt
>
> As described in issue NUTCH-133, some web servers return HTTP header names whose case does not
> conform to the standard form (e.g. content-type instead of Content-Type).
> This has many negative side effects. For example, querying the content type from the metadata
> returns null even when the web server did return a content type, because the key is not in the
> standard form (e.g. it is lower case). It also affects the PDF parser, which queries the
> content length, etc.
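
The lookup failure described in the issue can be avoided by comparing header names without
regard to case. The sketch below does this with a TreeMap ordered by String.CASE_INSENSITIVE_ORDER.
The class CaseInsensitiveHeaders is a hypothetical name for illustration only; it is not the
attached patch and makes no claim about how Nutch resolved the issue.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Nutch: header lookup that ignores case.
class CaseInsensitiveHeaders {
    // Keys are compared ignoring case, so "content-type" and "Content-Type"
    // refer to the same entry.
    private final Map<String, String> headers =
            new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    // Stores a value; a later put under any casing of the name replaces it.
    void put(String name, String value) {
        headers.put(name, value);
    }

    // Returns the value stored under any casing of the name, or null.
    String get(String name) {
        return headers.get(name);
    }
}
```

String.CASE_INSENSITIVE_ORDER does not take locale into account, which is acceptable here
on the assumption that header names are plain ASCII.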

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

