nutch-dev mailing list archives

From Jack Tang <him...@gmail.com>
Subject Re: [jira] Commented: (NUTCH-135) http header meta data are case insensitive in the real world (e.g. Content-Type or content-type)
Date Sat, 10 Dec 2005 14:30:27 GMT
Stefan

It seems your patch is missing the
org.apache.nutch.protocol.ContentProperties class, right?

/Jack

On 12/10/05, Stefan Groschupf (JIRA) <jira@apache.org> wrote:
>     [ http://issues.apache.org/jira/browse/NUTCH-135?page=comments#action_12360025 ]
>
> Stefan Groschupf commented on NUTCH-135:
> ----------------------------------------
>
> Andrzej, that is easy to add to the ContentProperties object, and sure, I can do that.
> However, first I would love to get an OK for this patch before I invest more time in it,
> since I spend too much time writing stuff just for the issue archive.
> As soon as this patch is in the sources I will write a small new patch (as Doug suggested,
> do it in small steps) to solve NUTCH-3
>
> > http header meta data are case insensitive in the real world (e.g. Content-Type or content-type)
> > ------------------------------------------------------------------------------------------------
> >
> >          Key: NUTCH-135
> >          URL: http://issues.apache.org/jira/browse/NUTCH-135
> >      Project: Nutch
> >         Type: Bug
> >   Components: fetcher
> >     Versions: 0.7, 0.7.1
> >     Reporter: Stefan Groschupf
> >     Priority: Critical
> >      Fix For: 0.8-dev, 0.7.2-dev
> >  Attachments: contentProperties_patch.txt
> >
> > As described in issue NUTCH-133, some web servers return HTTP header metadata whose
> > keys do not use the standard casing.
> > This has many negative side effects. For example, querying the content type from
> > the metadata returns null even when the web server does return a content type, if the
> > key does not use the standard casing (e.g. it is lower case). This also affects the
> > PDF parser, which queries the content length, etc.
>
> --
> This message is automatically generated by JIRA.
> -
> If you think it was sent incorrectly contact one of the administrators:
>    http://issues.apache.org/jira/secure/Administrators.jspa
> -
> For more information on JIRA, see:
>    http://www.atlassian.com/software/jira
>
>
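
For readers of the archive, the lookup problem described in the quoted issue can be sketched roughly as follows. This is only a minimal illustration, assuming header metadata is held in a plain map; the class name CaseInsensitiveHeaders is hypothetical and is not the ContentProperties class from the patch.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Nutch: stores header metadata so that
// lookups ignore the case of the key.
public class CaseInsensitiveHeaders {
    // A TreeMap ordered by String.CASE_INSENSITIVE_ORDER treats
    // "Content-Type" and "content-type" as the same key.
    private final Map<String, String> headers =
        new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    public void put(String name, String value) {
        headers.put(name, value);
    }

    // Returns null only if no header with that name exists in any casing.
    public String get(String name) {
        return headers.get(name);
    }

    public static void main(String[] args) {
        CaseInsensitiveHeaders h = new CaseInsensitiveHeaders();
        // The server sent the key in lower case ...
        h.put("content-type", "application/pdf");
        // ... but the caller asks for the conventional casing.
        System.out.println(h.get("Content-Type")); // prints "application/pdf"
    }
}
```

With an ordinary HashMap the same get call would return null, which is the symptom the issue describes for the content type and content length lookups.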


--
Keep Discovering ... ...
http://www.jroller.com/page/jmars
