nutch-dev mailing list archives

From "Sebastian Nagel (Jira)" <j...@apache.org>
Subject [jira] [Updated] (NUTCH-2760) protocol-okhttp: properly record HTTP version in request message header
Date Fri, 13 Dec 2019 11:58:00 GMT

     [ https://issues.apache.org/jira/browse/NUTCH-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Nagel updated NUTCH-2760:
-----------------------------------
    Labels: patch-available  (was: )

> protocol-okhttp: properly record HTTP version in request message header
> -----------------------------------------------------------------------
>
>                 Key: NUTCH-2760
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2760
>             Project: Nutch
>          Issue Type: Bug
>          Components: plugin, protocol
>    Affects Versions: 1.16
>            Reporter: Sebastian Nagel
>            Priority: Minor
>              Labels: patch-available
>             Fix For: 1.17
>
>
> The HTTP version in the request message recorded by the plugin protocol-okhttp (with {{store.http.request=true}}) is not the version sent in the request but the one received in the response.
> Note that the HTTP version sent in the request may differ from the one sent back in the response. One example (captured using wget):
> {noformat}
> > wget -d https://www.kp.ru/daily/27061/4129507/
> ...
> ---request begin---
> GET /daily/27061/4129507/ HTTP/1.1
> User-Agent: Wget/1.20.3 (linux-gnu)
> Accept: */*
> Accept-Encoding: identity
> Host: www.kp.ru
> Connection: Keep-Alive
> ---request end---
> HTTP request sent, awaiting response... 
> ---response begin---
> HTTP/1.0 200 OK
> ...
> {noformat}
> protocol-okhttp also uses the response version ("HTTP/1.0") for the request:
> {noformat}
> > bin/nutch parsechecker -Dstore.http.headers=true -Dstore.http.request=true \
>      -Dplugin.includes='protocol-okhttp|parse-html' https://www.kp.ru/daily/27061/4129507/
> ...
> _request_=GET /daily/27061/4129507/ HTTP/1.0
> ...
> _response.headers_=HTTP/1.0 200 OK
> ...
> {noformat}
> The plugin protocol-http records the versions correctly:
> {noformat}
> > bin/nutch parsechecker -Dstore.http.headers=true -Dstore.http.request=true \
>      -Dplugin.includes='protocol-http|parse-html' https://www.kp.ru/daily/27061/4129507/
> ...
> _request_=GET /daily/27061/4129507/ HTTP/1.1
> ...
> _response.headers_=HTTP/1.0 200 OK
> ...
> {noformat}
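
The difference between the two outputs above can be reproduced with a minimal, self-contained Java sketch. It is not the Nutch or OkHttp code: the class RequestLineSketch and its methods are hypothetical names chosen for illustration, and the sent version is simply assumed to be known to the caller. The sketch only shows why deriving the request line from the response status line yields "HTTP/1.0" where "HTTP/1.1" was actually sent.

```java
public class RequestLineSketch {

    /** Returns the version token of a status line, e.g. "HTTP/1.0" for "HTTP/1.0 200 OK". */
    public static String versionOfStatusLine(String statusLine) {
        int space = statusLine.indexOf(' ');
        return space < 0 ? statusLine : statusLine.substring(0, space);
    }

    /** Builds a request line from method, path and the given HTTP version. */
    public static String requestLine(String method, String path, String version) {
        return method + " " + path + " " + version;
    }

    public static void main(String[] args) {
        // Values taken from the wget trace in the issue description.
        String path = "/daily/27061/4129507/";
        String sentVersion = "HTTP/1.1";        // assumption: known to the client
        String statusLine = "HTTP/1.0 200 OK";  // first line of the response

        // Behaviour described in the issue: the response version is reused.
        String recordedFromResponse =
            requestLine("GET", path, versionOfStatusLine(statusLine));

        // Expected behaviour: the version that was sent is recorded.
        String recordedFromRequest = requestLine("GET", path, sentVersion);

        System.out.println("_request_=" + recordedFromResponse);
        System.out.println("_request_=" + recordedFromRequest);
    }
}
```

Keeping the sent version as a separate input, rather than reading it back from the response, is the design point here: the two values are independent and may legitimately differ.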



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
