nutch-dev mailing list archives

From "Talat UYARER (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-1086) Rewrite protocol-httpclient
Date Tue, 17 Sep 2013 08:52:53 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769338#comment-13769338 ]

Talat UYARER commented on NUTCH-1086:
-------------------------------------

Hi Markus,

Yes, I know that HttpClient is still in development as part of Apache HttpComponents. The second
comment is very useful information for me. Actually, I asked that question because I found a
small bug in protocol-http: even if http.content.limit is set, protocol-http fetches
files of all sizes (larger files are fetched up to the limit).
But when parsing, the parser skips the incomplete files (parser.skip.truncated configuration). It
seems like unnecessary effort to partially fetch content larger than the limit if it is
not going to be parsed.
What do you think about this? I will upload a patch for this issue.
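
A minimal sketch of the idea, not the actual patch: before reading the body, compare the
Content-Length declared by the server against http.content.limit, and skip the fetch when the
result would be truncated and then dropped by the parser anyway. The class ContentLimitCheck and
its methods are hypothetical names for illustration and are not part of Nutch. The sketch assumes
that a negative limit means "no limit".

```java
import java.util.OptionalLong;

/** Hypothetical helper for illustration only; not part of Nutch. */
public final class ContentLimitCheck {

    private ContentLimitCheck() {}

    /** Parses a Content-Length header value; empty if missing, malformed or negative. */
    public static OptionalLong parseContentLength(String headerValue) {
        if (headerValue == null) {
            return OptionalLong.empty();
        }
        try {
            long length = Long.parseLong(headerValue.trim());
            return length >= 0 ? OptionalLong.of(length) : OptionalLong.empty();
        } catch (NumberFormatException e) {
            return OptionalLong.empty();
        }
    }

    /**
     * Decides whether the body download can be skipped.
     *
     * @param contentLengthHeader raw Content-Length header value, may be null
     * @param contentLimit        value of http.content.limit (assumed: negative = no limit)
     * @param skipTruncated       value of parser.skip.truncated
     */
    public static boolean shouldSkipFetch(String contentLengthHeader,
                                          long contentLimit,
                                          boolean skipTruncated) {
        // If truncated content is still parsed, or there is no limit, fetch as usual.
        if (!skipTruncated || contentLimit < 0) {
            return false;
        }
        OptionalLong declared = parseContentLength(contentLengthHeader);
        // Without a usable Content-Length the size is unknown, so fetch as usual.
        return declared.isPresent() && declared.getAsLong() > contentLimit;
    }
}
```

For example, ContentLimitCheck.shouldSkipFetch("100000", 65536, true) returns true, while the
same call with a header of "1000" returns false. Responses without a Content-Length header (for
instance with chunked transfer encoding) cannot be caught this way and would still be fetched up
to the limit.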
                
> Rewrite protocol-httpclient
> ---------------------------
>
>                 Key: NUTCH-1086
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1086
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: nutchgora, 1.5
>            Reporter: Markus Jelsma
>            Priority: Critical
>             Fix For: 2.4
>
>
> There are several issues about protocol-httpclient and several comments about rewriting
> the plugin with the new http client libraries. There is, however, not yet an issue for
> rewriting/reimplementing protocol-httpclient.
> http://hc.apache.org/httpcomponents-client-ga/

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
