nutch-dev mailing list archives

From Doğacan Güney (JIRA) <j...@apache.org>
Subject [jira] Commented: (NUTCH-392) OutputFormat implementations should pass on Progressable
Date Fri, 01 Jun 2007 07:53:15 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500603 ]

Doğacan Güney commented on NUTCH-392:
-------------------------------------

From what I understand of the MapFile.Writer code in Hadoop, if you pass a CompressionType
as an argument to its constructor, it overrides the compression value in the config. Since
Nutch manually sets parse_text and parse_data to RECORD compression (and crawl_parse to NONE),
we will not get the advantages of BLOCK compression even if we set it in the config.

BLOCK compression seems to work really well if you have the native libraries in place, so
IMHO it would be better not to set CompressionType manually and to let people set it to
whatever they want in the config.
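
To make the precedence concrete, here is a minimal sketch in plain Java. It is a toy model
of the behaviour described above, not the actual Hadoop or Nutch code: the class
CompressionPrecedence, the key "compression.type", the resolve() helper and the RECORD
fallback are all made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: none of these types are Hadoop or Nutch classes.
class CompressionPrecedence {
    enum CompressionType { NONE, RECORD, BLOCK }

    // Hypothetical config key, chosen for this sketch.
    static final String KEY = "compression.type";

    // Reads the configured type; falls back to RECORD when unset (an assumed default).
    static CompressionType fromConfig(Map<String, String> conf) {
        String value = conf.get(KEY);
        return value == null ? CompressionType.RECORD : CompressionType.valueOf(value);
    }

    // explicit == null models a writer built without a CompressionType argument.
    // A non-null explicit value always wins over the config.
    static CompressionType resolve(Map<String, String> conf, CompressionType explicit) {
        return explicit != null ? explicit : fromConfig(conf);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(KEY, "BLOCK");

        // Hard-coded RECORD, as is done for parse_text and parse_data: config is ignored.
        System.out.println(resolve(conf, CompressionType.RECORD)); // prints RECORD

        // No hard-coded type: the BLOCK setting from the config takes effect.
        System.out.println(resolve(conf, null)); // prints BLOCK
    }
}
```

In this model, dropping the explicit argument is the only way for a user's BLOCK setting
to reach the writer, which is the change suggested above.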

> OutputFormat implementations should pass on Progressable
> --------------------------------------------------------
>
>                 Key: NUTCH-392
>                 URL: https://issues.apache.org/jira/browse/NUTCH-392
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>            Reporter: Doug Cutting
>            Assignee: Andrzej Bialecki 
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-392.patch
>
>
> OutputFormat implementations should pass the Progressable they are passed to underlying
> SequenceFile implementations.  This will keep reduce tasks from timing out when block writes
> are slow.  This issue depends on http://issues.apache.org/jira/browse/HADOOP-636.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
