nutch-dev mailing list archives

From Doğacan Güney (JIRA) <j...@apache.org>
Subject [jira] Commented: (NUTCH-392) OutputFormat implementations should pass on Progressable
Date Fri, 01 Jun 2007 10:54:15 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500665 ]

Doğacan Güney commented on NUTCH-392:
-------------------------------------

I think it is okay to allow BLOCK compression for linkdb, crawldb, crawl_*, content and parse_data,
because I don't think people will need fast random access to anything but parse_text.
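
To illustrate the trade-off behind this, here is a minimal sketch that uses only java.util.zip
and not the Hadoop SequenceFile API. It compares compressing each record on its own (roughly
what RECORD compression does) with compressing many buffered records together (roughly what
BLOCK compression does). The class name, helper names and sample records are made up for the
example. The point is only that small, similar records compress much better as a block, at the
price of having to decompress a whole block to read one record from it.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Deflater;

// Hypothetical demo class; not part of Nutch or Hadoop.
class CompressionGranularityDemo {

    // Deflate one byte array and return the compressed length in bytes.
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] buf = new byte[4096];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buf);
        }
        deflater.end();
        return total;
    }

    // RECORD-style: every record is compressed independently,
    // so any single record can be decompressed by itself.
    static int perRecordSize(List<byte[]> records) {
        int total = 0;
        for (byte[] r : records) {
            total += deflatedSize(r);
        }
        return total;
    }

    // BLOCK-style: records are buffered and compressed together,
    // so reading one record means decompressing the whole block.
    static int perBlockSize(List<byte[]> records) {
        ByteArrayOutputStream block = new ByteArrayOutputStream();
        for (byte[] r : records) {
            block.write(r, 0, r.length);
        }
        return deflatedSize(block.toByteArray());
    }

    // Made-up records that look vaguely like small crawldb entries.
    static List<byte[]> sampleRecords(int n) {
        List<byte[]> records = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            String s = "http://example.org/page" + i + "\tstatus=fetched\tscore=1.0";
            records.add(s.getBytes(StandardCharsets.UTF_8));
        }
        return records;
    }

    public static void main(String[] args) {
        List<byte[]> records = sampleRecords(1000);
        System.out.println("per-record bytes: " + perRecordSize(records));
        System.out.println("per-block bytes:  " + perBlockSize(records));
    }
}
```

This says nothing about read or write speed on a real cluster; that still needs to be measured.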


I agree that we need to test the performance impact of BLOCK compression before committing such
a change. Unfortunately, our setup doesn't include BLOCK compression right now. I will try
to test it and report some results once I get the chance.

PS: Compressing content will not yield significant savings right now, since it is already compressed
internally, but once content stops doing that I think there will be _huge_ savings there.

> OutputFormat implementations should pass on Progressable
> --------------------------------------------------------
>
>                 Key: NUTCH-392
>                 URL: https://issues.apache.org/jira/browse/NUTCH-392
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>            Reporter: Doug Cutting
>            Assignee: Andrzej Bialecki 
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-392.patch
>
>
> OutputFormat implementations should pass the Progressable they are passed to underlying
> SequenceFile implementations.  This will keep reduce tasks from timing out when block writes
> are slow.  This issue depends on http://issues.apache.org/jira/browse/HADOOP-636.
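
The idea in the description above can be sketched with plain Java stand-ins. Nothing below is
the Hadoop or Nutch API: Progress, BlockWriter and getRecordWriter are hypothetical names that
only mimic the roles of Progressable, a SequenceFile writer and an OutputFormat. The sketch
shows the pass-through itself: the callback handed to the output format is forwarded to the
underlying writer, which calls it after each block flush so the task keeps reporting progress.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; not part of Nutch or Hadoop.
class ProgressPassThroughDemo {

    // Stand-in for a progress callback such as Hadoop's Progressable.
    interface Progress {
        void progress();
    }

    // Stand-in for an underlying writer that buffers records into blocks.
    static class BlockWriter {
        private final int blockSize;
        private final Progress progress;
        private int buffered = 0;

        BlockWriter(int blockSize, Progress progress) {
            this.blockSize = blockSize;
            this.progress = progress;
        }

        void append(String record) {
            buffered++;
            if (buffered >= blockSize) {
                flushBlock();
            }
        }

        private void flushBlock() {
            if (buffered == 0) {
                return;
            }
            // A real writer would compress and write the block here, which may be slow.
            buffered = 0;
            // Report progress after the block is written.
            if (progress != null) {
                progress.progress();
            }
        }

        void close() {
            flushBlock();
        }
    }

    // Stand-in for an output format: it forwards the callback instead of dropping it.
    static BlockWriter getRecordWriter(int blockSize, Progress progress) {
        return new BlockWriter(blockSize, progress);
    }

    // Write the given number of records and count how often progress was reported.
    static int countProgressCalls(int records, int blockSize) {
        AtomicInteger calls = new AtomicInteger();
        BlockWriter writer = getRecordWriter(blockSize, calls::incrementAndGet);
        for (int i = 0; i < records; i++) {
            writer.append("record-" + i);
        }
        writer.close();
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println("progress calls: " + countProgressCalls(250, 100));
    }
}
```

If getRecordWriter passed null instead of the callback, the writer would do the same work but
never report progress, which is the situation the issue wants to avoid.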

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

