nutch-dev mailing list archives

From "Andrzej Bialecki (JIRA)" <j...@apache.org>
Subject [jira] Commented: (NUTCH-392) OutputFormat implementations should pass on Progressable
Date Fri, 01 Jun 2007 14:42:16 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500728 ]

Andrzej Bialecki  commented on NUTCH-392:
-----------------------------------------

> I think it is okay to allow BLOCK compression for linkdb, crawldb, crawl_*,
> content, parse_data. Because I don't think that people will need fast random-access
> on anything but parse_text.

LinkDb is accessed randomly on-line through LinkDbInlinks when users request anchors. Similarly,
parse_data is accessed when a user requests "explain", and may also be accessed to retrieve other
hit metadata. Content is accessed randomly when displaying a cached preview. I think in all
these cases we can use at most RECORD compression, or NONE.
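
To make the split concrete, here is a rough sketch of the choice per data directory. It does not
use Hadoop at all: the enum below is a hypothetical stand-in that only mirrors the constant names
of SequenceFile.CompressionType (NONE, RECORD, BLOCK), and the class and method names are made up
for illustration. It assumes that anything not listed as randomly accessed (crawldb, crawl_*) is
only read sequentially and can take BLOCK, as proposed in the quoted text.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Nutch or Hadoop.
class SegmentCompression {

    // Stand-in for Hadoop's SequenceFile.CompressionType constants.
    enum CompressionType { NONE, RECORD, BLOCK }

    // Data that is read randomly at search time according to this thread:
    // linkdb (anchors), parse_data (explain, hit metadata),
    // content (cached preview), parse_text (from the quoted comment).
    static final Set<String> RANDOM_ACCESS = new HashSet<>(
            Arrays.asList("linkdb", "parse_data", "content", "parse_text"));

    // At most RECORD for randomly accessed data; BLOCK is assumed to be
    // acceptable for everything else (crawldb, crawl_*).
    static CompressionType choose(String part) {
        return RANDOM_ACCESS.contains(part)
                ? CompressionType.RECORD
                : CompressionType.BLOCK;
    }

    public static void main(String[] args) {
        String[] parts = {"crawldb", "linkdb", "crawl_fetch",
                          "content", "parse_data", "parse_text"};
        for (String part : parts) {
            System.out.println(part + " -> " + choose(part));
        }
    }
}
```

NONE would be equally valid wherever the sketch returns RECORD; the only point is that BLOCK is
kept away from data that is looked up one record at a time.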

> OutputFormat implementations should pass on Progressable
> --------------------------------------------------------
>
>                 Key: NUTCH-392
>                 URL: https://issues.apache.org/jira/browse/NUTCH-392
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>            Reporter: Doug Cutting
>            Assignee: Andrzej Bialecki 
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-392.patch
>
>
> OutputFormat implementations should pass the Progressable they are passed to underlying
> SequenceFile implementations.  This will keep reduce tasks from timing out when block writes
> are slow.  This issue depends on http://issues.apache.org/jira/browse/HADOOP-636.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

