nutch-dev mailing list archives

From "Lewis John McGibbney (JIRA)" <>
Subject [jira] [Updated] (NUTCH-1357) All gora mapreduce functionality should go through StorageUtils
Date Tue, 18 Sep 2012 20:13:07 GMT


Lewis John McGibbney updated NUTCH-1357:

    Fix Version/s: 2.2 (was: 2.1)
> All gora mapreduce functionality should go through StorageUtils
> ---------------------------------------------------------------
>                 Key: NUTCH-1357
>                 URL:
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: nutchgora
>            Reporter: Ferdy Galema
>             Fix For: 2.2
> I am trying to make the concept of crawlId work for ALL Nutch jobs. The main reason it
> does not work as expected seems to be the various ways Gora mapreduce is used in Nutch.
> Some jobs use StorageUtils, some use GoraMapper/GoraReducer, and some even use
> GoraInputFormat/GoraOutputFormat directly. But the only place where a crawlId is
> translated into a schema name is StorageUtils! I am currently converting all Gora*
> mapreduce initialization code to StorageUtils calls.
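
The idea in the description can be illustrated with a minimal, self-contained sketch. It is not Nutch or Gora code: the class name, the configuration keys, and the "<crawlId>_webpage" naming convention are all assumptions made for illustration, and a plain Map stands in for the job configuration. It only shows why routing every job through one helper keeps the crawlId-to-schema translation consistent.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a single entry point such as StorageUtils. */
public class CrawlIdSchemaSketch {

    // Assumed configuration keys and default schema, for illustration only.
    static final String CRAWL_ID_KEY = "storage.crawl.id";
    static final String SCHEMA_NAME_KEY = "preferred.schema.name";
    static final String DEFAULT_SCHEMA = "webpage";

    /** Translates a crawlId into a schema name (assumed convention: "<crawlId>_webpage"). */
    static String schemaNameFor(String crawlId) {
        if (crawlId == null || crawlId.isEmpty()) {
            return DEFAULT_SCHEMA; // no crawlId: fall back to the shared schema
        }
        return crawlId + "_" + DEFAULT_SCHEMA;
    }

    /**
     * The one place where job configuration is prepared. A job that sets up
     * its input/output formats on its own would skip this translation.
     */
    static Map<String, String> initJobConf(Map<String, String> conf) {
        Map<String, String> out = new HashMap<>(conf);
        out.put(SCHEMA_NAME_KEY, schemaNameFor(conf.get(CRAWL_ID_KEY)));
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(CRAWL_ID_KEY, "testcrawl");
        // Prints "testcrawl_webpage"
        System.out.println(initJobConf(conf).get(SCHEMA_NAME_KEY));
    }
}
```

Any job that bypasses initJobConf in this sketch would read and write the default schema regardless of the crawlId, which mirrors the problem the issue describes.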
