nutch-dev mailing list archives

From "Andrzej Bialecki (JIRA)" <j...@apache.org>
Subject [jira] Commented: (NUTCH-893) DataStore.put() silently loses records when executed from multiple processes
Date Wed, 08 Sep 2010 16:36:33 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907297#action_12907297 ]

Andrzej Bialecki  commented on NUTCH-893:
-----------------------------------------

Very good catch - yes, the test now passes for me too. This is actually good news for Gora
:) I'll continue digging regarding NUTCH-879 ... don't hesitate to share any ideas you have
on how to solve that. I suspect we may be losing keys in Generator or Fetcher due to
partitioning collisions, but this hypothesis needs to be tested.

> DataStore.put() silently loses records when executed from multiple processes
> ----------------------------------------------------------------------------
>
>                 Key: NUTCH-893
>                 URL: https://issues.apache.org/jira/browse/NUTCH-893
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 2.0
>         Environment: Gora HEAD, SqlStore, MySQL 5.1, Ubuntu 10.4 x64, Sun JDK 1.6
>            Reporter: Andrzej Bialecki 
>            Priority: Blocker
>             Fix For: 2.0
>
>         Attachments: NUTCH-893.patch, NUTCH-893_v2.patch
>
>
> In order to debug the issue described in NUTCH-879 I created a test to simulate multiple
> clients appending to webtable (please see the patch), which is the situation that we have
> in distributed map-reduce jobs.
> There are two tests there: one that uses multiple threads within the same JVM, and another
> that uses a single thread in each of multiple JVMs. Each test first clears webtable (be
> careful!), then puts a bunch of pages, and finally checks that all are present and that
> their values correspond to their keys. To make things more interesting, each execution
> context (thread or process) closes and reopens its instance of DataStore a few times.
> The multithreaded test passes just fine. However, the multi-process test fails with missing
> keys, as many as 30%.
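
The shape of the multithreaded check described above can be sketched roughly as follows.
This is only an illustration under stated assumptions, not the code from the attached
patches: the shared table is a plain in-memory map, and StoreHandle, valueFor and the key
naming scheme are hypothetical stand-ins rather than Gora's DataStore API. It shows the
clear / put / reopen / verify sequence only and cannot reproduce the multi-process failure.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MultiWriterPutCheck {
    // Hypothetical stand-in for the shared webtable; NOT Gora's DataStore.
    static final Map<String, String> TABLE = new ConcurrentHashMap<>();

    // Hypothetical per-writer handle that buffers puts and flushes them on close().
    static final class StoreHandle implements AutoCloseable {
        private final Map<String, String> pending = new HashMap<>();

        void put(String key, String value) {
            pending.put(key, value);
        }

        @Override
        public void close() {
            TABLE.putAll(pending);
            pending.clear();
        }
    }

    // The value is derived from the key so the verifier can check correspondence.
    static String valueFor(String key) {
        return "value-of-" + key;
    }

    // One execution context: puts its pages, closing and reopening its handle
    // after every reopenEvery puts (reopenEvery must be positive).
    static void writeAll(int writerId, int pagesPerWriter, int reopenEvery) {
        StoreHandle store = new StoreHandle();
        for (int i = 0; i < pagesPerWriter; i++) {
            String key = "writer" + writerId + "/page" + i;
            store.put(key, valueFor(key));
            if ((i + 1) % reopenEvery == 0) {
                store.close();
                store = new StoreHandle();
            }
        }
        store.close();
    }

    // Counts keys that are missing or whose value does not match the key.
    static int countMissingOrWrong(int writers, int pagesPerWriter) {
        int bad = 0;
        for (int w = 0; w < writers; w++) {
            for (int i = 0; i < pagesPerWriter; i++) {
                String key = "writer" + w + "/page" + i;
                if (!valueFor(key).equals(TABLE.get(key))) {
                    bad++;
                }
            }
        }
        return bad;
    }

    // Clears the table, runs all writers as threads in this JVM, then verifies.
    static int run(int writers, int pagesPerWriter, int reopenEvery)
            throws InterruptedException {
        TABLE.clear();
        List<Thread> threads = new ArrayList<>();
        for (int w = 0; w < writers; w++) {
            final int id = w;
            Thread t = new Thread(() -> writeAll(id, pagesPerWriter, reopenEvery));
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return countMissingOrWrong(writers, pagesPerWriter);
    }

    public static void main(String[] args) throws InterruptedException {
        int bad = run(4, 250, 50);
        System.out.println("missing or mismatched keys: " + bad);
    }
}
```

A multi-process variant would launch writeAll in separate JVMs against a real shared
backend and run the same verification afterwards; that is the case where the issue
reports missing keys.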

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

