nutch-dev mailing list archives

From "Sebastian Nagel (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-2049) Upgrade Trunk to Hadoop > 2.4 stable
Date Fri, 21 Aug 2015 14:26:46 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706789#comment-14706789 ]

Sebastian Nagel commented on NUTCH-2049:
----------------------------------------

+1 to commit. As said, looking into the performance of the unit tests can be done later.
Some details below.

{noformat}
% time ant clean runtime test
(before)
Total time: 5 minutes 34 seconds
real    5m35.133s
user    7m30.968s
sys     0m21.528s

(after patching)
Total time: 6 minutes 39 seconds
real    6m39.794s
user    9m31.444s
sys     0m26.780s
{noformat}

These tests show significant differences (`-' before, `+' after patching):
{noformat}
     [junit] Running org.apache.nutch.crawl.TestGenerator
-    [junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 32.846 sec
+    [junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 36.279 sec
...
     [junit] Running org.apache.nutch.fetcher.TestFetcher
-    [junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 12.068 sec
+    [junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 14.273 sec
...
     [junit] Running org.apache.nutch.parse.TestParserFactory
-    [junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.783 sec
+    [junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.038 sec
...
     [junit] Running org.apache.nutch.segment.TestSegmentMerger
-    [junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 75.408 sec
+    [junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 91.652 sec
     [junit] Running org.apache.nutch.segment.TestSegmentMergerCrawlDatums
-    [junit] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 69.821 sec
+    [junit] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 84.443 sec
{noformat}
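
To put the numbers above into proportion, here is a minimal sketch that computes the relative slowdown per test and for the whole `ant clean runtime test` run. The timings are copied from the output above; the names `TESTS`, `TOTAL_REAL`, `slowdown_pct` and `report` are made up for this illustration and are not part of Nutch or the patch.

```python
# Timings copied from the ant/junit output above: (before, after) in seconds.
TESTS = {
    "TestGenerator": (32.846, 36.279),
    "TestFetcher": (12.068, 14.273),
    "TestParserFactory": (2.783, 4.038),
    "TestSegmentMerger": (75.408, 91.652),
    "TestSegmentMergerCrawlDatums": (69.821, 84.443),
}

# "real" wall-clock time of the full run, converted from m:s to seconds.
TOTAL_REAL = (5 * 60 + 35.133, 6 * 60 + 39.794)


def slowdown_pct(before, after):
    """Relative increase of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


def report():
    """Return one formatted line per test plus a line for the full run."""
    lines = []
    for name, (before, after) in TESTS.items():
        lines.append(
            f"{name:30s} {after - before:+7.3f} s  "
            f"({slowdown_pct(before, after):+.1f}%)"
        )
    before, after = TOTAL_REAL
    lines.append(
        f"{'full run (real)':30s} {after - before:+7.3f} s  "
        f"({slowdown_pct(before, after):+.1f}%)"
    )
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

This only compares a single run before and after patching, so the percentages include whatever noise the machine had at the time.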

> Upgrade Trunk to Hadoop > 2.4 stable
> ------------------------------------
>
>                 Key: NUTCH-2049
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2049
>             Project: Nutch
>          Issue Type: Improvement
>          Components: build
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>              Labels: memex
>             Fix For: 1.11
>
>         Attachments: NUTCH-2049.patch, NUTCH-2049v2.patch, NUTCH-2049v3.patch
>
>
> Convo here - http://www.mail-archive.com/dev%40nutch.apache.org/msg18225.html
> I am +1 for taking trunk (or a branch of trunk) to explicit dependency on > Hadoop 2.6.
> We can run our tests, we can validate, we can fix.
> I will be doing validation on 2.X in parallel as this is what I use on my own projects.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
