nutch-dev mailing list archives

From prernasatija <...@git.apache.org>
Subject [GitHub] nutch pull request: 2.x
Date Tue, 15 Sep 2015 04:42:08 GMT
GitHub user prernasatija opened a pull request:

    https://github.com/apache/nutch/pull/57

    2.x

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/nutch 2.x

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nutch/pull/57.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #57
    
----
commit f7ef04dca1b763e86502a3b23064520ded39181e
Author: Ferdy Galema <ferdy@apache.org>
Date:   2012-08-31T12:49:26Z

    NUTCH-1462 Elasticsearch not indexing when type==null in NutchDocument metadata
    
    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1379431 13f79535-47bb-0310-9956-ffa450edef68

commit 1bb03c759180688f58284189abca787437935647
Author: Ferdy Galema <ferdy@apache.org>
Date:   2012-08-31T12:56:41Z

    NUTCH-1463 Elasticsearch indexer should wait and check response for last flush
    
    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1379435 13f79535-47bb-0310-9956-ffa450edef68

commit c5e2236f36a881ee7fec97aff3baf9bb32b40200
Author: Ferdy Galema <ferdy@apache.org>
Date:   2012-08-31T13:02:32Z

    NUTCH-1448 Redirected urls should be handled more cleanly (more like an outlink url)
    
    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1379438 13f79535-47bb-0310-9956-ffa450edef68

commit 33de245d3211d2be19559870c5a821381e18e9c0
Author: Ferdy Galema <ferdy@apache.org>
Date:   2012-08-31T15:57:18Z

    NUTCH-1431 Introduce link 'distance' and add configurable max distance in the generator
    
    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1379488 13f79535-47bb-0310-9956-ffa450edef68

commit c1b68c35ee02d1588786d5767f3feaa71b5393e1
Author: Ferdy Galema <ferdy@apache.org>
Date:   2012-09-07T08:17:58Z

    NUTCH-1459 Remove dead code (phase2) from InjectorJob
    
    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1381931 13f79535-47bb-0310-9956-ffa450edef68

commit e878515c26e1bceaed2555a3cac2402322f27046
Author: Ferdy Galema <ferdy@apache.org>
Date:   2012-09-07T14:19:47Z

    NUTCH-1456 Updater not setting batchId in markers correctly. (Alexander Kingson via ferdy)
    
    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1382037 13f79535-47bb-0310-9956-ffa450edef68

commit 32b825c58bcb1647bec548cb1ea17ee4ae522399
Author: Lewis John McGibbney <lewismc@apache.org>
Date:   2012-09-15T16:16:48Z

    NUTCH-1162 Write JUnit tests for parse-js
    
    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1385103 13f79535-47bb-0310-9956-ffa450edef68

commit 4369dac176a228d0c9ef729dca89bcff0e097211
Author: Lewis John McGibbney <lewismc@apache.org>
Date:   2012-09-15T23:06:34Z

    NUTCH-1470 Ensure test files are included for runtime testing
    
    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1385199 13f79535-47bb-0310-9956-ffa450edef68

commit ecb86f4de0209c73e5b00fa0df8d4c6f58c592bf
Author: Ferdy Galema <ferdy@apache.org>
Date:   2012-09-17T09:24:33Z

    NUTCH-1468 Redirects that are external links not adhering to db.ignore.external.links
    
    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1386526 13f79535-47bb-0310-9956-ffa450edef68

commit 068636631cc73786b150e1ec2cd0be38919890e7
Author: Lewis John McGibbney <lewismc@apache.org>
Date:   2012-09-18T14:07:57Z

    NUTCH-1162 test file
    
    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1387173 13f79535-47bb-0310-9956-ffa450edef68

commit 19e694e609776a388ce1409a3272a2a15b101222
Author: Lewis John McGibbney <lewismc@apache.org>
Date:   2012-09-18T14:13:26Z

    add keyspace reference to NullPointerException on inject before
    
    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1387175 13f79535-47bb-0310-9956-ffa450edef68

commit 590ad02aea95c1dcb9c6ad25de1e38a815c7fa82
Author: Lewis John McGibbney <lewismc@apache.org>
Date:   2012-09-18T20:30:25Z

    NUTCH-1432 property storage.schema does not work anymore, should be storage.schema.webpage and storage.schema.host
    
    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1387347 13f79535-47bb-0310-9956-ffa450edef68

commit fceecfabb9c47952f0ec2b3fcd2a6241dbedb465
Author: Sebastian Nagel <snagel@apache.org>
Date:   2012-09-18T20:52:08Z

    NUTCH-1415 release packages to contain top level folder apache-nutch-x.x
    
    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1387356 13f79535-47bb-0310-9956-ffa450edef68

commit 2da30f3d398a53da6fcc85f143e8b2d0b1c75837
Author: Lewis John McGibbney <lewismc@apache.org>
Date:   2012-09-21T14:37:07Z

    revert gora-cassandra to v0.2, prepare for 2.2 development
    
    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1388529 13f79535-47bb-0310-9956-ffa450edef68

commit bc7ef2e9c62606c5f134d5e1ad8ea001d90dbd36
Author: Sebastian Nagel <snagel@apache.org>
Date:   2012-10-10T21:05:19Z

    NUTCH-706 Url regex normalizer: pattern for session id removal not to match "newsId"
    
    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1396795 13f79535-47bb-0310-9956-ffa450edef68

commit 2e31b117aa7e25193bcdeabce4088f71c91a7029
Author: Sebastian Nagel <snagel@apache.org>
Date:   2012-10-10T21:15:55Z

    NUTCH-1344 BasicURLNormalizer to normalize https same as http
    
    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1396800 13f79535-47bb-0310-9956-ffa450edef68

commit 8b35d734a5112af93f571aab218e190a225990dd
Author: Sebastian Nagel <snagel@apache.org>
Date:   2012-10-10T21:58:06Z

    NUTCH-706 (applied correct patch)
    
    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1396822 13f79535-47bb-0310-9956-ffa450edef68

commit f9d0e7685d7f43cc8f1bbbd37d73fe2d9ddc4461
Author: Lewis John McGibbney <lewismc@apache.org>
Date:   2012-10-10T23:02:57Z

    NUTCH-874 Make sure all plugins in src/plugin are compatible with Nutch 2.0 and Gora (part 1)
    
    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1396850 13f79535-47bb-0310-9956-ffa450edef68

commit 33e7ae5a7ed524939e91f887de7c9821deb8a866
Author: Julien Nioche <jnioche@apache.org>
Date:   2012-10-20T08:49:53Z

    NUTCH-1087 crawl script
    
    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1400390 13f79535-47bb-0310-9956-ffa450edef68

commit 39893c6e5681e6936572f5d9983ab1decd085bf5
Author: Julien Nioche <jnioche@apache.org>
Date:   2012-10-20T09:14:40Z

    NUTCH-1433 Upgrade to Tika 1.2
    
    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1400397 13f79535-47bb-0310-9956-ffa450edef68

commit 244ebf6682c3ea5969a2f36ab72e0fa2fceead31
Author: Sebastian Nagel <snagel@apache.org>
Date:   2012-10-23T20:47:16Z

    NUTCH-1344 BasicURLNormalizer to normalize https same as http - forgot to add committer
    
    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1401458 13f79535-47bb-0310-9956-ffa450edef68

commit 0cffa912513dcdd6526ae4189f7207f23c903b49
Author: Sebastian Nagel <snagel@apache.org>
Date:   2012-10-23T20:52:21Z

    NUTCH-1421 RegexURLNormalizer to only skip rules with invalid patterns
    
    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1401460 13f79535-47bb-0310-9956-ffa450edef68

commit a722e43d2c5a6225d46b2178174def4918a6b4d4
Author: Markus Jelsma <markus@apache.org>
Date:   2012-11-06T09:17:38Z

    NUTCH-1491 Strip UTF-8 non-character codepoints in title
    
    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1406077 13f79535-47bb-0310-9956-ffa450edef68

commit c7342c74b52a0fc2ee6c070299f997f673584013
Author: Lewis John McGibbney <lewismc@apache.org>
Date:   2012-11-07T18:47:54Z

    NUTCH-1493 Error adding field 'contentLength'='' during solrindex using index-more
    
    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1406749 13f79535-47bb-0310-9956-ffa450edef68

commit e9b46e9088e48c45a4086b983117ebaf3e202e30
Author: Lewis John McGibbney <lewismc@apache.org>
Date:   2012-11-09T16:35:50Z

    * NUTCH-1488 bin/nutch to run junit from any directory (snagel via lewismc)
    
    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1407531 13f79535-47bb-0310-9956-ffa450edef68

commit f35d6ab520701be0fd345be5b577eba73ecee9e4
Author: Lewis John McGibbney <lewismc@apache.org>
Date:   2012-11-12T12:53:27Z

    NUTCH-1496 ParserJob logs skipped urls with level info
    
    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1408271 13f79535-47bb-0310-9956-ffa450edef68

commit 37c31a62c488ef0d9b248f1be8e930db29ba38ed
Author: Lewis John McGibbney <lewismc@apache.org>
Date:   2012-11-12T13:56:30Z

    NUTCH-1451 Upgrade automaton jar to 1.11-8
    
    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1408289 13f79535-47bb-0310-9956-ffa450edef68

commit 0d350bc0f6e9468b7560de443230425550099550
Author: Sebastian Nagel <snagel@apache.org>
Date:   2012-11-12T21:20:55Z

    NUTCH-1484 TableUtil unreverseURL fails on file:// URLs
    
    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1408465 13f79535-47bb-0310-9956-ffa450edef68

commit 1873f6eb3e8c2c5d6b5a55dff1304397c66dcbe9
Author: Lewis John McGibbney <lewismc@apache.org>
Date:   2012-11-22T14:45:07Z

    NUTCH-1370 Expose exact number of urls injected @runtime
    
    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1412566 13f79535-47bb-0310-9956-ffa450edef68

commit 3a1effa22216236e8989aed39a4b7bc3cb0b1f9c
Author: Lewis John McGibbney <lewismc@apache.org>
Date:   2012-11-22T14:51:28Z

    NUTCH-1370 Expose exact number of urls injected @runtime
    
    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1412570 13f79535-47bb-0310-9956-ffa450edef68

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
