Apache Jenkins Server |
Jenkins build is back to normal : Nutch-nutchgora #213 |
Sun, 01 Apr, 04:21 |
Apache Jenkins Server |
Jenkins build is back to normal : Nutch-trunk #1804 |
Sun, 01 Apr, 04:32 |
Julien Nioche (Resolved) (JIRA) |
[jira] [Resolved] (NUTCH-1234) Upgrade to Tika 1.1 |
Mon, 02 Apr, 11:51 |
Julien Nioche (Commented) (JIRA) |
[jira] [Commented] (NUTCH-1234) Upgrade to Tika 1.1 |
Mon, 02 Apr, 11:51 |
Markus Jelsma (Commented) (JIRA) |
[jira] [Commented] (NUTCH-1234) Upgrade to Tika 1.1 |
Mon, 02 Apr, 12:13 |
Hudson (Commented) (JIRA) |
[jira] [Commented] (NUTCH-1234) Upgrade to Tika 1.1 |
Mon, 02 Apr, 12:51 |
Apache Jenkins Server |
Build failed in Jenkins: nutch-trunk-maven #222 |
Mon, 02 Apr, 14:06 |
Hudson (Commented) (JIRA) |
[jira] [Commented] (NUTCH-1234) Upgrade to Tika 1.1 |
Mon, 02 Apr, 14:07 |
Markus Jelsma (Created) (JIRA) |
[jira] [Created] (NUTCH-1323) AjaxNormalizer |
Mon, 02 Apr, 20:05 |
Markus Jelsma (Created) (JIRA) |
[jira] [Created] (NUTCH-1324) DupeDB for Nutch |
Mon, 02 Apr, 20:07 |
Markus Jelsma (Created) (JIRA) |
[jira] [Created] (NUTCH-1325) HostDB for Nutch |
Mon, 02 Apr, 20:07 |
Markus Jelsma (Created) (JIRA) |
[jira] [Created] (NUTCH-1326) HostDeduplicator for Nutch |
Mon, 02 Apr, 20:11 |
Markus Jelsma (Created) (JIRA) |
[jira] [Created] (NUTCH-1327) QueryStringNormalizer |
Mon, 02 Apr, 20:57 |
Aamir Khan |
GSoC : Web page scraper plugin |
Tue, 03 Apr, 04:45 |
Aamir Khan |
GSoC : Web page scraper plugin |
Tue, 03 Apr, 04:50 |
Apache Jenkins Server |
Jenkins build is back to normal : nutch-trunk-maven #223 |
Tue, 03 Apr, 05:03 |
behnam nikbakht (Created) (JIRA) |
[jira] [Created] (NUTCH-1328) a problem with regex-normalize.xml |
Tue, 03 Apr, 05:12 |
Markus Jelsma |
Re: NutchGora release, and Nutch 1.x trunk release |
Tue, 03 Apr, 10:29 |
Lewis John Mcgibbney |
Re: NutchGora release, and Nutch 1.x trunk release |
Tue, 03 Apr, 10:30 |
Lewis John Mcgibbney |
Re: GSoC : Web page scraper plugin |
Tue, 03 Apr, 11:01 |
Aamir Khan |
Re: GSoC : Web page scraper plugin |
Tue, 03 Apr, 11:05 |
Lewis John Mcgibbney |
Re: GSoC : Web page scraper plugin |
Tue, 03 Apr, 11:15 |
Julien Nioche |
Re: NutchGora release, and Nutch 1.x trunk release |
Tue, 03 Apr, 11:22 |
Markus Jelsma (Resolved) (JIRA) |
[jira] [Resolved] (NUTCH-1225) Migrate CrawlDBScanner to MapReduce API |
Tue, 03 Apr, 11:26 |
Markus Jelsma (Resolved) (JIRA) |
[jira] [Resolved] (NUTCH-1222) Upgrade to new Hadoop 0.22.0 |
Tue, 03 Apr, 11:26 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1318) Parse time outs crash parsing fetcher |
Tue, 03 Apr, 11:32 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1245) URL gone with 404 after db.fetch.interval.max stays db_unfetched in CrawlDb and is generated over and over again |
Tue, 03 Apr, 11:32 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-717) Make Nutch Solr integration easier |
Tue, 03 Apr, 11:32 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1251) Deletion of duplicates fails with org.apache.solr.client.solrj.SolrServerException |
Tue, 03 Apr, 11:32 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1129) Any23 Nutch plugin |
Tue, 03 Apr, 11:32 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1219) Upgrade all jobs to new MapReduce API |
Tue, 03 Apr, 11:32 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-578) URL fetched with 403 is generated over and over again |
Tue, 03 Apr, 11:32 |
Aamir Khan |
Re: GSoC : Web page scraper plugin |
Tue, 03 Apr, 11:54 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1249) Resolve all issues flagged up by adding javac -Xlint argument |
Tue, 03 Apr, 11:58 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1273) Fix [deprecation] javac warnings |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1147) WebGraph nodeDumper uses only 1 reducer |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1150) http.redirect.max can lead to multiple parses of the same url |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1116) Write JUnit tests for all plugins |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index? |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1084) ReadDB url throws exception |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1194) CrawlDB lock should be released earlier |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1119) JUnit test for index-static |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1201) Allow for different FetcherThread impls |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1274) Fix [cast] javac warnings |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1176) Fix all javadoc warnings from nightly builds |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1183) Summary task for adding command line usage instructions to webgraph classes |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1040) Backport REST-API from 2.0 |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1262) Map `duplicating` content-types to a single type |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1233) Rely on Tika for outlink extraction |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1014) Migrate from Apache ORO to java.util.regex |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1220) Upgrade Solr deps |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1063) OutlinkExtractor test generates an exception but does not fail |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1123) JUnit test for scoring-link |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1224) Migrate FreeGenerator to MapReduce API |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1252) SegmentReader -get shows wrong data |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-809) Parse-metatags plugin |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1122) JUnit test for protocol-ftp |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1121) JUnit test for parse-js |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1120) JUnit test for microformats-reltag |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-865) Format source code in unique style |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1197) Add statically configured field values to solrindex-mapping.xml |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1060) URL filters to produce regexes to be used by OutlinkExtractor. |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1046) Add tests for indexing to SOLR |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1186) FreeGenerator always normalizes |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1124) JUnit test for scoring-opic |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1001) bin/nutch fetch/parse handle crawl/segments directory |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1100) SolrDedup broken |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1308) Unnecessary truncate content configuration, and logging in parse-zip/ZipParser |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1228) Change mapred.task.timeout to mapreduce.task.timeout in fetcher |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1127) JUnit test for urlfilter-validator |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1179) Option to restrict generated records by metadata |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1247) CrawlDatum.retries should be int |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1143) Omit anchor in webgraph's LinkDatum |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1062) Migrate BasicURLNormalizer from Apache ORO to java.util.regex |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1107) Log slow parse entries |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1126) JUnit test for urlfilter-prefix |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-585) [PARSE-HTML plugin] Block certain parts of HTML code from being indexed |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1125) JUnit test for tld |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1031) Delegate parsing of robots.txt to crawler-commons |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-208) http: proxy exception list: |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1047) Pluggable indexing backends |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1087) Deprecate crawl command and replace with example script |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1035) Tune Solr config for Nutch users |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1223) Migrate WebGraph to MapReduce API |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1130) JUnit test for Any23 RDF plugin |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1128) JUnit test for urlmeta |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1320) IndexChecker and ParseChecker choke on IDN's |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1300) Indexer to normalize URL's |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1034) Create Solr Velocity templates |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1319) HostNormalizer |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1226) Migrate CrawlDbReader to MapReduce API |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1140) index-more plugin, resetTitle method creates multiple values in the Title field |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-827) HTTP POST Authentication |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1021) Migrate OutlinkExtractor from Apache ORO to java.util.regex |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1202) Fetcher timebomb kills long waiting fetch jobs |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1151) Index-anchor to add numInlinks count |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1079) StringBuffer converted to StringBuilder |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1275) Fix [unchecked] javac warnings |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1218) Improve trunk API documentation |
Tue, 03 Apr, 12:08 |
Markus Jelsma (Updated) (JIRA) |
[jira] [Updated] (NUTCH-1039) Fetcher fails for pages without content-length header |
Tue, 03 Apr, 12:08 |