Jérôme Charron |
Error with Hadoop-0.4.0 |
Thu, 06 Jul, 15:54 |
Jérôme Charron |
Re: Error with Hadoop-0.4.0 |
Thu, 06 Jul, 21:48 |
Jérôme Charron |
Re: Error with Hadoop-0.4.0 |
Fri, 07 Jul, 23:08 |
Lourival Júnior |
Number of pages different to Indexed documents |
Fri, 07 Jul, 17:02 |
Lourival Júnior |
Re: How can i get a page content or parse data by the page's url |
Tue, 25 Jul, 17:40 |
Lourival Júnior |
Re: How can i get a page content or parse data by the page's url |
Wed, 26 Jul, 11:23 |
Uygar Yüzsüren |
neko parser or tagsoup parser? |
Mon, 03 Jul, 07:27 |
AJ Chen |
Crawl error |
Mon, 10 Jul, 04:47 |
AJ Chen |
Re: [Nutch-dev] Crawl error |
Mon, 10 Jul, 07:05 |
Aaron Tang |
How can i get a page content or parse data by the page's url |
Tue, 25 Jul, 16:36 |
Aaron Tang |
RE: How can i get a page content or parse data by the page's url |
Wed, 26 Jul, 01:45 |
Andrzej Bialecki |
Re: Error with Hadoop-0.4.0 |
Mon, 10 Jul, 07:37 |
Andrzej Bialecki |
Re: Error with Hadoop-0.4.0 |
Mon, 10 Jul, 07:42 |
Andrzej Bialecki |
Re: db.max.inlinks |
Tue, 18 Jul, 23:02 |
Andrzej Bialecki |
Re: db.max.inlinks |
Tue, 18 Jul, 23:19 |
Andrzej Bialecki |
Re: [jira] Commented: (NUTCH-293) support for Crawl-delay in Robots.txt |
Wed, 19 Jul, 21:32 |
Andrzej Bialecki |
Re: nutch-extensionpoints not in plugin.includes |
Thu, 20 Jul, 20:56 |
Andrzej Bialecki |
Re: Changing javac.version to 1.5? |
Sat, 22 Jul, 18:45 |
Andrzej Bialecki |
Re: [jira] Commented: (NUTCH-266) hadoop bug when doing updatedb |
Mon, 24 Jul, 07:40 |
Andrzej Bialecki |
Re: segread vs. readseg |
Mon, 24 Jul, 20:01 |
Andrzej Bialecki |
Re: segread vs. readseg |
Mon, 24 Jul, 23:10 |
Andrzej Bialecki |
Re: Why was "prune" removed in 0.8? |
Tue, 25 Jul, 00:03 |
Andrzej Bialecki |
Re: 0.8 release |
Tue, 25 Jul, 09:35 |
Andrzej Bialecki (JIRA) |
[jira] Created: (NUTCH-321) Scoring API deficiency |
Mon, 17 Jul, 13:53 |
Andrzej Bialecki (JIRA) |
[jira] Updated: (NUTCH-321) Scoring API deficiency |
Mon, 17 Jul, 14:08 |
Andrzej Bialecki (JIRA) |
[jira] Created: (NUTCH-322) Fetcher discards ProtocolStatus, doesn't store redirected pages |
Wed, 19 Jul, 12:11 |
Andrzej Bialecki (JIRA) |
[jira] Closed: (NUTCH-173) PerHost Crawling Policy ( crawl.ignore.external.links ) |
Wed, 19 Jul, 17:34 |
Andrzej Bialecki (JIRA) |
[jira] Closed: (NUTCH-271) Meta-data per URL/site/section |
Wed, 19 Jul, 18:22 |
Andrzej Bialecki (JIRA) |
[jira] Commented: (NUTCH-293) support for Crawl-delay in Robots.txt |
Wed, 19 Jul, 20:53 |
Andrzej Bialecki (JIRA) |
[jira] Closed: (NUTCH-293) support for Crawl-delay in Robots.txt |
Wed, 19 Jul, 22:06 |
Andrzej Bialecki (JIRA) |
[jira] Closed: (NUTCH-323) CrawlDatum.set just reference a mapWritable of a other object but not copy it. |
Wed, 19 Jul, 22:33 |
Andrzej Bialecki (JIRA) |
[jira] Closed: (NUTCH-321) Scoring API deficiency |
Wed, 19 Jul, 22:42 |
Andrzej Bialecki (JIRA) |
[jira] Commented: (NUTCH-322) Fetcher discards ProtocolStatus, doesn't store redirected pages |
Thu, 20 Jul, 10:06 |
Andrzej Bialecki (JIRA) |
[jira] Commented: (NUTCH-322) Fetcher discards ProtocolStatus, doesn't store redirected pages |
Thu, 20 Jul, 22:08 |
Andrzej Bialecki (JIRA) |
[jira] Closed: (NUTCH-329) CrawlDbReader processTopNJob does not set jobNames |
Mon, 24 Jul, 08:39 |
Andrzej Bialecki (JIRA) |
[jira] Updated: (NUTCH-167) Observation of <META NAME="ROBOTS" CONTENT="NOARCHIVE"> directive |
Mon, 24 Jul, 15:08 |
Andrzej Bialecki (JIRA) |
[jira] Closed: (NUTCH-324) db.score.link.internal and db.score.link.external are ignored |
Mon, 24 Jul, 15:26 |
Andrzej Bialecki (JIRA) |
[jira] Commented: (NUTCH-322) Fetcher discards ProtocolStatus, doesn't store redirected pages |
Mon, 24 Jul, 23:04 |
Andrzej Bialecki (JIRA) |
[jira] Commented: (NUTCH-318) log4j not proper configured, readdb doesnt give any information |
Wed, 26 Jul, 06:37 |
Andrzej Bialecki (JIRA) |
[jira] Commented: (NUTCH-318) log4j not proper configured, readdb doesnt give any information |
Wed, 26 Jul, 06:58 |
Andrzej Bialecki (JIRA) |
[jira] Commented: (NUTCH-318) log4j not proper configured, readdb doesnt give any information |
Wed, 26 Jul, 09:18 |
Andrzej Bialecki (JIRA) |
[jira] Created: (NUTCH-331) Fetcher incorrectly reports task progress to tasktracker resulting in skipped URLs |
Thu, 27 Jul, 11:12 |
Brian M.B. Keaney |
Webcrawler |
Wed, 19 Jul, 23:20 |
Chris A. Mattmann (JIRA) |
[jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore |
Mon, 24 Jul, 04:33 |
Chris A. Mattmann (JIRA) |
[jira] Updated: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore |
Tue, 25 Jul, 21:56 |
Chris Mattmann |
RE: Library for extracting text content from binaries |
Mon, 24 Jul, 18:38 |
Chris Stephens |
error in recommended plugin example |
Wed, 19 Jul, 17:24 |
Chris Stephens |
multiple query filters |
Fri, 21 Jul, 16:08 |
Dawid Weiss (JIRA) |
[jira] Commented: (NUTCH-300) Clustering API improvements |
Fri, 07 Jul, 13:17 |
Doug Cutting |
Re: 0.8 release |
Wed, 05 Jul, 10:46 |
Doug Cutting |
Re: Error with Hadoop-0.4.0 |
Mon, 10 Jul, 08:11 |
Doug Cutting |
Re: Error with Hadoop-0.4.0 |
Wed, 12 Jul, 08:17 |
Doug Cutting (JIRA) |
[jira] Reopened: (NUTCH-309) Uses commons logging Code Guards |
Fri, 07 Jul, 08:59 |
Enrico Triolo |
Re: Possible memory leak? |
Thu, 13 Jul, 11:29 |
Enrico Triolo (JIRA) |
[jira] Commented: (NUTCH-322) Fetcher discards ProtocolStatus, doesn't store redirected pages |
Thu, 20 Jul, 09:50 |
Enrico Triolo (JIRA) |
[jira] Commented: (NUTCH-322) Fetcher discards ProtocolStatus, doesn't store redirected pages |
Thu, 20 Jul, 12:31 |
Enrico Triolo (JIRA) |
[jira] Commented: (NUTCH-322) Fetcher discards ProtocolStatus, doesn't store redirected pages |
Mon, 24 Jul, 08:39 |
Gal Nitzan |
RE: Error with Hadoop-0.4.0 |
Mon, 10 Jul, 09:06 |
Greg Kim |
Changing javac.version to 1.5? |
Fri, 21 Jul, 19:44 |
Jack Tang |
Distributed Matrix Computering on Hadoop |
Fri, 21 Jul, 09:24 |
Jerome Charron (JIRA) |
[jira] Resolved: (NUTCH-317) Clarify what the queryLanguage argument of Query.parse(...) means |
Thu, 06 Jul, 16:49 |
Jerome Charron (JIRA) |
[jira] Commented: (NUTCH-309) Uses commons logging Code Guards |
Fri, 07 Jul, 09:27 |
Jukka Zitting |
Library for extracting text content from binaries |
Mon, 17 Jul, 21:59 |
Jukka Zitting |
Re: Library for extracting text content from binaries |
Mon, 24 Jul, 18:28 |
Jukka Zitting |
Re: Library for extracting text content from binaries |
Tue, 25 Jul, 06:54 |
Ken Krugler |
Re: result comparison tool? |
Mon, 24 Jul, 02:53 |
Kerry Wilson |
Windows BAT |
Mon, 17 Jul, 14:18 |
Mark Wilkerson |
Opportunities at Oracle Corporation - Oracle Enterprise Search |
Tue, 11 Jul, 05:42 |
Michael Wechner |
Re: Library for extracting text content from binaries |
Mon, 24 Jul, 21:09 |
Piotr Kosiorowski |
Re: Nutch web site |
Tue, 04 Jul, 15:55 |
Piotr Kosiorowski |
Re: 0.8 release |
Tue, 04 Jul, 15:56 |
Piotr Kosiorowski |
Re: log when blocked by robots.txt |
Fri, 21 Jul, 06:50 |
Piotr Kosiorowski |
Re: 0.8 release |
Thu, 27 Jul, 08:24 |
Renaud Richardet (JIRA) |
[jira] Created: (NUTCH-330) command line tool to search a Lucene index |
Tue, 25 Jul, 20:20 |
Renaud Richardet (JIRA) |
[jira] Updated: (NUTCH-330) command line tool to search a Lucene index |
Tue, 25 Jul, 20:20 |
Renaud Richardet (JIRA) |
[jira] Updated: (NUTCH-330) command line tool to search a Lucene index |
Tue, 25 Jul, 20:26 |
Renaud Richardet (JIRA) |
[jira] Updated: (NUTCH-208) http: proxy exception list: |
Mon, 31 Jul, 21:25 |
Robert Sanford |
Scanning the database |
Tue, 25 Jul, 15:10 |
Robert Sanford |
Indexing href attribute in links. |
Tue, 25 Jul, 15:11 |
Robert Sanford |
Limiting Results By Domain |
Tue, 25 Jul, 17:58 |
Sami Siren |
Re: Error with Hadoop-0.4.0 |
Thu, 06 Jul, 17:23 |
Sami Siren |
Re: Error with Hadoop-0.4.0 |
Wed, 12 Jul, 06:31 |
Sami Siren |
Re: Error with Hadoop-0.4.0 |
Wed, 12 Jul, 07:06 |
Sami Siren |
Re: Possible memory leak? |
Thu, 13 Jul, 11:35 |
Sami Siren |
Re: Possible problem in WebAppModule |
Mon, 17 Jul, 12:52 |
Sami Siren |
Re: [jira] Commented: (NUTCH-271) Meta-data per URL/site/section |
Wed, 19 Jul, 20:12 |
Sami Siren |
Re: [jira] Commented: (NUTCH-293) support for Crawl-delay in Robots.txt |
Wed, 19 Jul, 21:22 |
Sami Siren |
Re: 0.8 release |
Sat, 22 Jul, 20:15 |
Sami Siren |
tests failing |
Sun, 23 Jul, 20:27 |
Sami Siren |
Re: [jira] Commented: (NUTCH-266) hadoop bug when doing updatedb |
Mon, 24 Jul, 07:46 |
Sami Siren |
Re: 0.8 release |
Tue, 25 Jul, 09:15 |
Sami Siren |
Re: 0.8 release |
Wed, 26 Jul, 15:04 |
Sami Siren (JIRA) |
[jira] Resolved: (NUTCH-172) Segment merger |
Tue, 11 Jul, 21:02 |
Sami Siren (JIRA) |
[jira] Created: (NUTCH-320) DmozParser does not output urls to stdout |
Mon, 17 Jul, 06:53 |
Sami Siren (JIRA) |
[jira] Resolved: (NUTCH-320) DmozParser does not output urls to stdout |
Mon, 17 Jul, 06:55 |
Sami Siren (JIRA) |
[jira] Commented: (NUTCH-293) support for Crawl-delay in Robots.txt |
Tue, 18 Jul, 19:51 |
Sami Siren (JIRA) |
[jira] Commented: (NUTCH-266) hadoop bug when doing updatedb |
Sun, 23 Jul, 18:22 |
Sami Siren (JIRA) |
[jira] Created: (NUTCH-327) bin/nutch setting of log path problems on cygwin |
Sun, 23 Jul, 18:30 |
Sami Siren (JIRA) |
[jira] Resolved: (NUTCH-327) bin/nutch setting of log path problems on cygwin |
Sun, 23 Jul, 18:45 |
Sami Siren (JIRA) |
[jira] Created: (NUTCH-328) commons-cli-2.0-SNAPSHOT.jar provided with nutch is not compatible with jdk 1.4 |
Sun, 23 Jul, 18:56 |