YourSoft |
Re: refetching interval |
Thu, 01 Jun, 10:02 |
YourSoft |
webgraph |
Thu, 01 Jun, 10:02 |
Stefan Groschupf (JIRA) |
[jira] Created: (NUTCH-293) support for Crawl-delay in Robots.txt |
Thu, 01 Jun, 17:24 |
Stefan Groschupf (JIRA) |
[jira] Updated: (NUTCH-293) support for Crawl-delay in Robots.txt |
Thu, 01 Jun, 17:26 |
Stefan Groschupf (JIRA) |
[jira] Commented: (NUTCH-289) CrawlDatum should store IP address |
Thu, 01 Jun, 18:41 |
Teruhiko Kurosaka |
how to turn on logging, exercise analyzer, tips on debugging plugins? |
Thu, 01 Jun, 20:01 |
Teruhiko Kurosaka |
i18n in nutch home page is misnomer |
Thu, 01 Jun, 21:54 |
Stefan Neufeind (JIRA) |
[jira] Created: (NUTCH-294) Topic-maps of related searchwords |
Fri, 02 Jun, 06:56 |
Stefan Groschupf (JIRA) |
[jira] Commented: (NUTCH-282) Showing too few results on a page (Paging not correct) |
Fri, 02 Jun, 15:08 |
Stefan Groschupf (JIRA) |
[jira] Commented: (NUTCH-286) Handling common error-pages as 404 |
Fri, 02 Jun, 15:13 |
Stefan Groschupf (JIRA) |
[jira] Commented: (NUTCH-288) hitsPerSite-functionality "flawed": problems writing a page-navigation |
Fri, 02 Jun, 15:20 |
Stefan Groschupf (JIRA) |
[jira] Commented: (NUTCH-292) OpenSearchServlet: OutOfMemoryError: Java heap space |
Fri, 02 Jun, 15:31 |
Stefan Groschupf (JIRA) |
[jira] Commented: (NUTCH-291) OpenSearchServlet should return "date" as well as "lastModified" |
Fri, 02 Jun, 15:39 |
Stefan Groschupf (JIRA) |
[jira] Commented: (NUTCH-290) parse-pdf: Garbage indexed when text-extraction not allowed |
Fri, 02 Jun, 15:45 |
Stefan Neufeind (JIRA) |
[jira] Updated: (NUTCH-292) OpenSearchServlet: OutOfMemoryError: Java heap space |
Fri, 02 Jun, 15:51 |
Stefan Groschupf (JIRA) |
[jira] Closed: (NUTCH-287) Exception when searching with sort |
Fri, 02 Jun, 15:55 |
Stefan Groschupf (JIRA) |
[jira] Closed: (NUTCH-284) NullPointerException during index |
Fri, 02 Jun, 15:57 |
Stefan Groschupf (JIRA) |
[jira] Commented: (NUTCH-284) NullPointerException during index |
Fri, 02 Jun, 15:59 |
Stefan Groschupf (JIRA) |
[jira] Commented: (NUTCH-281) cached.jsp: base-href needs to be outside comments |
Fri, 02 Jun, 15:59 |
Stefan Groschupf (JIRA) |
[jira] Commented: (NUTCH-275) Fetcher not parsing XHTML-pages at all |
Fri, 02 Jun, 16:07 |
Stefan Groschupf (JIRA) |
[jira] Commented: (NUTCH-274) Empty row in/at end of URL-list results in error |
Fri, 02 Jun, 16:13 |
Stefan Neufeind (JIRA) |
[jira] Commented: (NUTCH-290) parse-pdf: Garbage indexed when text-extraction not allowed |
Fri, 02 Jun, 16:13 |
Stefan Neufeind (JIRA) |
[jira] Commented: (NUTCH-282) Showing too few results on a page (Paging not correct) |
Fri, 02 Jun, 16:19 |
Stefan Neufeind (JIRA) |
[jira] Commented: (NUTCH-286) Handling common error-pages as 404 |
Fri, 02 Jun, 16:23 |
Stefan Neufeind (JIRA) |
[jira] Commented: (NUTCH-291) OpenSearchServlet should return "date" as well as "lastModified" |
Fri, 02 Jun, 16:25 |
Stefan Groschupf (JIRA) |
[jira] Updated: (NUTCH-274) Empty row in/at end of URL-list results in error |
Fri, 02 Jun, 16:25 |
Stefan Groschupf (JIRA) |
[jira] Commented: (NUTCH-290) parse-pdf: Garbage indexed when text-extraction not allowed |
Fri, 02 Jun, 16:32 |
Stefan Groschupf (JIRA) |
[jira] Resolved: (NUTCH-282) Showing too few results on a page (Paging not correct) |
Fri, 02 Jun, 16:34 |
Stefan Groschupf (JIRA) |
[jira] Closed: (NUTCH-286) Handling common error-pages as 404 |
Fri, 02 Jun, 16:36 |
Stefan Neufeind (JIRA) |
[jira] Commented: (NUTCH-275) Fetcher not parsing XHTML-pages at all |
Fri, 02 Jun, 16:46 |
Stefan Neufeind (JIRA) |
[jira] Commented: (NUTCH-290) parse-pdf: Garbage indexed when text-extraction not allowed |
Fri, 02 Jun, 16:54 |
Dennis Kubes (JIRA) |
[jira] Created: (NUTCH-295) More description for fetcher.threads.fetch property |
Fri, 02 Jun, 16:58 |
Dennis Kubes (JIRA) |
[jira] Updated: (NUTCH-295) More description for fetcher.threads.fetch property |
Fri, 02 Jun, 17:00 |
Thomas Delnoij (JIRA) |
[jira] Created: (NUTCH-296) Image Search |
Sat, 03 Jun, 16:53 |
Thomas Delnoij (JIRA) |
[jira] Updated: (NUTCH-296) Image Search |
Sat, 03 Jun, 17:05 |
Stefan Groschupf (JIRA) |
[jira] Created: (NUTCH-297) sandbox svn folder |
Sat, 03 Jun, 17:13 |
Chris A. Mattmann (JIRA) |
[jira] Commented: (NUTCH-294) Topic-maps of related searchwords |
Sat, 03 Jun, 17:59 |
Chris A. Mattmann (JIRA) |
[jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore |
Sat, 03 Jun, 18:10 |
Chris A. Mattmann (JIRA) |
[jira] Assigned: (NUTCH-236) PdfParser and RSSParser Log4j appender redirection |
Sat, 03 Jun, 18:16 |
Chris A. Mattmann (JIRA) |
[jira] Commented: (NUTCH-236) PdfParser and RSSParser Log4j appender redirection |
Sat, 03 Jun, 18:18 |
Chris A. Mattmann (JIRA) |
[jira] Updated: (NUTCH-236) PdfParser and RSSParser Log4j appender redirection |
Sat, 03 Jun, 18:18 |
Chris A. Mattmann (JIRA) |
[jira] Updated: (NUTCH-187) Cannot start Nutch datanodes on Windows outside of a cygwin environment because of DF |
Sat, 03 Jun, 18:44 |
Stefan Groschupf (JIRA) |
[jira] Created: (NUTCH-298) if a 404 for a robots.txt is returned no page is fetched at all from the host |
Sat, 03 Jun, 19:44 |
Stefan Groschupf (JIRA) |
[jira] Updated: (NUTCH-298) if a 404 for a robots.txt is returned no page is fetched at all from the host |
Sat, 03 Jun, 19:53 |
Stefan Groschupf |
RobotRuleSet |
Sat, 03 Jun, 19:58 |
Hasan Diwan (JIRA) |
[jira] Created: (NUTCH-299) Bittorrent Parser |
Sat, 03 Jun, 23:04 |
Hasan Diwan (JIRA) |
[jira] Updated: (NUTCH-299) Bittorrent Parser |
Sat, 03 Jun, 23:07 |
Stefan Neufeind (JIRA) |
[jira] Commented: (NUTCH-299) Bittorrent Parser |
Sun, 04 Jun, 14:15 |
Stefan Neufeind (JIRA) |
[jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore |
Sun, 04 Jun, 15:53 |
Stefan Neufeind (JIRA) |
[jira] Commented: (NUTCH-298) if a 404 for a robots.txt is returned no page is fetched at all from the host |
Sun, 04 Jun, 15:56 |
Hasan Diwan (JIRA) |
[jira] Commented: (NUTCH-299) Bittorrent Parser |
Sun, 04 Jun, 16:04 |
Stefan Groschupf (JIRA) |
[jira] Updated: (NUTCH-298) if a 404 for a robots.txt is returned a NPE is thrown |
Sun, 04 Jun, 16:27 |
Stefan Neufeind (JIRA) |
[jira] Commented: (NUTCH-294) Topic-maps of related searchwords |
Sun, 04 Jun, 17:09 |
Stefan Groschupf |
search engine spam detector |
Sun, 04 Jun, 17:14 |
Stefan Neufeind |
Re: search engine spam detector |
Sun, 04 Jun, 17:23 |
Chris A. Mattmann (JIRA) |
[jira] Resolved: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore |
Sun, 04 Jun, 18:20 |
Chris A. Mattmann (JIRA) |
[jira] Closed: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore |
Sun, 04 Jun, 18:22 |
Stefan Groschupf |
Re: search engine spam detector |
Sun, 04 Jun, 18:50 |
ogjunk-nu...@yahoo.com |
Re: [Nutch-cvs] svn commit: r411594 - /lucene/nutch/trunk/contrib/web2/plugins/build.xml |
Mon, 05 Jun, 05:33 |
sboo...@orbita1.ru |
RE: search engine spam detector |
Mon, 05 Jun, 09:01 |
an...@orbita1.ru |
summary |
Mon, 05 Jun, 09:43 |
Andrzej Bialecki |
Re: summary |
Mon, 05 Jun, 10:55 |
Sylvain FURMANEK |
RE: summary |
Mon, 05 Jun, 12:26 |
Uygar Yüzsüren |
parse OutOfMemoryError? |
Mon, 05 Jun, 12:35 |
Andrzej Bialecki |
Re: search engine spam detector |
Mon, 05 Jun, 13:12 |
Scott Ganyo (JIRA) |
[jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore |
Mon, 05 Jun, 14:00 |
Stefan Groschupf (JIRA) |
[jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore |
Mon, 05 Jun, 14:11 |
Sami Siren |
Re: [Nutch-cvs] svn commit: r411594 - /lucene/nutch/trunk/contrib/web2/plugins/build.xml |
Mon, 05 Jun, 14:48 |
Andrzej Bialecki |
Re: [Nutch-cvs] svn commit: r411594 - /lucene/nutch/trunk/contrib/web2/plugins/build.xml |
Mon, 05 Jun, 15:00 |
Andrzej Bialecki (JIRA) |
[jira] Created: (NUTCH-300) Clustering API improvements |
Mon, 05 Jun, 15:20 |
Chris Mattmann |
Re: [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore |
Mon, 05 Jun, 15:20 |
Sami Siren |
Re: [Nutch-cvs] svn commit: r411594 - /lucene/nutch/trunk/contrib/web2/plugins/build.xml |
Mon, 05 Jun, 15:49 |
Stefan Groschupf (JIRA) |
[jira] Updated: (NUTCH-289) CrawlDatum should store IP address |
Mon, 05 Jun, 15:53 |
Stefan Groschupf |
Re: [Nutch-cvs] svn commit: r411594 - /lucene/nutch/trunk/contrib/web2/plugins/build.xml |
Mon, 05 Jun, 15:54 |
Andrzej Bialecki |
Re: [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore |
Mon, 05 Jun, 16:50 |
Chris Mattmann |
Re: [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore |
Mon, 05 Jun, 17:01 |
Andrzej Bialecki |
Re: [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore |
Mon, 05 Jun, 17:34 |
Stefan Groschupf |
Re: [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore |
Mon, 05 Jun, 17:47 |
Scott Ganyo |
Re: [jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore |
Mon, 05 Jun, 17:53 |
Andrzej Bialecki (JIRA) |
[jira] Updated: (NUTCH-300) Clustering API improvements |
Mon, 05 Jun, 18:23 |
Chris A. Mattmann (JIRA) |
[jira] Reopened: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore |
Mon, 05 Jun, 18:40 |
Sami Siren (JIRA) |
[jira] Resolved: (NUTCH-201) add support for subcollections |
Mon, 05 Jun, 20:14 |
Jerome Charron (JIRA) |
[jira] Resolved: (NUTCH-298) if a 404 for a robots.txt is returned a NPE is thrown |
Mon, 05 Jun, 21:48 |
an...@orbita1.ru |
RE: summary |
Tue, 06 Jun, 05:21 |
Jérôme Charron |
Re: svn commit: r411943 - in /lucene/nutch/trunk/lib: commons-logging-1.0.4.jar hadoop-0.2.1.jar hadoop-0.3.1.jar log4j-1.2.13.jar |
Tue, 06 Jun, 09:02 |
Stefan Groschupf |
Re: svn commit: r411943 - in /lucene/nutch/trunk/lib: commons-logging-1.0.4.jar hadoop-0.2.1.jar hadoop-0.3.1.jar log4j-1.2.13.jar |
Tue, 06 Jun, 10:10 |
Dawid Weiss (JIRA) |
[jira] Commented: (NUTCH-294) Topic-maps of related searchwords |
Tue, 06 Jun, 14:25 |
Stefan Neufeind (JIRA) |
[jira] Commented: (NUTCH-294) Topic-maps of related searchwords |
Tue, 06 Jun, 14:32 |
Jérôme Charron |
Re: svn commit: r411943 - in /lucene/nutch/trunk/lib: commons-logging-1.0.4.jar hadoop-0.2.1.jar hadoop-0.3.1.jar log4j-1.2.13.jar |
Tue, 06 Jun, 15:04 |
Doug Cutting |
Re: svn commit: r411943 - in /lucene/nutch/trunk/lib: commons-logging-1.0.4.jar hadoop-0.2.1.jar hadoop-0.3.1.jar log4j-1.2.13.jar |
Tue, 06 Jun, 17:09 |
Sami Siren (JIRA) |
[jira] Commented: (NUTCH-48) "Did you mean" query enhancement/refignment feature request |
Tue, 06 Jun, 19:04 |
Sami Siren |
Re: Nutch web site |
Tue, 06 Jun, 19:17 |
Björn Wilmsmann |
wildcard / regular expression searches |
Tue, 06 Jun, 22:12 |
Andrzej Bialecki |
Re: [Nutch-cvs] svn commit: r411594 - /lucene/nutch/trunk/contrib/web2/plugins/build.xml |
Tue, 06 Jun, 23:20 |
Stefan Groschupf |
classloading problem hadoop .3.1 |
Tue, 06 Jun, 23:41 |
Chris Schneider (JIRA) |
[jira] Created: (NUTCH-301) CommonGrams loads analysis.common.terms.file for each query |
Wed, 07 Jun, 02:50 |
Dawid Weiss (JIRA) |
[jira] Commented: (NUTCH-294) Topic-maps of related searchwords |
Wed, 07 Jun, 07:35 |
Jerome Charron (JIRA) |
[jira] Commented: (NUTCH-301) CommonGrams loads analysis.common.terms.file for each query |
Wed, 07 Jun, 08:29 |
Jérôme Charron |
Re: Status of language plugin |
Wed, 07 Jun, 08:58 |
Jerome Charron (JIRA) |
[jira] Commented: (NUTCH-275) Fetcher not parsing XHTML-pages at all |
Wed, 07 Jun, 11:49 |