Jérôme Charron | Re: Content-type detection for Tika | Wed, 06 Sep, 12:46
Uroš Gruber | Re: [jira] Commented: (NUTCH-361) generator create fetchlist randomly | Sun, 03 Sep, 09:01
Uroš Gruber | [Fwd: Re: get CrawlDatum] | Wed, 06 Sep, 17:43
Doğacan Güney (JIRA) | [jira] Commented: (NUTCH-339) Refactor nutch to allow fetcher improvements | Fri, 08 Sep, 08:17
Doğacan Güney (JIRA) | [jira] Updated: (NUTCH-339) Refactor nutch to allow fetcher improvements | Fri, 08 Sep, 08:17
Uroš Gruber | Re: [jira] Commented: (NUTCH-249) black- white list url filtering | Tue, 05 Sep, 09:44
Uroš Gruber | Re: [Fwd: Re: get CrawlDatum] | Thu, 07 Sep, 05:53
AJ Chen | log error in deploying nutch-0.9-dev.jar | Thu, 07 Sep, 16:16
AJ Chen | Re: log error in deploying nutch-0.9-dev.jar | Thu, 07 Sep, 16:30
Andrzej Bialecki | Re: 0.8.1 | Wed, 06 Sep, 15:56
Andrzej Bialecki | Re: [Fwd: Re: get CrawlDatum] | Wed, 06 Sep, 19:31
Andrzej Bialecki | Re: Searching on fields with uppercase letters | Tue, 26 Sep, 14:18
Andrzej Bialecki | Re: svn commit: r451649 - /lucene/nutch/trunk/CHANGES.txt | Sat, 30 Sep, 21:13
Andrzej Bialecki | Re: svn commit: r451649 - /lucene/nutch/trunk/CHANGES.txt | Sat, 30 Sep, 21:51
Andrzej Bialecki (JIRA) | [jira] Commented: (NUTCH-361) generator create fetchlist randomly | Wed, 06 Sep, 17:04
Andrzej Bialecki (JIRA) | [jira] Commented: (NUTCH-339) Refactor nutch to allow fetcher improvements | Thu, 07 Sep, 18:37
Andrzej Bialecki (JIRA) | [jira] Created: (NUTCH-365) Flexible URL normalization | Sat, 09 Sep, 13:22
Andrzej Bialecki (JIRA) | [jira] Assigned: (NUTCH-365) Flexible URL normalization | Sat, 09 Sep, 13:22
Andrzej Bialecki (JIRA) | [jira] Updated: (NUTCH-365) Flexible URL normalization | Sat, 09 Sep, 13:24
Andrzej Bialecki (JIRA) | [jira] Commented: (NUTCH-365) Flexible URL normalization | Mon, 11 Sep, 16:34
Andrzej Bialecki (JIRA) | [jira] Created: (NUTCH-366) Merge URLFilters and URLNormalizers | Tue, 12 Sep, 14:29
Andrzej Bialecki (JIRA) | [jira] Created: (NUTCH-368) Message queueing system | Fri, 15 Sep, 20:37
Andrzej Bialecki (JIRA) | [jira] Updated: (NUTCH-368) Message queueing system | Fri, 15 Sep, 20:39
Andrzej Bialecki (JIRA) | [jira] Commented: (NUTCH-365) Flexible URL normalization | Sat, 16 Sep, 07:08
Andrzej Bialecki (JIRA) | [jira] Commented: (NUTCH-368) Message queueing system | Mon, 18 Sep, 18:43
Andrzej Bialecki (JIRA) | [jira] Commented: (NUTCH-368) Message queueing system | Tue, 19 Sep, 07:18
Andrzej Bialecki (JIRA) | [jira] Closed: (NUTCH-365) Flexible URL normalization | Fri, 22 Sep, 21:02
Andrzej Bialecki (JIRA) | [jira] Assigned: (NUTCH-332) doubling score causes by page internal anchors. | Fri, 22 Sep, 21:04
Andrzej Bialecki (JIRA) | [jira] Closed: (NUTCH-332) doubling score causes by page internal anchors. | Fri, 22 Sep, 21:46
Andrzej Bialecki (JIRA) | [jira] Closed: (NUTCH-336) Harvested links shouldn't get db.score.injected in addition to inbound contributions | Sat, 23 Sep, 17:28
Andrzej Bialecki (JIRA) | [jira] Commented: (NUTCH-353) pages that serverside forwards will be refetched every time | Sat, 23 Sep, 17:43
Andrzej Bialecki (JIRA) | [jira] Updated: (NUTCH-353) pages that serverside forwards will be refetched every time | Sat, 23 Sep, 17:50
Andrzej Bialecki (JIRA) | [jira] Closed: (NUTCH-337) Fetcher ignores the fetcher.parse value configured in config file | Sat, 23 Sep, 18:57
Andrzej Bialecki (JIRA) | [jira] Closed: (NUTCH-253) Normalize Host during Generate | Sat, 23 Sep, 19:01
Andrzej Bialecki (JIRA) | [jira] Closed: (NUTCH-350) urls blocked db.fetch.retry.max * http.max.delays times during fetching are marked as STATUS_DB_GONE | Sat, 23 Sep, 19:45
Andrzej Bialecki (JIRA) | [jira] Closed: (NUTCH-276) db.score.link.internal problem | Sat, 23 Sep, 19:47
Andrzej Bialecki (JIRA) | [jira] Closed: (NUTCH-205) Wrong 'fetch date' for non available pages | Sat, 23 Sep, 19:51
Andrzej Bialecki (JIRA) | [jira] Created: (NUTCH-372) Fetcher halting and throttling | Tue, 26 Sep, 09:47
Andrzej Bialecki (JIRA) | [jira] Created: (NUTCH-373) Fetcher halting and throttling | Tue, 26 Sep, 09:49
Andrzej Bialecki (JIRA) | [jira] Updated: (NUTCH-368) Message queueing system | Tue, 26 Sep, 09:53
Andrzej Bialecki (JIRA) | [jira] Closed: (NUTCH-372) Fetcher halting and throttling | Tue, 26 Sep, 09:57
Andrzej Bialecki (JIRA) | [jira] Closed: (NUTCH-373) Fetcher halting and throttling | Tue, 26 Sep, 09:57
Ben Ogle | File system watching for intranets | Tue, 12 Sep, 18:04
Ben Ogle | Re: File system watching for intranets | Wed, 13 Sep, 20:53
Bradley Parker | wavering again and then the hell the earth which bodies are | Sat, 30 Sep, 14:40
Chris Schneider (JIRA) | [jira] Created: (NUTCH-371) DeleteDuplicates should remove documents with duplicate URLs | Mon, 25 Sep, 16:14
Chris Schneider (JIRA) | [jira] Updated: (NUTCH-371) DeleteDuplicates should remove documents with duplicate URLs | Mon, 25 Sep, 16:16
Chris Schneider (JIRA) | [jira] Commented: (NUTCH-351) Protocol forward proxy | Wed, 27 Sep, 02:52
David Podunavac | several url to search for [multiple url] | Mon, 04 Sep, 13:43
David Podunavac | Re: Ontology compile bug | Fri, 08 Sep, 06:31
David Podunavac (JIRA) | [jira] Updated: (NUTCH-358) Language Switching | Fri, 01 Sep, 07:49
David Podunavac (JIRA) | [jira] Updated: (NUTCH-358) Language Switching PROBLEM FIXED | Fri, 01 Sep, 07:51
Doug Cook (JIRA) | [jira] Created: (NUTCH-363) Fetcher normalizes everything at least twice | Fri, 08 Sep, 18:47
Doug Cook (JIRA) | [jira] Created: (NUTCH-364) Javascript parser creates some fairly bogus URLs | Sat, 09 Sep, 00:23
Doug Cook (JIRA) | [jira] Commented: (NUTCH-365) Flexible URL normalization | Sat, 09 Sep, 15:44
Doug Cook (JIRA) | [jira] Commented: (NUTCH-365) Flexible URL normalization | Sat, 09 Sep, 16:00
Doug Cook (JIRA) | [jira] Commented: (NUTCH-365) Flexible URL normalization | Mon, 18 Sep, 10:03
Doug Cook (JIRA) | [jira] Commented: (NUTCH-364) Javascript parser creates some fairly bogus URLs | Tue, 19 Sep, 17:37
Doug Cutting (JIRA) | [jira] Commented: (NUTCH-368) Message queueing system | Mon, 18 Sep, 17:32
Enrico Triolo | Searching on fields with uppercase letters | Tue, 26 Sep, 14:10
Enrico Triolo | Re: Searching on fields with uppercase letters | Tue, 26 Sep, 14:33
Federico Dal Maso | Re: [jira] Created: (NUTCH-366) Merge URLFilters and URLNormalizers | Tue, 12 Sep, 18:08
Howie Wang | RE: ask a problem about nutch (from China) | Fri, 15 Sep, 20:40
Jane Zhen | Time of Reading Local Files | Mon, 18 Sep, 12:55
Jp Mutch | Which tutorial to use for getting Nutch 9.12 up and running on a single machine? | Mon, 18 Sep, 17:48
Jp Mutch | Ant tasks/build.xml file for running Nutch in debug mode? | Fri, 22 Sep, 05:05
Jukka Zitting | Content-type detection for Tika | Wed, 06 Sep, 09:36
Kim, Greg | RE: 0.8.1 | Wed, 06 Sep, 17:50
Kim, Greg | CrawlDatum.modifiedTime ? | Tue, 19 Sep, 21:22
King Kong (JIRA) | [jira] Commented: (NUTCH-353) pages that serverside forwards will be refetched every time | Fri, 15 Sep, 04:18
King Kong (JIRA) | [jira] Created: (NUTCH-369) StringUtil.resolveEncodingAlias is unuseful. | Mon, 18 Sep, 10:24
King Kong (JIRA) | [jira] Created: (NUTCH-374) when http.content.limit be set to -1 and Response.CONTENT_ENCODING is gzip or x-gzip , it can not fetch any thing. | Wed, 27 Sep, 17:32
King Kong (JIRA) | [jira] Commented: (NUTCH-374) when http.content.limit be set to -1 and Response.CONTENT_ENCODING is gzip or x-gzip , it can not fetch any thing. | Sat, 30 Sep, 04:53
Marcel Petrisor | Modifications necessary to upgrade to Hadoop 0.6.2 | Mon, 25 Sep, 15:49
Meghna Kukreja (JIRA) | [jira] Commented: (NUTCH-374) when http.content.limit be set to -1 and Response.CONTENT_ENCODING is gzip or x-gzip , it can not fetch any thing. | Fri, 29 Sep, 14:31
Michael Wechner | Re: Ontology compile bug | Fri, 08 Sep, 07:26
Michael Wechner | Re: File system watching for intranets | Wed, 13 Sep, 07:51
Neal Richter | Re: Should URL normalization iterate? | Mon, 04 Sep, 18:51
Otis Gospodnetic (JIRA) | [jira] Commented: (NUTCH-359) extraction of links will fail for whole page if one single link cannot be parsed | Fri, 08 Sep, 04:56
Piotr Kosiorowski | Re: Patch Available status? | Fri, 01 Sep, 07:19
Piotr Kosiorowski | Re: svn commit: r451649 - /lucene/nutch/trunk/CHANGES.txt | Sat, 30 Sep, 20:03
Piotr Kosiorowski (JIRA) | [jira] Assigned: (NUTCH-374) when http.content.limit be set to -1 and Response.CONTENT_ENCODING is gzip or x-gzip , it can not fetch any thing. | Sat, 30 Sep, 19:31
Piotr Kosiorowski (JIRA) | [jira] Resolved: (NUTCH-374) when http.content.limit be set to -1 and Response.CONTENT_ENCODING is gzip or x-gzip , it can not fetch any thing. | Sat, 30 Sep, 19:39
Richard Braman | problem with hadoop | Tue, 05 Sep, 23:11
Richard Braman | RE: problem with hadoop | Wed, 06 Sep, 00:37
Richard Braman | RE: problem with hadoop | Wed, 06 Sep, 01:42
Richard Braman | Re: Which tutorial to use for getting Nutch 9.12 up and running on a single machine? | Wed, 20 Sep, 03:34
Richard Braman (JIRA) | [jira] Commented: (NUTCH-266) hadoop bug when doing updatedb | Wed, 06 Sep, 01:13
Richard Braman (JIRA) | [jira] Commented: (NUTCH-266) hadoop bug when doing updatedb | Wed, 06 Sep, 22:25
Sami Siren | Re: indexing problem | Wed, 06 Sep, 15:45
Sami Siren | 0.8.1 | Wed, 06 Sep, 15:49
Sami Siren | Re: 0.8.1 | Wed, 06 Sep, 15:59
Sami Siren | Re: [jira] Commented: (NUTCH-368) Message queueing system | Tue, 19 Sep, 18:13
Sami Siren | Re: 0.8.1 | Sun, 24 Sep, 07:27
Sami Siren | Re: svn commit: r451649 - /lucene/nutch/trunk/CHANGES.txt | Sat, 30 Sep, 19:46
Sami Siren | Re: svn commit: r451649 - /lucene/nutch/trunk/CHANGES.txt | Sat, 30 Sep, 20:18
Sami Siren | Re: svn commit: r451649 - /lucene/nutch/trunk/CHANGES.txt | Sat, 30 Sep, 21:30
Sami Siren (JIRA) | [jira] Created: (NUTCH-360) Switch nutch to use java 5 source format | Fri, 01 Sep, 15:19
Sami Siren (JIRA) | [jira] Resolved: (NUTCH-360) Switch nutch to use java 5 source format | Sat, 02 Sep, 05:15
Sami Siren (JIRA) | [jira] Commented: (NUTCH-361) generator create fetchlist randomly | Sun, 03 Sep, 04:39