nutch-dev mailing list archives

From "Vu Hoang (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (NUTCH-780) Nutch crawler did not read configuration files
Date Wed, 27 Jan 2010 02:56:34 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803193#action_12803193 ]

Vu Hoang edited comment on NUTCH-780 at 1/27/10 2:55 AM:
---------------------------------------------------------

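Add this method, which copies every entry of the supplied configuration on top of the default crawl configuration: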
{code:java|title=org/apache/nutch/crawl/Crawl.java|borderStyle=solid}
// Requires: import java.util.Map.Entry;
// Starts from the default crawl configuration, then copies every entry of
// nutchConfig over it, so caller-supplied values win on key clashes.
public static Configuration overwrite(Configuration nutchConfig)
{
	  Configuration crawlConfig = NutchConfiguration.createCrawlConfiguration();
	  // Configuration is Iterable<Entry<String, String>>, so no Iterator or cast is needed.
	  for (Entry<String, String> entry : nutchConfig)
	  {
		  crawlConfig.set(entry.getKey(), entry.getValue());
	  }

	  return crawlConfig;
}
{code}
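
The overlay behaviour of overwrite can be tried without Hadoop or Nutch on the classpath. The following is a minimal sketch, assuming plain java.util.Map objects as stand-ins for Configuration; the class name ConfigOverlaySketch and the sample values are hypothetical and not part of Nutch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the overwrite() idea: plain maps replace
// Hadoop's Configuration so the sketch runs without Nutch.
public class ConfigOverlaySketch {

    // Returns a new map holding all defaults, with every entry of
    // overrides copied on top (override values win on key clashes).
    public static Map<String, String> overwrite(Map<String, String> defaults,
                                                Map<String, String> overrides) {
        Map<String, String> merged = new LinkedHashMap<>(defaults);
        for (Map.Entry<String, String> entry : overrides.entrySet()) {
            merged.put(entry.getKey(), entry.getValue());
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> defaults = new LinkedHashMap<>();
        defaults.put("http.agent.name", "default-agent");  // sample value
        defaults.put("fetcher.threads.fetch", "10");       // sample value

        Map<String, String> custom = new LinkedHashMap<>();
        custom.put("http.agent.name", "my-crawler");       // caller override

        // The overridden key takes the caller's value; the other default survives.
        System.out.println(overwrite(defaults, custom));
    }
}
```

Keys that the caller does not set keep their default values, which is the difference between overlaying and simply replacing the configuration object.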

Add the lines below to class org.apache.nutch.crawl.Crawl:
{code:java|title=org/apache/nutch/crawl/Crawl.java|borderStyle=solid}
public static Configuration nutchConfig = null;
public static void setNutchConfig(Configuration config) { nutchConfig = config; }
{code}

Then, inside the main method, build the Nutch configuration as below:
{code:java|title=org/apache/nutch/crawl/Crawl.java|borderStyle=solid}
Configuration conf = null;
if (nutchConfig != null) conf = overwrite(nutchConfig);
else conf = NutchConfiguration.createCrawlConfiguration();
{code}
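
The static field, its setter, and the fallback in main work together as a simple injection hook. Here is a minimal sketch of that flow, again assuming java.util.Map as a stand-in for Configuration; CrawlHookSketch, resolveConfig, and the sample default are hypothetical names and values used only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the patched Crawl class, using maps instead of
// Hadoop's Configuration so that it runs standalone.
public class CrawlHookSketch {

    // Mirrors the static field added to Crawl: null means nothing was injected.
    private static Map<String, String> nutchConfig = null;

    public static void setNutchConfig(Map<String, String> config) {
        nutchConfig = config;
    }

    // Stand-in for NutchConfiguration.createCrawlConfiguration(); sample default only.
    static Map<String, String> createCrawlConfiguration() {
        Map<String, String> defaults = new HashMap<>();
        defaults.put("fetcher.threads.fetch", "10");
        return defaults;
    }

    // The selection logic placed at the top of main(): start from the
    // defaults and overlay the injected configuration when one is present.
    public static Map<String, String> resolveConfig() {
        Map<String, String> conf = createCrawlConfiguration();
        if (nutchConfig != null) {
            conf.putAll(nutchConfig);
        }
        return conf;
    }

    public static void main(String[] args) {
        System.out.println(resolveConfig());   // defaults only

        Map<String, String> custom = new HashMap<>();
        custom.put("fetcher.threads.fetch", "50");
        setNutchConfig(custom);
        System.out.println(resolveConfig());   // injected value overrides the default
    }
}
```

With the patch applied to Nutch itself, the equivalent call sequence would be Crawl.setNutchConfig(conf) followed by Crawl.main(args). Because the field is static, the injected configuration is shared by every later call in the same JVM until it is set again.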

I recommend this solution :)

      was (Author: vushogerts):
    add method
{code:java|title=org/apache/nutch/crawl/Crawl.java|borderStyle=solid}
public static Configuration overwrite(Configuration nutchConfig)
  {
	  Configuration crawlConfig = NutchConfiguration.createCrawlConfiguration();
	  Iterator<Entry<String, String>> entries = nutchConfig.iterator();
	  while (entries.hasNext())
	  {
		  Entry<String, String> entry = (Entry<String, String>) entries.next();
		  crawlConfig.set(entry.getKey(), entry.getValue());
	  }
	  
	  return crawlConfig;
  }
{code}

add lines below into class org.apache.nutch.crawl.Crawl
{code:java|title=org/apache/nutch/crawl/Crawl.java|borderStyle=solid}
public static Configuration nutchConfig = null;
public static void setNutchConfig(Configuration config) { nutchConfig = config; }
{code}

and re-configure nutch configuration inside of method main as below
{code:java|title=org/apache/nutch/crawl/Crawl.java|borderStyle=solid}
Configuration conf = null;
if (nutchConfig != null) conf = nutchConfig;
else conf = NutchConfiguration.createCrawlConfiguration();
{code}

I recommend that solution :)
  
> Nutch crawler did not read configuration files
> ----------------------------------------------
>
>                 Key: NUTCH-780
>                 URL: https://issues.apache.org/jira/browse/NUTCH-780
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 1.0.0
>            Reporter: Vu Hoang
>
> The Nutch searcher can read properties passed in at the constructor ...
> {code:java|title=NutchSearcher.java|borderStyle=solid}
> NutchBean bean = new NutchBean(getFilesystem().getConf(), fs);
> ... // put search engine code here
> {code}
> ... but the Nutch crawler cannot; it only reads its settings from the command-line arguments.
> {code:java|title=NutchCrawler.java|borderStyle=solid}
> StringBuilder builder = new StringBuilder();
> builder.append(domainlist + SPACE);
> builder.append(ARGUMENT_CRAWL_DIR);
> builder.append(domainlist + SUBFIX_CRAWLED + SPACE);
> builder.append(ARGUMENT_CRAWL_THREADS);
> builder.append(threads + SPACE);
> builder.append(ARGUMENT_CRAWL_DEPTH);
> builder.append(depth + SPACE);
> builder.append(ARGUMENT_CRAWL_TOPN);
> builder.append(topN + SPACE);
> Crawl.main(builder.toString().split(SPACE));
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

