nutch-dev mailing list archives

From Aisha <aichaso...@yahoo.com>
Subject Re: Fetcher freezes
Date Mon, 06 Nov 2006 14:42:40 GMT

Hi,

I'm not sure I understood "no regular expression filter" correctly, but I
removed the urlfilter from my nutch-site.xml.

This is my nutch-site.xml configuration:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>

<property>
  <name>plugin.includes</name>
  <value>protocol-file|parse-(text|msword|msexcel|mspowerpoint|rtf|xml|html|js|pdf|oo)|index-basic|query-basic|summary-basic|scoring-opic</value>
</property>

<property>
  <name>file.content.ignored</name>
  <value>false</value>
</property>

<property>
  <name>file.content.limit</name>
  <value>-1</value>
</property>

<property>
  <name>db.ignore.external.links</name>
  <value>true</value>
</property>

<property>
  <name>fetcher.threads.fetch</name>
  <value>1000</value>
</property>

<property>
  <name>fetcher.threads.per.host</name>
  <value>1000</value>
  <description>This number is the maximum number of threads that
    should be allowed to access a host at one time.</description>
</property>

<property>
  <name>fetcher.verbose</name>
  <value>true</value>
  <description>If true, fetcher will log more verbosely.</description>
</property>
<property>
  <name>fetcher.server.delay</name>
  <value>5.0</value>
  <description>The number of seconds the fetcher will delay between 
   successive requests to the same server.</description>
</property>

<property>
 <name>fetcher.max.crawl.delay</name>
 <value>30</value>
</property> 

<property>
  <name>indexer.max.tokens</name>
  <value>Integer.MAX_VALUE</value>
</property>


<property>
  <name>db.max.outlinks.per.page</name>
  <value>10000</value>
</property>
<property>
  <name>db.max.anchor.length</name>
  <value>200</value>
  <description>The maximum number of characters permitted in an anchor.
  </description>
</property>
</configuration>
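
A note on the indexer.max.tokens value above: values in nutch-site.xml are
read as plain strings, not as Java expressions, so the literal
Integer.MAX_VALUE is probably not parsed as a number and the setting may not
take effect. A minimal sketch, assuming the intent was the largest possible
token limit, spells out the numeric value of Java's Integer.MAX_VALUE instead:

<property>
  <name>indexer.max.tokens</name>
  <!-- 2147483647 is the numeric value of Java's Integer.MAX_VALUE -->
  <value>2147483647</value>
</property>

This property concerns indexing, so it is unlikely to be related to the
fetcher freeze itself.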


The fetcher freezes after 2 hours.
As I said, the logs give no useful information, because each time I run it the
freeze occurs on a different directory or file.
Do I have to change something in my configuration?

Thanks in advance,
Aïcha


Stefan Groschupf-2 wrote:
> 
> Hi,
> 
> Try running with no regular expression filter and check whether this helps.
> Let me know if this solves the problem.
> You may want to do a thread dump and send the log to the list to
> check where exactly the fetcher freezes.
> 
> Stefan
> 
> Am 03.11.2006 um 15:53 schrieb Aisha:
> 
>>
>> Hi,
>>
>> I don't know why, but I have had no answer on the 3 forums where I posted my
>> problem.
>> Since the Fetcher freeze occurs every time I try to fetch my file system, I
>> can't imagine that I am the only one who has this problem. As I said in my
>> last e-mail, I found many mails about this problem, but no solution seems to
>> have been found.
>> It is a big problem, so I don't understand why nobody seems interested in it.
>>
>> I am trying to crawl my file system, but the crawl never finishes; it aborts
>> with the message "Aborting with 3 hung threads".
>>
>> The number of hung threads is not the same if I retry.
>>
>> I changed the configuration, increasing the number of threads, but it doesn't
>> solve the problem.
>>
>> Please, could somebody help me?
>> I can't crawl my file system.
>>
>> thanks in advance.
>> Aïcha
>>
>> -- 
>> View this message in context: http://www.nabble.com/Fetcher-freezes-tf2568287.html#a7158776
>> Sent from the Nutch - Dev mailing list archive at Nabble.com.
>>
>>
> 
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 101tec Inc.
> search tech for web 2.1
> Menlo Park, California
> http://www.101tec.com
> 
> 
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/Fetcher-freezes-tf2568287.html#a7199731
Sent from the Nutch - Dev mailing list archive at Nabble.com.

