nutch-dev mailing list archives

From "yoursoft@freemail.hu" <yours...@freemail.hu>
Subject Re: IndexOptimizer bug?
Date Thu, 04 Aug 2005 14:06:15 GMT
Dear Michael,

I wrote a tool, OptimizeIndex.java. It is faster, and there is no 
question about what it does.
After you optimize the index with IndexOptimizer, is the number of 
search results for 'http' still the same?

Regards,
    Ferenc

Michael Nebel wrote:

> Hi,
>
> I fixed the problem with the following patch:
>
> --- IndexOptimizer.java 2005-08-04 12:55:54.000000000 +0200
> +++ IndexOptimizer.java.~1.6.~  2005-01-21 00:48:50.000000000 +0100
> @@ -138,7 +138,7 @@
>
>          if (score > minScore) {
>            sdq.put(new ScoreDoc(doc, score));
> -          if (sdq.size() >= count) {               // if sdq overfull
> +          if (sdq.size() > count) {               // if sdq overfull
>              sdq.pop();                            // remove lowest in sdq
>              minScore = ((ScoreDoc)sdq.top()).score; // reset minScore
>            }
>
> My index shrank from 8.5 GB to 0.5 GB. I found no documentation 
> about the background of this tool. Can anyone tell me what the idea 
> behind it is?
>
> Regards
>
>     Michael
>
>
>
> Andy Liu wrote:
>
>> I believe this tool is unfinished and unsupported.
>>
>> On 7/22/05, yoursoft@freemail.hu <yoursoft@freemail.hu> wrote:
>>
>>> I found an IndexOptimizer in Nutch.
>>> When I run it, it throws an exception:
>>> ....
>>> Optimizing url:http from 226957 to 22696
>>> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 22697
>>>        at org.apache.lucene.util.PriorityQueue.put(PriorityQueue.java:46)
>>>        at org.apache.nutch.indexer.IndexOptimizer$OptimizingTermPositions.seek(IndexOptimizer.java:153)
>>>        at org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:325)
>>>        at org.apache.lucene.index.SegmentMerger.mergeTermInfo(SegmentMerger.java:296)
>>>        at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:270)
>>>        at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:234)
>>>        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:96)
>>>        at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:578)
>>>        at org.apache.nutch.indexer.IndexOptimizer.optimize(IndexOptimizer.java:215)
>>>        at org.apache.nutch.indexer.IndexOptimizer.main(IndexOptimizer.java:235)
>>>
>
>
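
The loop in the patch keeps only the highest-scoring documents in a bounded queue. As a rough illustration of that idea, here is a minimal sketch that uses java.util.PriorityQueue instead of Lucene's PriorityQueue. The class TopScores and the method keepTop are hypothetical names, not part of Nutch or Lucene, and the sketch checks the queue size before adding an element, which is a different approach from the patch and is only assumed to avoid the kind of overflow seen in the stack trace.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

// Hypothetical helper for illustration only; not Nutch or Lucene code.
class TopScores {

    // Returns the `count` highest scores, ordered from lowest to highest.
    static float[] keepTop(float[] scores, int count) {
        if (count <= 0) {
            return new float[0];
        }
        // java.util.PriorityQueue is a min-heap: peek() is the lowest kept score.
        PriorityQueue<Float> queue = new PriorityQueue<>(count);
        for (float score : scores) {
            if (queue.size() < count) {
                queue.add(score);            // still room: keep the score
            } else if (score > queue.peek()) {
                queue.poll();                // full: drop the lowest first
                queue.add(score);            // then keep the better score
            }
        }
        // Drain the heap; poll() returns the scores in ascending order.
        float[] top = new float[queue.size()];
        for (int i = 0; i < top.length; i++) {
            top[i] = queue.poll();
        }
        return top;
    }

    public static void main(String[] args) {
        float[] top = keepTop(new float[] {0.3f, 0.9f, 0.1f, 0.7f, 0.5f}, 3);
        System.out.println(Arrays.toString(top));
    }
}
```

Because the size check happens before the insert, the queue never holds more than count elements at any point, so this pattern would also work with a fixed-capacity queue.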

