lucene-java-user mailing list archives

From Ravikumar Govindarajan <ravikumar.govindara...@gmail.com>
Subject SortingAtomicReader alternate to Tim-Sort...
Date Tue, 14 Apr 2015 12:07:34 GMT
We were experimenting with SortingMergePolicy and came across an
alternative to TimSort-ing each postings list, using a FixedBitSet (FBS)
and a GrowableWriter.

I have attached the relevant code snippet. It would be nice if someone
could clarify whether this is a good idea to implement...

public class SortingAtomicReader {
…
…
class SortingDocsEnum {

//The last 2 parameters, newdoclist & olddocToFreq, are added to the
//constructor. It is assumed that both are initialized at merge start
//and then re-used until the merge completes...


public SortingDocsEnum(int maxDoc, final DocsEnum in, boolean withFreqs,
final Sorter.DocMap docMap, FixedBitSet newdoclist, GrowableWriter
olddocToFreq) throws IOException {

….

…

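while (true) {
  //Instead of Tim-Sorting as in existing code
  doc = in.nextDoc();
  if (doc == NO_MORE_DOCS) {
    //stop once the wrapped enum is exhausted
    break;
  }
  //mark the remapped doc; iterating set bits later yields sorted order
  int newdoc = docMap.oldToNew(doc);
  newdoclist.set(newdoc);
  if (withFreqs) {
    //freqs are stored by OLD doc id and looked up via newToOld in freq()
    olddocToFreq.set(doc, in.freq());
  }
}
//NOTE: upto must end up equal to the number of docs read above (not shown)
}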


@Override
public int nextDoc() throws IOException {
  if (++docIt >= upto) {
    return NO_MORE_DOCS;
  }
  currDoc = newdoclist.nextSetBit(currDoc + 1);
  if (currDoc == -1) {
    return NO_MORE_DOCS;
  }
  //clear the set-bit here before returning...
  newdoclist.clear(currDoc);
  return currDoc;
}


@Override
public int freq() throws IOException {
  if (withFreqs && docIt < upto) {
    return (int) olddocToFreq.getMutable()
                 .get(docMap.newToOld(currDoc));
  }
  return 1;
}

}
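
The idea in the snippet above can be tried out without Lucene. What follows
is only a rough sketch, not the SortingAtomicReader code: java.util.BitSet
stands in for FixedBitSet, a plain long[] stands in for GrowableWriter, two
int arrays stand in for Sorter.DocMap, and the names BitSetPostingsSketch,
RemappedPostings and drain are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

final class BitSetPostingsSketch {

  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Hypothetical stand-in for SortingDocsEnum: one term's postings in new-id order. */
  static final class RemappedPostings {
    private final BitSet newDocs;       // shared across terms; assumed empty on entry
    private final long[] oldDocToFreq;  // shared across terms; indexed by OLD doc id
    private final int[] newToOld;
    private int currDoc = -1;

    RemappedPostings(int[] oldDocs, int[] freqs, int[] oldToNew, int[] newToOld,
                     BitSet newDocs, long[] oldDocToFreq) {
      this.newDocs = newDocs;
      this.oldDocToFreq = oldDocToFreq;
      this.newToOld = newToOld;
      for (int i = 0; i < oldDocs.length; i++) {
        // Setting the remapped bit replaces the explicit sort step.
        newDocs.set(oldToNew[oldDocs[i]]);
        oldDocToFreq[oldDocs[i]] = freqs[i];
      }
    }

    int nextDoc() {
      if (currDoc == NO_MORE_DOCS) {
        return NO_MORE_DOCS;
      }
      int next = newDocs.nextSetBit(currDoc + 1);
      if (next == -1) {
        currDoc = NO_MORE_DOCS;
        return currDoc;
      }
      // Clear on read so the shared bit set is empty again for the next term.
      newDocs.clear(next);
      currDoc = next;
      return currDoc;
    }

    long freq() {
      // Valid only while positioned on a doc returned by nextDoc().
      return oldDocToFreq[newToOld[currDoc]];
    }
  }

  /** Consumes the postings and returns {newDoc, freq} pairs in iteration order. */
  static long[][] drain(RemappedPostings p) {
    List<long[]> out = new ArrayList<>();
    for (int doc = p.nextDoc(); doc != NO_MORE_DOCS; doc = p.nextDoc()) {
      out.add(new long[] {doc, p.freq()});
    }
    return out.toArray(new long[0][]);
  }

  public static void main(String[] args) {
    int[] oldToNew = {3, 0, 2, 1};  // old doc 0 becomes new doc 3, and so on
    int[] newToOld = {1, 3, 2, 0};  // inverse of oldToNew
    BitSet shared = new BitSet(4);
    long[] sharedFreqs = new long[4];

    RemappedPostings termA = new RemappedPostings(
        new int[] {0, 1, 3}, new int[] {5, 2, 7}, oldToNew, newToOld, shared, sharedFreqs);
    System.out.println(Arrays.deepToString(drain(termA)));

    // The same bit set and freq array are reused for the next term.
    RemappedPostings termB = new RemappedPostings(
        new int[] {2, 3}, new int[] {1, 4}, oldToNew, newToOld, shared, sharedFreqs);
    System.out.println(Arrays.deepToString(drain(termB)));
  }
}
```

In this sketch, every bit is cleared as it is returned, so the shared bit set
is empty again once a term has been fully consumed and can be handed to the
next term without a separate reset. Stale entries in the frequency array are
harmless here because only slots written for the current term are ever read.
This does rely on each enum being drained completely; a consumer that stops
early would leave bits behind for the next term.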
