lucene-dev mailing list archives

From Marvin Humphrey <mar...@rectangular.com>
Subject Re: bytecount as String and prefix length
Date Mon, 31 Oct 2005 21:28:14 GMT
I wrote...

> Unfortunately, once the changes to TermBuffer, TermInfosWriter, and  
> StringHelper are applied, execution speed at index-time suffers a  
> slowdown of about 20%.  Perhaps this can be blamed on all the calls  
> to getBytes("UTF-8") in TermInfosWriter?  Maybe alternative  
> implementations using ByteBuffer, CharsetDecoder, and  
> CharsetEncoder are possible that can mitigate the problem?

Nope.

The version of writeTerm below is about the same speed as the one  
with the calls to getBytes("UTF-8").

I think I'll take a crack at a custom charsToUTF8 converter algo.
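
For illustration only, here is a minimal sketch of what a hand-rolled converter of that kind might look like. It is not the converter that came out of this thread. The class name Utf8Sketch and the method signature are hypothetical, it is written against a current JDK, and writing '?' for an unpaired surrogate is an assumption made for the sketch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: class and method names are illustrative only.
class Utf8Sketch {

  /**
   * Encodes s as UTF-8 into a fresh byte array.
   * Assumption for this sketch: an unpaired surrogate is written as '?'.
   */
  static byte[] charsToUTF8(String s) {
    // Worst case is 3 bytes per char; a surrogate pair is 2 chars -> 4 bytes.
    byte[] out = new byte[s.length() * 3];
    int upto = 0;
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c < 0x80) {
        // 1 byte: 0xxxxxxx
        out[upto++] = (byte) c;
      } else if (c < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out[upto++] = (byte) (0xC0 | (c >> 6));
        out[upto++] = (byte) (0x80 | (c & 0x3F));
      } else if (Character.isHighSurrogate(c) && i + 1 < s.length()
                 && Character.isLowSurrogate(s.charAt(i + 1))) {
        // 4 bytes: combine the pair into one supplementary code point
        int cp = Character.toCodePoint(c, s.charAt(++i));
        out[upto++] = (byte) (0xF0 | (cp >> 18));
        out[upto++] = (byte) (0x80 | ((cp >> 12) & 0x3F));
        out[upto++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
        out[upto++] = (byte) (0x80 | (cp & 0x3F));
      } else if (Character.isSurrogate(c)) {
        // unpaired surrogate: not encodable, substitute '?'
        out[upto++] = (byte) '?';
      } else {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out[upto++] = (byte) (0xE0 | (c >> 12));
        out[upto++] = (byte) (0x80 | ((c >> 6) & 0x3F));
        out[upto++] = (byte) (0x80 | (c & 0x3F));
      }
    }
    return Arrays.copyOf(out, upto);
  }

  public static void main(String[] args) {
    String text = "na\u00EFve \u20AC \uD834\uDD1E";
    byte[] expected = text.getBytes(StandardCharsets.UTF_8);
    // Compare the sketch against the JDK encoder for a well-formed string.
    System.out.println(Arrays.equals(expected, charsToUTF8(text)));
  }
}
```

A real version for TermInfosWriter would presumably encode into a reusable byte[] rather than allocating one per term, since the per-call allocation is part of what makes getBytes("UTF-8") expensive.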

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

//-------------------------------------------------------------------------

   private final void writeTerm(Term term)
        throws IOException {
     byteBuf.clear();
     // encode the term text as UTF-8; on overflow, grow byteBuf and retry
     while (true) {
       CoderResult status = utf8Encoder.encode(CharBuffer.wrap(term.text()),
         byteBuf, false);
       if (status.isOverflow()) {
         bufSize += 32;
         byteBuf = ByteBuffer.allocate(bufSize);
         utf8Encoder.reset();
       }
       else {
         break;
       }
     }
     int totalLength = byteBuf.position();
     int start = StringHelper.bytesDifference(lastByteBuf, byteBuf);
     int length = totalLength - start;

     output.writeVInt(start);                   // write shared prefix length
     output.writeVInt(length);                  // write delta length

     byte[] bytes = byteBuf.array();
     for (int i = start ; i < totalLength; i++) {
       output.writeByte(bytes[i]);              // write delta UTF-8 bytes
     }
     output.writeVInt(fieldInfos.fieldNumber(term.field)); // write field num

     lastTerm = term;
     // swap byteBuf and lastByteBuf
     scratchByteBuf = lastByteBuf;
     lastByteBuf = byteBuf;
     byteBuf = scratchByteBuf;
   }
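
The writeTerm above relies on StringHelper.bytesDifference, whose body is not shown in this message. As a rough sketch of the shared-prefix computation it presumably performs, here is a stand-alone version. It is hypothetical: it takes byte arrays with explicit lengths rather than the ByteBuffers used above, and the class name PrefixDeltaSketch is made up for the example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: not the StringHelper method from the patch.
class PrefixDeltaSketch {

  /**
   * Returns the number of leading bytes shared by the first aLen bytes
   * of a and the first bLen bytes of b.
   */
  static int bytesDifference(byte[] a, int aLen, byte[] b, int bLen) {
    int limit = Math.min(aLen, bLen);
    for (int i = 0; i < limit; i++) {
      if (a[i] != b[i]) {
        return i;
      }
    }
    return limit;
  }

  public static void main(String[] args) {
    byte[] last = "apple".getBytes(StandardCharsets.UTF_8);
    byte[] cur = "apply".getBytes(StandardCharsets.UTF_8);
    int start = bytesDifference(last, last.length, cur, cur.length);
    int length = cur.length - start;
    // prints "shared prefix = 4, delta length = 1"
    System.out.println("shared prefix = " + start + ", delta length = " + length);
  }
}
```

The explicit lengths matter because a backing array obtained from byteBuf.array() can be larger than the number of bytes actually encoded, so comparing whole arrays could match stale bytes left over from an earlier term.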


