lucene-dev mailing list archives

From Marvin Humphrey
Subject Re: UInt32 or Int32
Date Thu, 22 Sep 2005 21:16:27 GMT

On Sep 22, 2005, at 1:16 PM, Yonik Seeley wrote:

> I'd lean toward keeping UInt32 in general, so at least that will scale
> to 4B documents. SegSize is the only place where UInt32 is used that it
> will matter (all of the other uses will never approach that size).

OK, sounds good.

> writeInt() writes both signed and unsigned integers (or rather the bit
> pattern could be interpreted as either, and it's up to the definition
> to decide which it is).

Good point.  On the Perl side, I'm specifying how they are  
interpreted within the IO method, rather than by a cast outside the  
method.  Effectively I have writeSignedInt and writeUnsignedInt.
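
A minimal sketch of that split, in Python rather than Perl. The function names are hypothetical stand-ins, not taken from Lucene or from any existing port. Both writers emit four big-endian bytes; they differ only in which range of values they accept:

```python
import struct

def write_signed_int(value: int) -> bytes:
    """Pack a value in [-2**31, 2**31 - 1] as four big-endian bytes."""
    return struct.pack(">i", value)

def write_unsigned_int(value: int) -> bytes:
    """Pack a value in [0, 2**32 - 1] as four big-endian bytes."""
    return struct.pack(">I", value)

# The same bit pattern ends up on disk; only the declared
# interpretation (signed vs. unsigned) differs.
signed_bytes = write_signed_int(-1)
unsigned_bytes = write_unsigned_int(0xFFFFFFFF)
```

Passing 0xFFFFFFFF to the signed writer, or -1 to the unsigned one, raises struct.error, which is the kind of check that deciding the interpretation inside the IO method makes possible.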

> You're right about FORMAT... something should be changed to make it
> consistent.
> It could be defined as 0xffffffff instead of -1.

That would be a little strange, because the test for whether an index
is in the new format is whether FORMAT is less than 0. The present
implementation isn't buggy or problematic; it's just that the logical
inconsistency in the specs doc is confusing for people like me who are
trying to write compliant code. If FORMAT gets redefined as 0xffffffff,
that suggests to me that the algorithm for identifying the new format
should change to something like if (FORMAT > 0x7FFFFFFF). I don't think
either of us wants to change any code outside the specs doc.
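
To illustrate why the two tests agree on the bytes while disagreeing in the doc, here is a small Python sketch. The helper names and the sample byte strings are assumptions made for illustration only; they are not Lucene code:

```python
import struct

def is_new_format_signed(raw: bytes) -> bool:
    """Read four big-endian bytes as a signed int and test for < 0."""
    (fmt,) = struct.unpack(">i", raw)
    return fmt < 0

def is_new_format_unsigned(raw: bytes) -> bool:
    """Read the same bytes as unsigned and test the high bit instead."""
    (fmt,) = struct.unpack(">I", raw)
    return fmt > 0x7FFFFFFF

new_style = b"\xff\xff\xff\xff"  # -1 if signed, 0xffffffff if unsigned
old_style = b"\x00\x00\x00\x05"  # an arbitrary non-negative value
```

Both functions return the same answer for any four bytes, since each is just a test of the high bit. The difference is purely in how the spec would have to describe the value.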

I believe that FORMAT in segments and FORMAT in .tis/.tii are the  
only places in the Lucene file format where negative numbers are  
required.  Would it be overkill to define an Int32 primitive datatype  
just for those?


     32-bit signed integers are written as four bytes,
     high-order bytes first, in two's-complement encoding.

     Int32 --> <Byte>4
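
As a sanity check on that wording, a short Python sketch of an encoder and decoder that follow the proposed definition. The names encode_int32 and decode_int32 are hypothetical:

```python
def encode_int32(value: int) -> bytes:
    """Four bytes, high-order byte first, two's-complement.

    Raises OverflowError if value is outside [-2**31, 2**31 - 1].
    """
    return value.to_bytes(4, byteorder="big", signed=True)

def decode_int32(raw: bytes) -> int:
    """Inverse of encode_int32; expects exactly four bytes."""
    if len(raw) != 4:
        raise ValueError("Int32 requires exactly 4 bytes")
    return int.from_bytes(raw, byteorder="big", signed=True)

format_bytes = encode_int32(-1)          # a FORMAT of -1
round_trip = decode_int32(format_bytes)  # back to -1
```

Under this definition a FORMAT of -1 is written as four 0xFF bytes and reads back as -1, so the existing "less than 0" test stays valid without touching any code.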

Marvin Humphrey
Rectangular Research
