lucene-java-user mailing list archives

From Olivier Binda <>
Subject Re: BytesRef violates the principle of least astonishment
Date Wed, 20 May 2015 10:34:09 GMT
On 05/20/2015 08:53 AM, Trejkaz wrote:
> On Wed, May 20, 2015 at 3:21 PM, Olivier Binda <> wrote:
>> My take:
>> Indeed, BytesRef is mutable.
>> This happens for performance reasons: to avoid unnecessary object
>> creation and unnecessary copying, and also to work around
>> the Java "issue" that, most of the time, you need to pass an array with an
>> offset and length to methods for performance, but you don't want to create
>> a new array every time you have to do that.
>> In your case, you are supposed to copy your bytes because the
>> BytesRef will change every time you call a Lucene method on it
>> (it is mutable), and the array it points to will change too, because these
>> might be internal arrays of readers/buffers/codecs
>> (and you don't know the internal workings of those)...
> That's fair enough, most of this is a philosophical issue anyway. Some
> people prefer reusing objects and overwriting data because they don't
> trust GC or whatever. I prefer immutable objects because at least then
> when you have an object you can guarantee nobody else can mess with
> it.
> But that aside, it's still astonishing when the method to clone an
> object doesn't actually clone it. There isn't any other obvious method
> on BytesRef to perform a copy, either. What are we supposed to do,
> pull out the byte array, offset and length manually and then jam it
> into another BytesRef? Ew.
If you want immutable data, you have to create a new byte array and copy
the bytes into it.
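To make the difference concrete, here is a minimal plain-Java sketch. RefLike and CopyDemo are hypothetical names, not Lucene classes: RefLike only mimics the (bytes, offset, length) shape of a BytesRef. A shallow copy shares the backing array, so it silently changes when the producer reuses its buffer, while copying the referenced slice with Arrays.copyOfRange gives an independent snapshot. In Lucene itself, the static BytesRef.deepCopyOf(BytesRef) performs this kind of copy, whereas clone() is documented as shallow; check the Javadoc for your version.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for a BytesRef-style view: (bytes, offset, length).
// This is NOT Lucene's BytesRef; it only mimics its shape.
final class RefLike {
    byte[] bytes;
    int offset;
    int length;

    RefLike(byte[] bytes, int offset, int length) {
        this.bytes = bytes;
        this.offset = offset;
        this.length = length;
    }

    // Shallow copy: a new object, but the same backing array is shared.
    RefLike shallowCopy() {
        return new RefLike(bytes, offset, length);
    }

    // Deep copy: only the referenced slice is copied into a fresh array,
    // so later writes to the original buffer cannot affect it.
    RefLike deepCopy() {
        byte[] copy = Arrays.copyOfRange(bytes, offset, offset + length);
        return new RefLike(copy, 0, length);
    }

    String utf8() {
        return new String(bytes, offset, length, StandardCharsets.UTF_8);
    }
}

final class CopyDemo {
    // Simulates a reader that reuses one internal buffer for every term.
    static String[] run() {
        byte[] buffer = "apple".getBytes(StandardCharsets.UTF_8);
        RefLike current = new RefLike(buffer, 0, 5);

        RefLike shallow = current.shallowCopy();
        RefLike deep = current.deepCopy();

        // The "reader" overwrites its buffer in place for the next term.
        byte[] next = "grape".getBytes(StandardCharsets.UTF_8);
        System.arraycopy(next, 0, buffer, 0, next.length);

        return new String[] { shallow.utf8(), deep.utf8() };
    }

    public static void main(String[] args) {
        String[] r = run();
        System.out.println("shallow: " + r[0]); // shallow: grape
        System.out.println("deep:    " + r[1]); // deep:    apple
    }
}
```

The deep copy costs one allocation per kept value, which is exactly the price you pay only when you actually need to hold on to the bytes.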

Lucene gives the user of the library the choice of how to use the data
(which is good), instead of creating immutable data for everybody and
making people who don't need it pay the penalty.

There are other places in Lucene that are designed with performance in
mind and that may not behave as one would superficially expect.

For example, I recently realized that the BinaryDocValues I was
sharing through a HashMap were not thread-safe (it's written in the docs,
but sometimes, after a while, you use something whose details you have
forgotten), and I had to take measures, using the clone method they
provide, to use them as they were designed to be used.
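One common way to avoid sharing a stateful reader across threads is to give every thread its own copy, for example through ThreadLocal. The sketch below is plain Java with hypothetical names (ValuesCursor and PerThreadDemo are not Lucene classes); ValuesCursor.copy() plays the role of the clone method mentioned above. Whether a particular Lucene class supports cloning, and what its thread-safety contract is, depends on the version, so check its Javadoc.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stateful reader standing in for a per-document value source.
// It keeps a mutable scratch buffer, so one instance must not be shared
// across threads. This is NOT a Lucene class.
final class ValuesCursor {
    private final String[] values;                             // immutable, safe to share
    private final StringBuilder scratch = new StringBuilder(); // mutable, per instance

    ValuesCursor(String[] values) {
        this.values = values;
    }

    // Each copy shares the immutable data but gets its own scratch state.
    ValuesCursor copy() {
        return new ValuesCursor(values);
    }

    // Returns a view that is overwritten by the next call (a reused buffer).
    CharSequence get(int docId) {
        scratch.setLength(0);
        scratch.append(values[docId]);
        return scratch;
    }
}

final class PerThreadDemo {
    static List<String> run() {
        ValuesCursor prototype = new ValuesCursor(new String[] {"a", "b", "c", "d"});
        // One private copy per thread instead of one shared instance.
        ThreadLocal<ValuesCursor> local = ThreadLocal.withInitial(prototype::copy);

        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int doc = 0; doc < 4; doc++) {
                final int d = doc;
                // toString() snapshots the reused buffer before it can change.
                futures.add(pool.submit(() -> local.get().get(d).toString()));
            }
            List<String> out = new ArrayList<>();
            for (Future<String> f : futures) {
                out.add(f.get());
            }
            return out;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Each thread pays for one copy up front and then reuses it, which keeps the "reuse for performance" design while removing the shared mutable state.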

Lucene is a library you have to understand if you don't want to shoot
yourself in the foot with it. For example, persisting docIds is a very
bad idea 99.99% of the time (there are

Paying attention to the docs helps a lot.

> TX
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
