lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Reitzel, Charles" <Charles.Reit...@tiaa-cref.org>
Subject RE: [DISCUSS] Change Query API to make queries immutable in 6.0
Date Thu, 02 Apr 2015 22:10:47 GMT
Unfortunately, since boost is used in hashCode() and equals() calculations, changing the boost
will still make the queries "trappy".   You will do all that work to make everything-but-boost
immutable and still not fix the problem.

You can prove it to yourself like so (this test fails!):

  public void testMapOrphan() {
    Map<Query, Integer> map = new HashMap<>();
    BooleanQuery booleanAB = new BooleanQuery();
    booleanAB.add(new TermQuery(new Term("contents", "a")), BooleanClause.Occur.SHOULD);
    booleanAB.add(new TermQuery(new Term("contents", "b")), BooleanClause.Occur.SHOULD);
    map.put( booleanAB, 1 );
    
    booleanAB.setBoost( 33.3f );	// Set boost after map.put()
    assertTrue( map.containsKey(booleanAB) );
  }

Seems like the quick way to the coast is to write a failing test - before making changes.

I realize this is easier said than done.  Based on your testing that led you to start this
discussion, can you narrow it down to a single Query class and/or IndexSearcher use case?
  Not there will be only one case.  But, at least, it will be a starting point.   Once the
first failing test has been written, it should be relatively easy to write test variations
to cover the remaining "mutuable" Query classes.

With the scale of the changes you are proposing, "test first" seems like a reasonable approach.

Another compromise approach might be to sub-class the mutable Query classes like so:

class ImmutableBooleanQuery extends BooleanQuery {
   public void add(BooleanClause clause) {
      throw new UnsupportedOperationException( "ImmutableBooleanQuery.add(BooleanClause)"
);
   }
   public void setBoost( int boost ) {
      throw new UnsupportedOperationException( "ImmutableBooleanQuery.add(BooleanClause)"
);
   }
   // etc.

  public static ImmutableBooleanQuery cloneFrom(BooleanQuery original) {
       // Use field level access to by-pass mutator methods.
  }

   // Do NOT override rewrite(IndexReader)!
}

In theory, such a proxy class could be generated at runtime to force immutability:
https://github.com/verhas/immutator

Which could make a lot of sense in JUnit tests, if not production runtime.

An immutable Query would be cloned from the original and place on the cache instead.  Any
attempt to modify the cache entry should fail quickly.   

To me, a less invasive approach seems like a faster and easier way to actually find and fix
this bug.
Once that is done, then it might make sense to perform the exhaustive updates to prevent a
"relapse" in the future.

-----Original Message-----
From: Robert Muir [mailto:rcmuir@gmail.com] 
Sent: Thursday, April 02, 2015 9:46 AM
To: dev@lucene.apache.org
Subject: Re: [DISCUSS] Change Query API to make queries immutable in 6.0

Boosts might not make sense to become immutable, it might make the code too complex. Who is
to say until the other stuff is fixed first.
The downsides might outweight the upsides.

So yeah, if you want to say "if anyone disagrees with what the future might look like i'm
gonna -1 your progress", then i will bite right now.

Fixing the rest of Query to be immutable, so filter caching isn't trappy, we should really
do that. And we have been doing it already. I remember Uwe suggested this approach when adding
automaton and related queries a long time ago. It made things simpler and avoided bugs, we
ultimately made as much of it immutable as we could.

Queries have to be well-behaved, they need a good hashcode/equals, thread safety, good error
checking, etc. It is easier to do this when things are immutable. Someone today can make a
patch for FooQuery that nukes setBar and moves it to a ctor parameter named 'bar' and chances
are a lot of the time, it probably fixes bugs in FooQuery somehow.
Thats just what it is.

Boosts are the 'long tail'. they are simple primitive floating point values, so susceptible
to less problems. The base class incorporates boosts into equals/hashcode already, which prevents
the most common bugs with them. They are trickier because internal things like
rewrite() might "shuffle them around" in conjunction with clone(), to do optimizations. They
are also only relevant when scores are needed:
so we can prevent nasty filter caching bugs as a step, by making everything else immutable.


On Thu, Apr 2, 2015 at 9:27 AM, david.w.smiley@gmail.com <david.w.smiley@gmail.com>
wrote:
> On Thu, Apr 2, 2015 at 3:40 AM, Adrien Grand <jpountz@gmail.com> wrote:
>>
>> first make queries immutable up to the boost and then discuss 
>> if/how/when we should go fully immutable with a new API to change 
>> boosts?
>
>
> The “if” part concerns me; I don’t mind it being a separate issue to 
> make the changes more manageable (progress not perfection, and all 
> that).  I’m all for the whole shebang.  But if others think “no” 
> then…. will it have been worthwhile to do this big change and not go all the way? 
I think not.
> Does anyone feel the answer is “no” to make boosts immutable? And if so why?
>
> If nobody comes up with a dissenting opinion to make boosts immutable 
> within a couple days then count me as “+1” to your plans, else “-1” 
> pending that discussion.
>
> ~ David Smiley
> Freelance Apache Lucene/Solr Search Consultant/Developer 
> http://www.linkedin.com/in/davidwsmiley

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org For additional commands, e-mail:
dev-help@lucene.apache.org


*************************************************************************
This e-mail may contain confidential or privileged information.
If you are not the intended recipient, please notify the sender immediately and then delete
it.

TIAA-CREF
*************************************************************************
Mime
View raw message