lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Lucene's default settings & back compatibility
Date Tue, 19 May 2009 17:51:51 GMT
On Tue, May 19, 2009 at 8:56 AM, Grant Ingersoll <gsingers@apache.org> wrote:

>> Why not?  The settings object could have say a property
>> "analysis.standard.enableStopFilter"?
>
> And what if it is something that has to be called in the next() chain and
> not during construction?  Are you going to want to call that every single
> time over millions upon millions of tokens in a large collection?   Even if
> it is during construction, you still might end up calling it a lot of times.

In fact, we already do that today (look at StandardTokenizer.java).

This isn't a differentiator in the current discussion ("using a Settings
class to hold defaults").  I.e., regardless of whether we use Settings
(what's being proposed) or add awkward setters/getters all over our
classes (what's done today), calling them inside inner loops is still
no good.
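
A minimal Java sketch of the point above, assuming a hypothetical `AnalysisSettings` holder and a hypothetical `StopFilterSketch` (neither is Lucene API; only the property name comes from this thread). The flag is read once in the constructor and cached in a final field, so the per-token path does no settings lookup at all.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical settings holder; NOT part of Lucene's API.
class AnalysisSettings {
    private final Map<String, String> props = new HashMap<>();
    int lookups = 0; // counts how often the underlying map is consulted

    AnalysisSettings set(String key, String value) {
        props.put(key, value);
        return this;
    }

    boolean getBoolean(String key, boolean defaultValue) {
        lookups++;
        String v = props.get(key);
        return v == null ? defaultValue : Boolean.parseBoolean(v);
    }
}

// Hypothetical filter: reads its flag once, at construction time.
class StopFilterSketch {
    private final boolean enableStopFilter; // cached; never re-read per token
    private final Set<String> stopWords;

    StopFilterSketch(AnalysisSettings settings, Set<String> stopWords) {
        this.enableStopFilter =
            settings.getBoolean("analysis.standard.enableStopFilter", true);
        this.stopWords = stopWords;
    }

    // Called once per token: a field read plus a set lookup, no settings lookup.
    boolean accept(String token) {
        return !(enableStopFilter && stopWords.contains(token));
    }
}
```

The per-token cost would be identical if the flag came from a setter instead, which is why inner-loop cost doesn't favor either approach.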

I think you've moved on to discussing something different: should we
relax our back compat policy?  I'm all for that discussion, but it's
different from "given our back compat policy, how can we implement it
w/o harming new users of Lucene".

> There's a difference between std. coding practices and purposefully putting
> in lots of if checks to solve back compatibility issues that are created in
> order to satisfy some naming convention.  Given the length of time between
> releases, we could easily call every new release a major version and we
> wouldn't be all that different from most commercial projects.  I'd bet if we
> switched from calling things major.minor and just called them Lucene '09 and
> Lucene '10 people would be just fine with the changes.
>
> I've said it before and I'll say it again.  Given the time between Lucene
> releases (at least 6 mos. for minor releases and 1+ year for majors) we have
> _PLENTY_ of time to let users know what is coming and plan accordingly.   By
> being so dogmatic about back compatibility, I believe we are making it
> harder to innovate and harder for new people to contribute and we keep cruft
> around for way too long.  (How the heck is a new contributor supposed to
> keep track of all the things that went into Lucene for the past 1.5 years?)
>  I'm not saying we should throw back compat. out the window, I'm just saying
> we should take it more on a case by case basis, with the default, obviously,
> being to favor back compatibility.  The large majority of users  (I'd
> venture to say well north of 95% of them) will be able to deal with minor
> API changes every 6 to 8 months, especially if we are more proactive about
> communicating them to java-user@ and in CHANGES.  In fact, if we announced
> changes that are going to break for not the next version, but the one after,
> it would give people lots of time to adapt.

You've moved on to "should we relax our back-compat policy".  Yes, we
can consider doing so... but I'd like to stay focused here on "should
we switch to the Settings* approach to implement our back compat
policy".

By using Settings that explicitly capture the defaults for each
version, we can have our cake and eat it too: we are no longer forced
to stunt Lucene's growth for the minority that need strong
back-compat.  It also makes us freer to select our back-compat policy
since it's no longer a tradeoff of hurting new users.
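
A minimal sketch of per-version defaults, assuming a hypothetical `Settings` base class that holds the preferred (new-user) defaults and a `Settings24` subclass that overrides only what must stay 2.4-compatible. `trackMaxScore` echoes the sort-by-field example in this thread; the class, accessor, and `FieldSortSearcher` consumer are illustrative, not Lucene API, and the 2.4 value is assumed here to mean "always compute scores".

```java
// Hypothetical: preferred defaults for new users; NOT Lucene API.
class Settings {
    // Preferred default: skip score computation when sorting by field.
    protected boolean trackMaxScore = false;

    boolean trackMaxScore() {
        return trackMaxScore;
    }
}

// Hypothetical: pins the behavior that 2.4 users depend on.
class Settings24 extends Settings {
    Settings24() {
        trackMaxScore = true; // assumed 2.4 behavior: scores computed even when sorting
    }
}

// Hypothetical consumer: behaves according to whichever Settings it is given.
class FieldSortSearcher {
    private final boolean computeScores;

    FieldSortSearcher(Settings settings) {
        this.computeScores = settings.trackMaxScore();
    }

    boolean computesScores() {
        return computeScores;
    }
}
```

A new user writes `new FieldSortSearcher(new Settings())` and gets the faster path; an application upgrading from 2.4 passes `new Settings24()` and keeps its old behavior, without the preferred default being held back for everyone.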

> I think you missed the point.  The problem lies in releasing 2.4's settings
> and those settings are wrong.  Using your example, say Settings24 was messed
> up and set trackMaxScore to true when it should have been false (mistakes
> happen).  It gets released in 2.9 as the settings for 2.4 back
> compatibility.  We then realize our mistake.  How do you fix it?  You can't
> just set it to false, b/c now you have users who are depending, potentially,
> on the _wrong_ version.  So, now you have to deprecate it and come out with
> a "new" Settings2.4 called something else.

Well... that's a rather major mistake: if you add new feature X
("scoring is optional when sorting by field") and then in the
back-compat settings you get it backwards ("turn off scoring by
default"), that's quite an error.

I would hope/expect it's quite rare.  If such a big-time mistake
happens I think that warrants a fast point-release turnaround fixing
it.

Also, this isn't a differentiator, i.e. we could make such a mistake
today by incorrectly defaulting one of our back-compat setters (and I
think in that case we would also turn around a fast point release to
fix it).

>>> I still think we would benefit from just communicating upcoming changes
>>> better even in minor releases, thereby allowing for a bit more variance
>>> in back compat.  It should be the exception, not the rule.
>>
>> I like DM's point, that this Settings class would be a great vehicle
>> for exactly that communication.  Rather than poring over a
>> CHANGES.txt, you can see setting-by-setting what changed, and why.
>
> Sorry, I'd rather read CHANGES.  It is the one place we all make sure to
> enter our changes.  People aren't as good about javadocs, especially
> accessors where the name is "self explanatory".  Plus it has a link to a
> JIRA issue.

Let me restate: I think we'd do both -- CHANGES is still the
definitive place to see the exhaustive list of all changes, but
Settings* is the place to see changes where maintaining strict
back-compat costs you an important new feature.  E.g., because you are
using Settings24 you'd see that you're not taking advantage of the
performance gain of not computing scores when sorting by field.
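
To illustrate the "see setting-by-setting what changed" idea, here is a hedged sketch that compares a map of preferred defaults against the values a back-compat settings object pins, and reports each deviation with its reason, falling back to a pointer at CHANGES.txt. The map-based representation and the `DeviationReport` name are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

// Hypothetical helper; NOT Lucene API.
class DeviationReport {
    // One line per setting whose chosen value differs from the preferred default.
    // Settings absent from 'chosen' are treated as using the preferred default.
    static List<String> deviations(Map<String, Object> preferred,
                                   Map<String, Object> chosen,
                                   Map<String, String> reasons) {
        List<String> out = new ArrayList<>();
        // TreeMap gives a stable, alphabetical report order.
        for (Map.Entry<String, Object> e : new TreeMap<>(preferred).entrySet()) {
            String key = e.getKey();
            Object mine = chosen.getOrDefault(key, e.getValue());
            if (!Objects.equals(mine, e.getValue())) {
                out.add(key + "=" + mine + " (preferred " + e.getValue() + "): "
                        + reasons.getOrDefault(key, "see CHANGES.txt"));
            }
        }
        return out;
    }
}
```

Such a report complements CHANGES rather than replacing it: it lists only the settings where staying back-compatible costs something.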

> Also, how useful is it going to be to have 30 or 40 (hundreds?) accessors on
> a single Settings object?

I think the Settings24 would have far fewer?  I.e. it'd have only the
settings forced to deviate from the preferred default.

> So, then, the logical thing to do is to split it up and have some
> nested way of doing things.  And then people will be tired of having
> to programmatically set all the values, so they will create a
> config/properties file that does it.  But, because we don't like
> dependencies, we will re-invent how that works.  After it's all said
> and done, you end up having re-invented IOC.

I agree there is a real risk of over-designing this.

Maybe... we only migrate things into the Settings* when they diverge
across versions?  That should keep the settings quite minimal.  Such
settings are typically deprecated anyway.  And rename it
"BackCompatSettings", or something, to make it clear.
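
A sketch of this minimal variant, assuming a hypothetical `BackCompatSettings` that stores only the settings whose defaults diverged, keyed by name, and falls through to the caller's preferred default for everything else. Marking it `@Deprecated` reflects that its entries exist only for back compat, so they can be dropped once the version they serve is no longer supported. The entry shown is illustrative.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical: holds ONLY settings whose defaults diverged across versions.
 * Deprecated from the start, since every entry exists purely for back compat.
 */
@Deprecated
final class BackCompatSettings {
    private final String version;
    private final Map<String, Object> overrides;

    private BackCompatSettings(String version, Map<String, Object> overrides) {
        this.version = version;
        this.overrides = Collections.unmodifiableMap(new HashMap<>(overrides));
    }

    // The 2.4 instance lists only what changed after 2.4 (illustrative value).
    static BackCompatSettings forVersion24() {
        Map<String, Object> m = new HashMap<>();
        m.put("trackMaxScore", Boolean.TRUE);
        return new BackCompatSettings("2.4", m);
    }

    // Anything not overridden falls through to the caller's preferred default.
    boolean getBoolean(String key, boolean preferredDefault) {
        Object v = overrides.get(key);
        return v instanceof Boolean ? (Boolean) v : preferredDefault;
    }

    int size() {
        return overrides.size();
    }

    String version() {
        return version;
    }
}
```

Because the class only grows when a default actually changes, it stays small, which addresses the concern about dozens of accessors on one object.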

> Another interesting thing to think about is how do we sunset old settings
> objects.  When we are on 4.X, should we still keep around 2.4 settings?  Not
> really something we necessarily need to solve right now.

That's also a policy (not implementation) question; our current policy
is to remove 2.* on releasing 3.0.  I think we'd want to stick with
that policy, i.e. many of these "back compat only settings" are
deprecated (e.g. autoCommit) so come 3.0 we can remove them.

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

