lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Lucene's default settings & back compatibility
Date Mon, 18 May 2009 21:06:39 GMT
As we all know, Lucene's back-compat policy necessarily hurts the
out-of-the-box experience for new users: because we are only allowed
to make substantial improvements to Lucene's default settings at a
major release, new users won't see the improvements to our settings
until a major release (typically years apart).

Lucene has a number of such default settings; some recent examples:

  * Read-only IndexReader gives much better performance with threads,
    yet we must now default IndexReader.open to return a non-readOnly
    reader

  * We can now optionally turn off scoring when sorting by field
    (sizable speed gain), but we had to leave it on by default until
    3.0

  * Letting IndexReader.norms return null

  * LogMergePolicy now takes deletions into account, but we had to
    disable it by default, since it could conceivably break back
    compat.

  * Bug fixes in StandardAnalyzer must be delayed until 3.0 since
    there's a remote chance they'd break back compat in an app, or we
    end up adding confusing methods like "public static void
    setDefaultReplaceInvalidAcronym".

  * NIOFSDirectory ought to be "the default" on UNIX, but it's not

  * Constant score rewrite ought to be the default for most multi-term
    queries

  * StopFilter should enable position increments by default

The fact that we are "forced" to delay such "out of the box"
improvements to Lucene for so long is a frustrating cost, since it can
only stunt Lucene's adoption and growth.  My sense is that only a
minority of Lucene's users need such strict back-compat (this has been
discussed before).  It also clutters our APIs, because we end up
creating setters/getters that often exist only to preserve a bug for
the sake of back compat.

I think we can fix this.  Ie, maintain our strong back-compat policy,
yet still allow new users to experience the best of Lucene on every
release (not just on major releases), by creating an explicit class
that holds settings/defaults used by Lucene.

For example, say we create a base class named Settings.  It holds the
defaults for settings across all of Lucene's classes. When you create
IndexReader, IndexWriter and others, you must pass in a Settings
instance.

A subclass, SettingsMatching24, binds all settings to "match" 2.4's
behavior.  When we make improvements in 2.9, we'd add the back-compat
settings to SettingsMatching24.  So if your app wants to keep exactly
2.4's behavior, you'd pass in SettingsMatching24().  On upgrading to
2.9 you'd still see 2.4's behavior.

Users who'd like to see Lucene's improvements on each minor release
would instead instantiate LatestAndGreatestSettings() (or
CurrentVersionSettings(), or something), understanding that when they
upgrade there might be biggish changes to Lucene's defaults.  My guess
is most users would use this settings class.
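
To make the idea concrete, here is a rough sketch of how it might look.
Everything in it is hypothetical: the method names on Settings and the
ReaderStub class are made up for illustration and are not existing
Lucene APIs.  The values in SettingsMatching24 follow the examples
listed earlier (non-readOnly reader, scoring left on when sorting by
field, StopFilter position increments off).

```java
// Hypothetical sketch only: none of these classes exist in Lucene.

/** Base class: one accessor per default that Lucene classes consult. */
abstract class Settings {
    /** Should IndexReader.open return a read-only reader? */
    abstract boolean readOnlyReaderByDefault();

    /** Should scores be computed when sorting by field? */
    abstract boolean scoreWhenSortingByField();

    /** Should StopFilter enable position increments? */
    abstract boolean stopFilterPositionIncrements();
}

/** Pins every setting to 2.4's behavior, even after upgrading. */
class SettingsMatching24 extends Settings {
    boolean readOnlyReaderByDefault() { return false; }
    boolean scoreWhenSortingByField() { return true; }
    boolean stopFilterPositionIncrements() { return false; }
}

/** Tracks the current release; values may change on any upgrade. */
class LatestAndGreatestSettings extends Settings {
    boolean readOnlyReaderByDefault() { return true; }
    boolean scoreWhenSortingByField() { return false; }
    boolean stopFilterPositionIncrements() { return true; }
}

/** Made-up stand-in for a class (eg IndexReader) that requires Settings. */
class ReaderStub {
    private final boolean readOnly;

    ReaderStub(Settings settings) {
        if (settings == null) {
            throw new IllegalArgumentException("a Settings instance is required");
        }
        // The default comes from the Settings instance, not from a
        // constant hard-wired into this class.
        this.readOnly = settings.readOnlyReaderByDefault();
    }

    boolean isReadOnly() { return readOnly; }
}

class SettingsDemo {
    public static void main(String[] args) {
        ReaderStub pinned = new ReaderStub(new SettingsMatching24());
        ReaderStub latest = new ReaderStub(new LatestAndGreatestSettings());
        System.out.println("2.4 settings, read-only reader: " + pinned.isReadOnly());
        System.out.println("latest settings, read-only reader: " + latest.isReadOnly());
    }
}
```

With something like this, improving a default in a minor release only
means changing LatestAndGreatestSettings; apps that pass
SettingsMatching24 keep seeing the old behavior.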

Doug actually suggested this exact idea a while back:

  http://www.gossamer-threads.com/lists/lucene/java-dev/54421#54421

Now that I realize we could use this to strongly decouple "users
wanting precise back-compat" from "users wanting the latest &
greatest", I think it's a very compelling solution.

If we do this I'd like to do it in 2.9, so that starting with 3.x we
are free to change default settings w/o breaking back compat.

Thoughts?

Mike


