lucene-dev mailing list archives

From Simon Willnauer <>
Subject Re: [ANNOUNCE] Apache Lucene 4.0 released.
Date Fri, 12 Oct 2012 08:34:03 GMT

On Fri, Oct 12, 2012 at 10:34 AM, Uwe Schindler <> wrote:
> Thanks Robert for doing the hard work of managing this release!
> I am happy that the release finally came out, after a long time of development, code
> refactoring, and lots of non-finite beer-automatons!
> Uwe
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> eMail:
>> -----Original Message-----
>> From: Robert Muir []
>> Sent: Friday, October 12, 2012 10:10 AM
>> To:; Lucene mailing list; java-user; announce
>> Subject: [ANNOUNCE] Apache Lucene 4.0 released.
>> October 12 2012, Apache Lucene™ 4.0 available.
>> The Lucene PMC is pleased to announce the release of Apache Lucene 4.0
>> Apache Lucene is a high-performance, full-featured text search engine library
>> written entirely in Java. It is a technology suitable for nearly any application
>> that requires full-text search, especially cross-platform.
>> This release contains numerous bug fixes, optimizations, and improvements,
>> some of which are highlighted below.  The release is available for immediate
>> download at:
>> See the CHANGES.txt file included with the release for a full list of details.
>> Lucene 4.0 Release Highlights:
>>  * The index formats for terms, postings lists, stored fields, term vectors, etc.
>> are pluggable via the Codec API. You can select from the provided
>> implementations or customize the index format with your own Codec to meet
>> your needs.
>>  * Similarity has been decoupled from the vector space model (TF/IDF).
>> Additional models such as BM25, Divergence from Randomness, Language
>> Models, and Information-based models are provided (see
>> 4).
>>  * The new doc values feature stores typed values per-document.  It can be
>> used for custom scoring factors (accessible via Similarity), for pre-sorted Sort
>> values, and more.
>>  * IndexWriter now flushes segments to disk concurrently when the application
>> uses multiple threads for indexing, resulting in substantial performance
>> improvements (see
>> speedup-with-lucenes.html).
>>  * Per-document normalization factors ("norms") are no longer limited to a
>> single byte. Similarity implementations can use any DocValues type to store
>> norms.
>>  * New index statistics have been added, including the number of tokens for a
>> term or field, number of postings for a field, and number of documents with a
>> posting for a field.  These support additional scoring models (see
>> 40.html).
>>  * A new default term dictionary/index (BlockTree) indexes shared prefixes
>> instead of every n'th term. This is not only more time- and
>> space-efficient, but can avoid going to disk at all for terms that do not exist
>> in certain cases. Alternative term dictionary implementations are provided and
>> pluggable via the Codec API.
>>  * Indexed terms are no longer limited to UTF-16 char sequences; they can now
>> be any binary value encoded as byte arrays. By default, text terms are encoded
>> as UTF-8 bytes. Sort order of terms is defined by their binary value, which is
>> identical to UTF-8 (Unicode code point) sort order.
>>  * Substantially faster performance when using a Filter during searching.
>>  * File-system based directories can rate-limit the IO (MB/sec) of merge
>> threads, to reduce IO contention between merging and searching threads.
>>  * A number of alternative Codecs and components have been added:
>> "Appending" works with append-only filesystems (such as Hadoop DFS),
>> "Memory" writes the entire terms+postings as an FST read into RAM (see
>> with.html),
>> "Pulsing" inlines the postings for low-frequency terms into the term dictionary
>> (see
>> primary-key.html),
>> "SimpleText" writes all files in plain-text for easy debugging/transparency (see
>> "Bloom" uses a bloom filter to sometimes avoid disk seeks when looking up
>> terms, "Direct" holds all postings as simple byte[] and int[] for very fast
>> performance at the cost of very high RAM consumption, "Block" uses a new
>> index layout and compression scheme for improved performance, among
>> others.
>>  * Term offsets can be optionally encoded into the postings lists and retrieved
>> per-position.
>>  * A new AutomatonQuery returns all documents containing any term matching
>> a provided finite-state automaton (see
>> state-queries-in-lucene).
>>  * FuzzyQuery is 100-200 times faster than in past releases (see
>> faster.html).
>>  * A new spell checker, DirectSpellChecker, finds possible corrections directly
>> against the main search index without requiring a separate index.
>>  * Various in-memory data structures such as the term dictionary and
>> FieldCache are represented more efficiently with less object overhead (see
>> searching.html).
>>  * All search logic is now required to work per segment; IndexReader was
>> therefore refactored to differentiate between atomic and composite readers
>> (see
>>  * Lucene 4.0 provides a modular API, consolidating components such as
>> Analyzers and Queries that were previously scattered across Lucene core,
>> contrib, and Solr. These modules also include additional functionality such as
>> UIMA analyzer integration and a completely reworked spatial search
>> implementation.
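
A note on the term sort-order highlight above (terms are sorted by their binary
value, which matches UTF-8 / Unicode code point order): this can differ from the
UTF-16 code unit order that Java's String.compareTo uses once supplementary
characters are involved. The sketch below is plain Java and does not call any
Lucene API; the class name TermOrderDemo and both helper methods are made up
for illustration only.

```java
import java.nio.charset.StandardCharsets;

// Illustration only: compares two strings by UTF-16 code units and by
// unsigned UTF-8 bytes to show where the two orders disagree.
class TermOrderDemo {

    // Order by unsigned UTF-8 byte values (equivalent to code point order).
    public static int compareUtf8(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            // Mask to 0..255 so bytes >= 0x80 are not treated as negative.
            int diff = (x[i] & 0xFF) - (y[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        // A term that is a strict prefix of another sorts first.
        return x.length - y.length;
    }

    // Order by UTF-16 code units, which is what String.compareTo does.
    public static int compareUtf16(String a, String b) {
        return a.compareTo(b);
    }

    public static void main(String[] args) {
        String bmp = "\uFF5E";                 // U+FF5E, a single UTF-16 code unit
        String supplementary = "\uD834\uDD1E"; // U+1D11E, a surrogate pair

        // UTF-16: 0xFF5E > 0xD834, so the BMP string sorts after. Prints 1.
        System.out.println(Integer.signum(compareUtf16(bmp, supplementary)));
        // UTF-8: first bytes 0xEF < 0xF0, so the BMP string sorts first. Prints -1.
        System.out.println(Integer.signum(compareUtf8(bmp, supplementary)));
    }
}
```

Code that assumes terms enumerate in String.compareTo order may therefore see
a different order for terms containing characters outside the Basic
Multilingual Plane.
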
>> Noteworthy changes since 4.0-BETA:
>>  * A new "Block" PostingsFormat offering improved search performance and
>> index compression. This will likely become the default format in a future
>> release. (see
>> blockpostingsformat-thanks.html).
>>  * All non-default codec implementations were moved to a separate codecs
>> module. Just add lucene-codecs-4.0.0.jar to your classpath to test these out.
>>  * Payloads can be optionally stored on the term vectors.
>>  * Many bugfixes and optimizations.
>> Please read CHANGES.txt and MIGRATE.txt for a full list of new features and
>> notes on upgrading. In particular, the new APIs are not compatible with previous
>> versions of Lucene; however, file format backwards compatibility is provided
>> for indexes from the 3.0 series and the 4.0-alpha and -beta releases.
>> Please report any feedback to the mailing lists
>> (
>> Note: The Apache Software Foundation uses an extensive mirroring network for
>> distributing releases.  It is possible that the mirror you are using may not have
>> replicated the release yet.  If that is the case, please try another mirror.  This
>> also goes for Maven access.
>> Happy searching,
>> Apache Lucene/Solr Developers
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
