lucene-general mailing list archives

From Grant Ingersoll <>
Subject [REPORT] Lucene December 2009 Board Report
Date Wed, 09 Dec 2009 13:49:03 GMT
=== Lucene Status Report: December, 2009 ===


-The PMC added George Aroush and Chris Mattmann to the PMC
-The PMC added Open Relevance committer Robert Muir
-The PMC added Mahout committer Jake Mannix
-The PMC added Tika committer Ken Krugler


Lucene Java is a search-engine toolkit.  Development has been
active and we released both 2.9 and 3.0 this quarter.


Solr is a full text search server using Lucene Java.  
Development and the community are active.  Solr released
version 1.4 this quarter.


Nutch is a web-search engine: crawler, indexer and search runtime. There has
been a recent flurry of discussion about Nutch's future post-ApacheCon,
spearheaded by Andrzej Bialecki and others. In addition, there is ongoing
work on reducing code duplication (tighter integration of the Tika parsing
framework and mime type detection, better Solr integration) and using a
more flexible storage system (e.g. HBase). Many issues are being fixed in
preparation for a 1.1 release early next quarter.


Lucy is a loose C port of Lucene targeted at dynamic language bindings.
Development this quarter has focused on abstraction of the IO subsystem and
portability to various compiler platforms.


Lucene.NET is a .NET based port of Lucene Java.  Development and the
community are active.  Lucene.NET graduated from the incubator and is 
now a full-fledged Lucene sub-project.


Apache Mahout is working towards building a suite of scalable machine
learning libraries for text and data mining.  Development is active and
version 0.2 was released this quarter.

Open Relevance Project

The Open Relevance Project is a new project aimed at providing Lucene
and others with tools for judging the quality of search and machine
learning approaches.  The project added Robert Muir as a committer
this quarter and development is getting under way. Recent work 
has added support for the Indonesian "Tempo" and Persian
"Hamshahri" collections to execute relevance judgements with


PyLucene is a Python integration of Lucene Java. Development is
active. Closely tracking the Lucene Java releases, we released PyLucene
2.9.0, PyLucene 2.9.1 and PyLucene 3.0.0 this quarter. A major addition was
made to JCC, the code generator that makes PyLucene possible: support
for Java generics, now in use by Lucene Java 3.0.


Apache Tika is a toolkit for detecting and extracting metadata and
structured text content from various documents using existing parser
libraries.  Tika released version 0.5 this quarter. There have been
recent development efforts to speed up Tika's mime detector, as well as
efforts to provide a self-contained OSGi-based Tika bundle. There is a 
strong desire to release these post 0.5 improvements, so we are planning
to release Tika 0.6 in the next few weeks.
