lucene-dev mailing list archives

From "Stanislaw Osinski (JIRA)"
Subject [jira] Updated: (SOLR-1804) Upgrade Carrot2 to 3.2.0
Date Wed, 21 Jul 2010 13:41:49 GMT


Stanislaw Osinski updated SOLR-1804:

    Attachment: SOLR-1804-carrot2-3.4.0-dev.patch


As we're near the 3.4.0 release of Carrot2, I'm including a patch that upgrades the clustering
plugin. The most notable changes are:

* [3.4.0] Carrot2 core no longer depends on Lucene APIs, so the {{build.xml}} can be enabled
again. The only class that makes use of Lucene API, {{LuceneLanguageModelFactory}}, is now
included in the plugin's code, so there shouldn't be any problems with refactoring. In fact,
I've already updated {{LuceneLanguageModelFactory}} to remove the use of deprecated APIs.
* [3.3.0] The STC algorithm has seen some significant scalability improvements.
* [3.2.0] Carrot2 core no longer depends on LGPL libraries, so all the JARs can now be included
in Solr SVN and SOLR-2007 won't need fixing.

Included is a patch against r966211. A ZIP with JARs will follow in a sec.

A couple of notes:

* The upgrade requires replacing Google Collections with Guava. Guava is a drop-in replacement
and all tests pass for me after the switch. The upgrade is also recommended on the original
Google Collections site.
* The patch includes Carrot2 3.4.0-dev JAR, but I guess it's worth committing already to avoid
the library downloads hassle (SOLR-2007).
* Out of the box, Carrot2 supports clustering of Chinese content based on the Smart Chinese
Tokenizer. This tokenizer would have to be referenced from the {{LuceneLanguageModelFactory}}
class in Solr. However, when compiling the code with Ant, smartcn doesn't seem to be available
on the classpath. Is it a matter of modifying the build files, or is it a policy on dependencies between
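
If a hard compile-time dependency on smartcn turns out to be undesirable, one possible workaround
is to look the analyzer up reflectively at runtime and fall back when it is missing. The sketch
below is only an illustration and is not part of the attached patch: the class
{{OptionalClassCheck}}, its method names, and the smartcn class name used here are assumptions.

```java
public final class OptionalClassCheck {
    // Assumed fully qualified name of a smartcn class; adjust to the Lucene version in use.
    static final String SMARTCN_CLASS =
        "org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer";

    private OptionalClassCheck() {}

    /** Returns true if the named class can be located on the classpath. */
    static boolean isAvailable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, OptionalClassCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical fallback: use smartcn when present, otherwise a default tokenizer.
        String tokenizer = isAvailable(SMARTCN_CLASS) ? "smartcn" : "default";
        System.out.println("Chinese tokenizer: " + tokenizer);
    }
}
```

With a check like this, the factory would compile without smartcn on the Ant classpath and would
only use the Chinese tokenizer when the JAR is present at runtime.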

Let me know if you have any problems applying the patch.



> Upgrade Carrot2 to 3.2.0
> ------------------------
>                 Key: SOLR-1804
>                 URL:
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - Clustering
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>         Attachments: SOLR-1804-carrot2-3.4.0-dev.patch
> Carrot2 is now LGPL free, which means we should be able to bundle the binary!

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.