lucene-dev mailing list archives

From "Robert Muir (JIRA)"
Subject [jira] [Commented] (LUCENE-3151) Make all of Analysis completely independent from Lucene Core
Date Sun, 29 May 2011 01:35:47 GMT


Robert Muir commented on LUCENE-3151:

It looks like it makes sense that we would have to pull out these classes to do this now... but
here are a few thoughts for discussion. None of this should block this issue (these are hard
refactorings and a lot of work); they are just ideas for the future.

As far as analyzers:
* does the lucene-core/common jar need to have all the tokenAttributes? Maybe it should only
have the ones that the indexer etc actually consume, and things like TypeAttribute, FlagsAttribute,
KeywordAttribute, Token, etc should simply be moved to the analysis module?
* does the lucene-core/common jar need to have Tokenizer/TokenFilter/CharFilter/CharReader/etc.?
It seems like it really only needs TokenStream, and those could also be moved to the analysis
module.
* currently I think it's bad that the analyzers depend upon so much of lucene's util package
(some of it internal)... long term we want to get rid of the cumbersome backwards compatibility
mechanisms like Version and ideally have a very minimal interface between core and analysis, so
that you could safely just use your old analyzers jar file, etc... maybe we should see how
hard it is to remove some of these util dependencies?
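
To make the "very minimal interface between core and analysis" idea concrete, here is a rough
sketch in plain Java. It is not Lucene's API and not taken from the attached patch: TermSource,
WhitespaceSource and index are hypothetical names invented for illustration. The assumption is
that the core side only consumes the term text and position increment, while anything else
(here a token type) stays private to the analysis side.

```java
import java.util.ArrayList;
import java.util.List;

final class MinimalCoreSketch {

    // Hypothetical "core/common" contract: only what the indexer actually consumes.
    public interface TermSource {
        boolean next();           // advance to the next token; false when exhausted
        String term();            // text of the current token
        int positionIncrement();  // position gap relative to the previous token
    }

    // Hypothetical "analysis module" class: depends only on the contract above.
    public static final class WhitespaceSource implements TermSource {
        private final String[] parts;
        private int current = -1;

        public WhitespaceSource(String text) {
            String trimmed = text.trim();
            this.parts = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        }

        @Override
        public boolean next() {
            if (current + 1 >= parts.length) {
                return false;
            }
            current++;
            return true;
        }

        @Override
        public String term() {
            return parts[current];
        }

        @Override
        public int positionIncrement() {
            return 1;
        }

        // Analysis-only metadata; the core-side consumer never sees it.
        public String type() {
            return "word";
        }
    }

    // Hypothetical core-side consumer: written against TermSource only.
    public static List<String> index(TermSource source) {
        List<String> postings = new ArrayList<>();
        int position = -1;
        while (source.next()) {
            position += source.positionIncrement();
            postings.add(source.term() + "@" + position);
        }
        return postings;
    }

    public static void main(String[] args) {
        // prints [make@0, analysis@1, independent@2]
        System.out.println(index(new WhitespaceSource("make analysis independent")));
    }
}
```

With a split like this, an older analysis jar would keep working as long as the small contract
stays stable, which is the property the last bullet is after.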

So in a way, this issue is related to LUCENE-2309...

> Make all of Analysis completely independent from Lucene Core
> ------------------------------------------------------------
>                 Key: LUCENE-3151
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 4.0
>            Reporter: Grant Ingersoll
>             Fix For: 4.0
>         Attachments: LUCENE-3151.patch
>
> Lucene's analysis package, including the definitions of Attribute, TokenStream, etc.
> are quite useful outside of Lucene (for instance, Mahout uses them) for text processing.
> I'd like to move the definitions, or at least their packaging, to a separate JAR file so that
> one can consume them w/o needing Lucene core.  My draft idea is to have a definition area
> that Lucene core is dependent on and the rest of the analysis package can then be dependent
> on the definition area.  (I'm open to other ideas as well)
