lucene-general mailing list archives

From Chris Hostetter <>
Subject Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?
Date Mon, 01 Mar 2010 18:43:08 GMT

(Man, why is it you guys always decide to start the monolithic 
"let's redesign the world" threads while I'm offline for a few days ... 
I figured at worst I'd 'svn up' and discover that McCandless had 
reimplemented all of the indexing code in Scala, but I certainly wasn't 
expecting all of this.)

As someone who has attempted to read it all at once, let me just say that 
this thread is way too big.  

I say this not as a facetious comment about the number of messages or the 
depth of replies but as a serious comment about the breadth and depth of 
the core issues that people seem to be trying to address in a monolithic 
fashion -- monolithic suggestions which are in many ways diametrically 
opposed to each other.

Without obvious consensus on where we want to go, or a clear sense of 
how well things will work when we get "there," it seems most productive 
to focus on what would be needed to achieve some incremental steps that 
could be productive for any/all goals.

At its core: this thread started with McCandless's suggestion that 
refactoring some of the text analysis code from Solr, Nutch and Lucene-Java 
out of all three projects and into a common code base would be beneficial 
to all three subprojects -- Not only do I see no flaw in that reasoning, 
but it also seems like it would (oddly enough) serve as a good first step 
towards *either* tighter development integration between Lucene-Java and 
Solr, *OR* towards looser development of the two code bases (via making 
Solr a separate TLP).

Developing a new code module like this should help demonstrate / exercise 
some of the "process" issues that might come up in trying to integrate the 
development and release processes of the existing products.  If things 
work out "well" that may illustrate that tighter integration is better; if 
things work out "poorly" that should also tell us something, and may give 
us guidance on how to move forward.  In the worst-case scenario that I can 
imagine: some code is refactored out of Solr and Nutch in a way that makes 
it more directly usable by other consumers of Lucene-Java.  (Even if Solr 
and Nutch never use that code and become their own TLPs and secede from 
the ASF to become a Caribbean tax haven, that seems like a Net win for 

To put the issue another way: Does anyone see how McCandless's suggestion 
would be counter-productive towards your vision of what Lucene/Solr/Nutch 
should be like in the future? (regardless of what your particular vision is)


: I started here with analysis because I think that's the biggest pain
: point: it seemed like an obvious first step to fixing the code
: duplication and thus the most likely to reach some consensus.  And
: it's also very timely: Robert is right now making all kinds of great
: fixes to our collective analyzers (in between bouts of fuzzy DFA
: debugging).

