lucene-dev mailing list archives

From Grant Ingersoll <>
Subject Re: Ferret's changes
Date Tue, 10 Oct 2006 12:34:24 GMT
I would be interested in another survey, this time about how many  
people use a fixed set of Fields in their applications.  The large  
majority of mine do.  I know SOLR supports dynamic fields, but I  
wonder how much they are used.  If there truly is a benefit to it,  
then perhaps we can have an implementation that can utilize them.

I would like to hear more about your merge strategy and how you do  
the hashing.  Perhaps if we all work through it then we can figure out  
some ways to incorporate it.  As for backwards compatibility, we have  
a strategy for dealing with it that I think works (deprecation).   
Furthermore, there is no reason we can't start working towards a new  
framework for indexing/searching that is interface based and allows  
for using the existing format or a newer format as Marvin, Doug and  
others have suggested (in fact we have a first attempt at it as a  

As for benchmarks, in my experience, the people who get all touchy  
are those who are so married to one way of doing things that they  
can't think of any other way to solve a problem.  I think reasonable  
people who want Lucene to be better will take the benchmarks as  
lessons in how to improve Lucene, not as some personal attack on  
them.  Once I get the basics of our benchmark stuff in place, it  
would be interesting to implement the Ferret version and see how it  
stacks up.  So far, we have been using but I can see about  
incorporating the Reuters collection in, as this is much more the  
standard when it comes to these things.


On Oct 10, 2006, at 5:02 AM, David Balmain wrote:

> On 10/10/06, Otis Gospodnetic <> wrote:
>> Hi,
>> Maybe I missed it, but I was surprised that nobody here wondered  
>> about the algorithm and data structure changes that Dave Balmain  
>> made in Ferret, to make it go faster (than Java Lucene).  I know  
>> I've been wondering whether/when Dave will bring those up, and  
>> what the chances of those changes being applied to Java Lucene are.
>> Here is an interesting and recent interview with Dave that  
>> mentions some of this stuff.
>> balmain.html
>> Otis
> Hi Otis,
> I did bring this up here:
> 200607.mbox/% 
> The reason I didn't press the issue was that the changes are pretty
> substantial and would break backwards compatibility in Lucene. Also, I
> didn't think the major performance benefits would map back to Java
> since I'm taking advantage of the fact that I have so much control
> over memory allocation in C.
> Given these factors and the fact that benchmarks can be a very touchy
> subject, particularly in the Java community, I thought it better to
> leave any performance comparison off this list. It looks like the cat
> is out of the bag now so I'll put some benchmarks up on my Wiki and
> everyone can check that I haven't cheated or made any mistakes. I'll
> use the Reuters collection:
> If anyone thinks I should use a different corpus, please let me know.
> I also have the entire Gutenberg collection here. I'll post a link
> when I'm done.
> Cheers,
> Dave
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
335 Hinds Hall
Syracuse, NY 13244
