lucene-java-user mailing list archives

From Otis Gospodnetic <>
Subject Re: Potential Lucene drawbacks
Date Thu, 06 Mar 2003 20:42:58 GMT

I changed the Subject line for obvious reasons.

I'm in the same boat as Tatu: I didn't understand all your points, and
I still don't :(  But some other things became clearer from this email,
so I'll comment on those.

--- Leo Galambos <> wrote:
> > > 1. 2 threads per request may improve speed up to 50%
> > Hmm? Could you clarify? During indexing, multithreading may speed things
> > up (splitting docs to index in 2 or more sets, indexing separately,
> > combining indexing). But... isn't that a good thing? Or are you saying
> > that it'd be good to have multi-threaded search functionality for single
> > search? (in my experience searching is seldom the slow part)
> you may improve indexing and searching. Indexing, because the merge
> operation will lock just one thread and smaller part of an index while
> other threads are still working;  searching, because you can distribute
> the query to more barrels. In both cases you save up to 50% of time (I
> assume mergefactor=2).

I don't follow the indexing part, but you can certainly perform
distributed searches.  They are not parallelized currently, so searches
will run one after the other, but...
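
To make the sequential-versus-parallel distinction concrete, here is a minimal sketch in plain Java. It is not Lucene code: `Barrel`, `inMemory`, `searchSequential` and `searchParallel` are hypothetical names invented for illustration, and the "index" is just a list of strings. It only shows the shape of sending one query to several sub-indexes, either one after the other or concurrently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical names throughout; this is an illustration, not Lucene's API.
class FanOutSearch {

    /** One searchable partition of a larger index (a "barrel" in Leo's terms). */
    interface Barrel {
        List<String> search(String query);
    }

    /** Toy in-memory barrel: returns the documents that contain the query string. */
    static Barrel inMemory(List<String> docs) {
        return query -> {
            List<String> hits = new ArrayList<>();
            for (String doc : docs) {
                if (doc.contains(query)) {
                    hits.add(doc);
                }
            }
            return hits;
        };
    }

    /** Queries each barrel in turn, so total time is the sum of the per-barrel times. */
    static List<String> searchSequential(List<Barrel> barrels, String query) {
        List<String> all = new ArrayList<>();
        for (Barrel barrel : barrels) {
            all.addAll(barrel.search(query));
        }
        return all;
    }

    /** Queries all barrels concurrently and collects the hits in barrel order. */
    static List<String> searchParallel(List<Barrel> barrels, String query) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, barrels.size()));
        try {
            List<Future<List<String>>> pending = new ArrayList<>();
            for (Barrel barrel : barrels) {
                Callable<List<String>> task = () -> barrel.search(query);
                pending.add(pool.submit(task));
            }
            List<String> all = new ArrayList<>();
            for (Future<List<String>> future : pending) {
                all.addAll(future.get()); // blocks until that barrel has answered
            }
            return all;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Barrel> barrels = Arrays.asList(
            inMemory(Arrays.asList("lucene index", "apache ant")),
            inMemory(Arrays.asList("lucene search", "harvest")));
        System.out.println(searchSequential(barrels, "lucene"));
        System.out.println(searchParallel(barrels, "lucene"));
    }
}
```

Both methods return the same hits; only the wall-clock time differs, and only when the individual barrels are slow enough for the thread overhead to pay off.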

> > > 2. Merger is hard coded
> > 
> > In a way that is bad because... ?
> > (ie. what is the specific problem... I assume you mean index merging
> > functionality?)
> Because you cannot process local and/or remote barrels -- all must be
> local in Lucene object model. That is the serious bug IMHO.

If I understand you correctly, then maybe you are not aware of
RemoteSearchable in Lucene.
This is from CHANGES.txt:

   9. Added class RemoteSearchable, providing support for remote
      searching via RMI.  The test class
      provides an example of how this can be used.  (cutting)

> > > 4. you cannot implement dissemination + wrappers for internet servers
> > > which would serve as static barrels.
> > Could you explain this bit more thoroughly (or pointers on longer
> > explanation)?
> Read more about dissemination, metasearch engines (i.e. Savvysearch),
> dDIRs (i.e. Harvest). BTW, let's go to a pub and we can talk til morning
> :) (it is a serious offer, because I would like to know more about IR).
> This example is about metasearch (the simplest case of dDIRs): Can Lucene
> allow that a barrel (index segment?) is static and a query is solved via
> wrapper, that sends the query ${QUERY} to
>${QUERY} and then
> reads the HTML output as a result?

This is the point that's clearer to me now.  There is confusion about
what Lucene is and what it is not.  Lucene does not even try to be what
the services you mentioned are.  Their goals are different; they are a
different set of tools.  Lucene's focus is on indexing text and
searching it.  It is not a tool to query other existing search engines
and parse the returned HTML, etc.  It is also not a tool that wants to
have a built-in web crawler, and so on.  It's small and simple on
purpose, and comparing it with SavvySearch (still exists??), Harvest,
Dogpile, etc. would be like comparing apples and oranges.

> > > 5. Document metadata cannot be stored as a programmer wants, he must
> > > translate the object to a set of fields
> > Yes? I'd think that possibility of doing separate fields is a good thing;
> > after all, all a plain text search engine needs to provide (to be
> > considered one) is indexing of plain text data, right?
> I talked about metadata. When metadata object knows how to achieve its
> persistence, why would one translate anything to fields and then back?
> Why would you touch the users metadata at all? You need flat fields for
> indexing, and what's around -- it is not your problem :). Lucene is
> something between CMS and CIS, you say that it's closer to CIS, but when
> you need metadata in fields, you are closer to CMS IMHO.

Not sure I follow.  I certainly don't think of Lucene as a CMS.  Just a
text indexing and searching library.
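
For readers trying to picture the round trip that point 5 objects to, it can be sketched in plain Java as follows. The types here are hypothetical and deliberately not Lucene's own classes: `BookMeta` stands in for an application's metadata object, and a `Map<String, String>` stands in for the flat name/value fields an indexing library works with.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical types for illustration; these are not Lucene's Document/Field classes.
class MetadataFields {

    /** An application-side metadata object with typed members. */
    static final class BookMeta {
        final String title;
        final String author;
        final int year;

        BookMeta(String title, String author, int year) {
            this.title = title;
            this.author = author;
            this.year = year;
        }
    }

    /** Flattens the object into name/value string fields, the shape an indexer works with. */
    static Map<String, String> toFields(BookMeta meta) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("title", meta.title);
        fields.put("author", meta.author);
        fields.put("year", Integer.toString(meta.year));
        return fields;
    }

    /** Rebuilds the object from the fields stored alongside a search hit. */
    static BookMeta fromFields(Map<String, String> fields) {
        return new BookMeta(
            fields.get("title"),
            fields.get("author"),
            Integer.parseInt(fields.get("year")));
    }

    public static void main(String[] args) {
        BookMeta original = new BookMeta("Example Title", "A. Author", 2003);
        Map<String, String> fields = toFields(original);
        BookMeta restored = fromFields(fields);
        System.out.println(fields);
        System.out.println(restored.title + " / " + restored.author + " / " + restored.year);
    }
}
```

The typed `year` member has to be converted to a string on the way in and parsed again on the way out, which is the translation step the quoted complaint is about.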

> > > 6. Lucene cannot implement your own dynamization
> > 
> > (sorry, I must sound real thick here).
> > Could you elaborate on this... what do you mean by dynamization?
> Read more about "Dynamization of Decomposable Searching Problems".

