lucene-java-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Does lucene support distributed indexing?
Date Sun, 27 Apr 2008 16:25:54 GMT
There are actually several distributed indexing or searching projects in Lucene (the top-level
ASF Lucene project, not Lucene Java), and it's time to start thinking about the possibility
of bringing them together, finding commonalities, etc.

Here is the summary:
- Lucene - distributed search via ParallelMultiSearcher.  How you split indices/shards is
up to you.
- Solr - distributed search via SOLR-303 (see DistributedSearch on its Wiki).  How you split
indices/shards is up to you.
- Nutch - see its org.apache.nutch.ipc (I think).  How you split indices/segments is up to
you.
- Nutch - see the bottom of http://wiki.apache.org/nutch/Nutch2Architecture
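
Since every option above leaves the shard split up to you, here is a minimal sketch of one common choice: route each document to a shard by hashing its id, then merge the per-shard top hits at query time. It uses no Lucene classes; `ShardSketch`, `Hit`, `shardFor` and `mergeTopK` are hypothetical names made up for illustration, and the merge assumes scores from different shards are comparable, which may not hold when each shard computes its own term statistics.

```java
import java.util.ArrayList;
import java.util.List;

public class ShardSketch {
    /** A scored hit returned by one shard (hypothetical type, not a Lucene class). */
    public static final class Hit {
        public final String docId;
        public final float score;

        public Hit(String docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    /** Pick a shard for a document by hashing its id; floorMod keeps the result non-negative. */
    public static int shardFor(String docId, int numShards) {
        return Math.floorMod(docId.hashCode(), numShards);
    }

    /** Merge the hit lists returned by each shard into one global top-k list, best score first. */
    public static List<Hit> mergeTopK(List<List<Hit>> perShard, int k) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : perShard) {
            all.addAll(hits);
        }
        all.sort((Hit a, Hit b) -> Float.compare(b.score, a.score));
        return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
    }

    public static void main(String[] args) {
        System.out.println("doc-42 goes to shard " + shardFor("doc-42", 4));
        List<Hit> merged = mergeTopK(List.of(
                List.of(new Hit("a", 0.9f), new Hit("b", 0.2f)),
                List.of(new Hit("c", 0.5f))), 2);
        for (Hit h : merged) {
            System.out.println(h.docId + " " + h.score);
        }
    }
}
```

Hashing on the document id keeps shards roughly balanced and lets you find the shard for an update or delete without a lookup table, at the cost of having to re-route documents if the number of shards changes.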

There is also Hadoop:
- Using MapReduce + HDFS to build a single Lucene index in a distributed fashion (see contrib/
in Hadoop)
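
To make the shape of that approach concrete, here is a toy, single-JVM sketch of the same idea: build a partial inverted index per partition (the "map" side), then merge the partials into one index (the "reduce" side). It is not the Hadoop contrib code and uses neither Hadoop nor Lucene; `PartitionedIndexSketch` and its methods are hypothetical names, and a parallel stream stands in for the cluster.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class PartitionedIndexSketch {
    /** "Map" side: build a partial inverted index (term -> doc ids) for one partition of docs. */
    public static Map<String, SortedSet<String>> buildPartial(Map<String, String> docs) {
        Map<String, SortedSet<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            for (String term : doc.getValue().toLowerCase().split("\\W+")) {
                if (term.isEmpty()) {
                    continue; // split can yield an empty leading token
                }
                index.computeIfAbsent(term, t -> new TreeSet<>()).add(doc.getKey());
            }
        }
        return index;
    }

    /** "Reduce" side: union the postings of each term across all partial indexes. */
    public static Map<String, SortedSet<String>> merge(List<Map<String, SortedSet<String>>> partials) {
        Map<String, SortedSet<String>> merged = new TreeMap<>();
        for (Map<String, SortedSet<String>> partial : partials) {
            partial.forEach((term, ids) ->
                    merged.computeIfAbsent(term, t -> new TreeSet<>()).addAll(ids));
        }
        return merged;
    }

    /** Index the partitions in parallel, then merge the results into a single index. */
    public static Map<String, SortedSet<String>> buildAll(List<Map<String, String>> partitions) {
        List<Map<String, SortedSet<String>>> partials = partitions.parallelStream()
                .map(PartitionedIndexSketch::buildPartial)
                .collect(Collectors.toList());
        return merge(partials);
    }

    public static void main(String[] args) {
        Map<String, SortedSet<String>> index = buildAll(List.of(
                Map.of("d1", "Lucene search"),
                Map.of("d2", "distributed search")));
        System.out.println(index);
    }
}
```

Each partition is indexed independently, so the expensive analysis work parallelizes cleanly; only the final merge needs to see all the partial results.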

There is also a GridLucene project somewhere on the web...

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Grant Ingersoll <gsingers@apache.org>
> To: java-user@lucene.apache.org
> Sent: Saturday, April 26, 2008 4:20:19 PM
> Subject: Re: Does lucene support distributed indexing?
> 
> 
> On Apr 26, 2008, at 2:33 AM, Samuel Guo wrote:
> 
> > Hi all,
> >
> > I am a lucene newbie:)
> >
> > It seems that Lucene doesn't support distributed indexing :(
> > As some IR research papers mention, when the document collection becomes
> > large, the index becomes large as well. When a single machine can't hold
> > the whole index, some strategy is needed to solve this, such as splitting
> > the whole collection into several smaller sub-collections. Depending on
> > how the partitioning is done, we get different strategies: document
> > partitioning and term partitioning. But I don't know why Lucene doesn't
> > support these approaches :(
> > Can anyone explain it?
> 
> Because no one has donated the code to do it.  You can do distributed
> indexing via Nutch and some (albeit non-fault-tolerant) distributed
> search in Lucene.  Solr also now has distributed search.
> 
> -Grant
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 


