lucene-java-user mailing list archives

From "Russ" <>
Subject Re: lucene scalability questions
Date Thu, 04 Jan 2007 22:40:40 GMT
If you do this on Windows, you might be able to replicate the indexes using DFS.  On Linux
you can probably use rsync to keep the different servers up to date.

If the size of the index is an issue, Lustre could be used to provide one volume that is spread
over many servers.  Performance is supposed to be good with Lustre as well.

If you want to speed up individual queries when searching a large index, you can probably
split up the index in some way among the servers, query them all at the same time and then
aggregate the results.  This is just an idea, but I believe it was mentioned in "lucene in
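
The split-and-aggregate idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: Shard, Hit and searchAll are hypothetical names, not Lucene classes, and each Shard stands in for whatever remote call would reach the server holding one part of the index. Note also that scores computed against separate indexes may not be strictly comparable, so a plain sort by score is a simplification.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: none of these types come from Lucene itself.
final class ShardedSearchSketch {

    /** One scored hit, tagged with the shard that produced it. */
    record Hit(String shard, String docId, double score) {}

    /** Stand-in for one index partition living on its own server. */
    interface Shard {
        List<Hit> search(String query, int topN) throws Exception;
    }

    /** Queries all shards at the same time, then keeps the topN hits by descending score. */
    static List<Hit> searchAll(List<Shard> shards, String query, int topN) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
        try {
            // One task per shard so that the slowest shard bounds the total latency.
            List<Callable<List<Hit>>> tasks = new ArrayList<>();
            for (Shard shard : shards) {
                tasks.add(() -> shard.search(query, topN));
            }
            // invokeAll blocks until every shard has answered (or failed).
            List<Hit> merged = new ArrayList<>();
            for (Future<List<Hit>> result : pool.invokeAll(tasks)) {
                merged.addAll(result.get());
            }
            // Aggregate: sort the combined hits and cut back to topN.
            merged.sort(Comparator.comparingDouble(Hit::score).reversed());
            return new ArrayList<>(merged.subList(0, Math.min(topN, merged.size())));
        } finally {
            pool.shutdown();
        }
    }
}
```

Each shard is asked for topN hits of its own because, in the worst case, all of the best hits overall sit in a single shard.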


-----Original Message-----
From: "Peter W." <>
Date: Thu, 4 Jan 2007 14:02:00
Subject: Re: lucene scalability questions


My understanding of Lucene is limited, but the issues
seem similar to web server farms in that it comes down to
linear scalability by adding more boxes.

This means separate machines with their own indexes.

Shared filesystems such as NFS work well in smaller environments
but experience problems under heavy load (lost mounts requiring reboots).

There's no MySQL-like 'replication' with masters using
binary files to update slaves. However, since the index is
file based, you can close IndexWriters and make hot copies or
perform backups for redundancy.
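
A minimal sketch of the file-level copy described above, assuming every IndexWriter on the index has already been closed so that the files are not changing underneath the copy. IndexSnapshotSketch and copyIndex are hypothetical names and the directory paths are placeholders; only the standard java.nio.file API is used, not Lucene itself.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper; assumes no IndexWriter has the index open for writing.
final class IndexSnapshotSketch {

    /** Copies every regular file in indexDir into snapshotDir and returns how many were copied. */
    static int copyIndex(Path indexDir, Path snapshotDir) throws IOException {
        Files.createDirectories(snapshotDir);

        // List the index directory first so the directory stream is closed before copying.
        List<Path> entries;
        try (Stream<Path> files = Files.list(indexDir)) {
            entries = files.filter(Files::isRegularFile).collect(Collectors.toList());
        }

        int copied = 0;
        for (Path file : entries) {
            // Overwrite any stale file left by an earlier snapshot.
            Files.copy(file, snapshotDir.resolve(file.getFileName()),
                       StandardCopyOption.REPLACE_EXISTING);
            copied++;
        }
        return copied;
    }
}
```

The snapshot directory can then be shipped to another machine or kept as a backup. Files left over from an older snapshot are not removed by this sketch, so copying into a fresh directory each time is the safer choice.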

If you know XML, use Solr to post and retrieve documents to and from
your various Lucene indexes. It hides the complexity of remote
object brokering such as RMI.
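
To give an idea of what posting a document to Solr as XML looks like, here is a small sketch that only builds the add message. SolrAddXmlSketch is a hypothetical helper, and the field names in the usage note ("id", "title") are assumptions that would have to match the fields defined in your own Solr schema.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that builds a Solr <add> message; it does not talk to a server.
final class SolrAddXmlSketch {

    /** Escapes the characters that are special inside XML text and attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Wraps the given fields in <add><doc>...</doc></add>, one <field> element per entry. */
    static String addDocument(Map<String, String> fields) {
        StringBuilder xml = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            xml.append("<field name=\"").append(escape(field.getKey())).append("\">")
               .append(escape(field.getValue()))
               .append("</field>");
        }
        return xml.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "1");                // assumed field name
        doc.put("title", "Tom & Jerry");   // assumed field name
        System.out.println(addDocument(doc));
    }
}
```

The resulting string would be sent with an HTTP POST to the update handler of your Solr instance, followed by a <commit/> message to make the document searchable. The exact URL depends on how Solr is deployed.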

Solr also allows you to get result sets using JSON so you could
provide distributed Lucene results to browsers as a .js widget.

While it does not reflect the latest 2.0 release, the Lucene in Action
book provides good background on combining separate indexes.


Peter W.

On Jan 4, 2007, at 7:51 AM, Mark Mei wrote:

> So this question has two parts:
> 1. How does Lucene scale, exactly? Do we distribute the index to multiple
> servers somehow? Or is it one index, sitting on some sort of a shared
> filesystem, shared by all Lucene servers? If it's the latter, the bottleneck
> will be I/O ... anyway, elaborate on scalability please, and how you set it
> up
> 2. High availability. How would one go about making Lucene redundant?
> 2. High availability. How would one go about making Lucene redundant?
