lucene-java-user mailing list archives

From "John Takacs" <>
Subject RE: commercial websites powered by Lucene?
Date Tue, 24 Jun 2003 09:52:12 GMT
Hi Nader,

This thread is by far one of the best, and most practical.  It will only be
topped when someone provides benchmarks for a type directory of 3
million plus urls.  I would love to, but the whole JavaCC thing is a show


I noticed that search is a little slow.  What has been your experience?
Perhaps it was a bandwidth issue, but I'm living in a country with the
greatest internet connectivity and penetration in the world (South Korea),
so I don't think that is an issue on my end.

You have 500,000 resumes.  Based on the steps you took to get to 500,000, do
you think your current setup will scale to millions, like say, 3 million or

What is your hardware like?  CPU/RAM?

Warm regards, and thanks for sharing.  If I can ever get past the
Lucene/JavaCC installation failure, I'll share my benchmarks on the above
directory scenario.


-----Original Message-----
From: Nader S. Henein []
Sent: Tuesday, June 24, 2003 5:30 PM
To: 'Lucene Users List'
Subject: RE: commercial websites powered by Lucene?

I handle updates and inserts the same way: first I delete the document
from the index and then I insert it (better safe than sorry). I batch my
updates/inserts every twenty minutes. I would do it at smaller intervals,
but since I have to sync the XML files created from the DB to three
machines (I maintain three separate Lucene indices on my three separate
web servers) it takes a little longer. You have to batch your changes
because updating the index takes time, as opposed to deletes, which I
batch every two minutes. You won't have a problem updating the index and
searching at the same time, because Lucene updates the index on a
separate set of files and then, when it's done, it overwrites the old
version. I've had to provide for backups, and things like server crashes
mid-indexing, but I was using Oracle Intermedia before and Lucene BLOWS
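
For readers who want to see the shape of the scheme described above, here is a minimal sketch in plain Java. It is not Lucene code and not the poster's implementation: the class BatchedIndexUpdater, its method names, and the in-memory map standing in for the index are all hypothetical, chosen only to illustrate delete-then-insert updates, separate batches for deletes and updates, and searches that keep reading the old version until a batch is finished.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical in-memory stand-in for an index; this is NOT Lucene's API. */
final class BatchedIndexUpdater {
    // Snapshot that searches read from; replaced wholesale after each batch.
    private final AtomicReference<Map<String, String>> live =
            new AtomicReference<Map<String, String>>(new HashMap<String, String>());
    // Changes collected between batch runs (id -> document text).
    private final Map<String, String> pendingUpserts = new LinkedHashMap<String, String>();
    private final Set<String> pendingDeletes = new LinkedHashSet<String>();

    /** Queue an insert or update; both are handled identically. */
    synchronized void queueUpsert(String id, String text) {
        pendingDeletes.remove(id);
        pendingUpserts.put(id, text);
    }

    /** Queue a delete; it also cancels any not-yet-applied update for that id. */
    synchronized void queueDelete(String id) {
        pendingUpserts.remove(id);
        pendingDeletes.add(id);
    }

    /** Cheap batch (e.g. every two minutes): apply queued deletes. */
    synchronized int flushDeletes() {
        Map<String, String> next = new HashMap<String, String>(live.get());
        int removed = 0;
        for (String id : pendingDeletes) {
            if (next.remove(id) != null) {
                removed++;
            }
        }
        pendingDeletes.clear();
        live.set(next); // searches see the new version only from here on
        return removed;
    }

    /** Slower batch (e.g. every twenty minutes): delete, then insert. */
    synchronized int flushUpserts() {
        // Work on a copy so concurrent searches keep using the old version.
        Map<String, String> next = new HashMap<String, String>(live.get());
        for (Map.Entry<String, String> e : pendingUpserts.entrySet()) {
            next.remove(e.getKey());            // delete first ("better safe than sorry")
            next.put(e.getKey(), e.getValue()); // then insert the fresh copy
        }
        int applied = pendingUpserts.size();
        pendingUpserts.clear();
        live.set(next); // swap in the finished version in one step
        return applied;
    }

    /** Case-insensitive substring match over the current snapshot; ids are sorted. */
    List<String> search(String term) {
        String needle = term.toLowerCase();
        List<String> hits = new ArrayList<String>();
        for (Map.Entry<String, String> e : live.get().entrySet()) {
            if (e.getValue().toLowerCase().contains(needle)) {
                hits.add(e.getKey());
            }
        }
        Collections.sort(hits);
        return hits;
    }
}
```

In a real deployment the two flush methods would be driven by timers, and the copy-and-swap step would correspond to whatever the index library does when it writes new files and then replaces the old ones. The intervals in the comments are the ones quoted in the message; everything else is an assumption made for illustration.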

-----Original Message-----
From: news [] On Behalf Of Chris Miller
Sent: Tuesday, June 24, 2003 12:06 PM
Subject: Re: commercial websites powered by Lucene?

Hi Nader,

I was wondering if you'd mind me asking you a couple of questions about
your implementation?

The main thing I'm interested in is how you handle updates to Lucene's
index. I'd imagine you have a fairly high turnover of CVs and jobs, so
index updates must place a reasonable load on the CPU/disk. Do you keep
CVs and jobs in the same index or two different ones? And what is the
process you use to update the index(es) - do you batch-process updates
or do you handle them in real-time as changes are made?

Any insight you can offer would be much appreciated as I'm about to
implement something similar and am a little unsure of the best approach
to take. We need to be able to handle indexing about 60,000
documents/day, while allowing (many) searches to continue operating


"Nader S. Henein" <> wrote in message
> We use Lucene, we're basically an on-line
> Recruitment site and up until now we've got around 500 000 CVs and
> documents indexed with results that stump Oracle Intermedia.
> Nader Henein
> Senior Web Dev
> -----Original Message-----
> From: []
> Sent: Wednesday, June 04, 2003 6:09 PM
> To:
> Subject: commercial websites powered by Lucene?
> Hello All,
> I've been trying to find examples of large commercial websites that
> use Lucene to power their search.  Having such examples would make
> Lucene an easy sell to management.
> Does anyone know of any good examples?  The bigger the better, and the
> more the better.
> TIA,
> -John
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:
