lucene-java-user mailing list archives

From "Dmitri Bichko" <>
Subject Hardware recommendation
Date Fri, 09 Sep 2005 18:28:52 GMT

I'm putting together a cheap indexing server for an "explorative" lucene
project and had a few questions about which route to go.

I am going with a Socket 939 platform - does it make sense to get the
dual core Athlon 64 X2, or is it better to stick with a faster clocked
"plain" Athlon 64?

Also, would Lucene benefit from running in 64 bit mode, or does it
prefer "compatibility" 32 bit?

I figure most indexing apps will be heavily IO bound, so I am stressing
that, while staying with commodity components, so:

WD SATA disks (250GB, 16MB cache, SATAII 3Gb/s)
starting out with 4 of these (plus system disks), on the onboard
controller (RAID0)

If need be I can add two disk cages, 5 disks each with two decent SATA
RAID controllers (64/128MB cache, NCQ, that sort of thing); the nForce4
PCI-Express should stand up to this, I'm hoping.

And of course I am limited to 4GB RAM.
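
As a rough sanity check on the storage figures above, here is a minimal Java sketch of the raw capacity arithmetic. The class and method names are hypothetical, and the expansion figure assumes the cage disks would also be 250GB each and striped as RAID0, which the message does not state.

```java
// Back-of-envelope capacity check for the disk layout described above.
// StorageEstimate and raid0CapacityGb are illustrative names only.
public class StorageEstimate {

    // RAID0 stripes data across all member disks with no redundancy,
    // so raw capacity is the sum of the members (and losing any one
    // disk loses the whole array).
    static long raid0CapacityGb(int disks, long gbPerDisk) {
        return disks * gbPerDisk;
    }

    public static void main(String[] args) {
        // 4 x 250GB on the onboard controller, as stated in the message.
        long initial = raid0CapacityGb(4, 250);

        // Assumption: two cages of 5 disks each, also 250GB disks, RAID0.
        long expansion = raid0CapacityGb(2 * 5, 250);

        System.out.println("Initial RAID0 capacity (GB): " + initial);
        System.out.println("Possible expansion (GB): " + expansion);
    }
}
```

These are raw, decimal-gigabyte figures; formatted filesystem capacity will be somewhat lower.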

I have three main applications in mind:

Indexing PubMed/Medline article abstracts: this would be an index of
about 15 million records with a couple of identifier fields, a title and
a 1-3 paragraph abstract.  Mostly the searches will be keyword searches
on the text fields.  Potentially I could add full-length papers to this
as well (a lot fewer records though).
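
To put the abstract collection in perspective against the disk space, the raw text volume can be estimated as below. This is only a sketch: the 1500 bytes per record is a purely hypothetical average (the message gives no record size), and the resulting figure is the size of the source text, not of the Lucene index built from it.

```java
// Rough estimate of raw text volume for the abstract collection.
// CorpusEstimate and rawTextGb are illustrative names only.
public class CorpusEstimate {

    // Returns the total raw text size in decimal gigabytes (10^9 bytes).
    static double rawTextGb(long records, long avgBytesPerRecord) {
        return (records * avgBytesPerRecord) / 1e9;
    }

    public static void main(String[] args) {
        long records = 15_000_000L;   // record count from the message
        long avgBytes = 1_500L;       // ASSUMPTION: hypothetical average record size

        System.out.println("Raw text (GB): " + rawTextGb(records, avgBytes));
    }
}
```

Substituting a measured average from a sample of real records would give a more useful number than the placeholder used here.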

Second one is indexing a couple hundred thousand MS Office documents and
PDF files (Google Appliance sort of thing).

And finally a genetic database repository a la LuceGene, or SRS.  This
would have more complex records (i.e. many fields, but little data in
each), which are mostly retrieved on unique identifiers (very little
text searching).  This would probably run to a few tens of millions of
records, maybe around 100 million eventually.

Given these applications, what else should I be thinking about?
