lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "v k" <>
Subject Infrastructure question
Date Tue, 18 Dec 2007 05:03:21 GMT

I am using Lucene to build an index from roughly  10 million documents
in number. The  documents are about 4 TB in total.

After some trial runs, indexing a subset of the documents I am trying
to figure out a hosting service configuration to create a full index
from the entire 10 TB of data. As I am still unsure how this project
will turn out I am not purchasing hardware/ram but considering a web
for the purpose of :
1)  download the data and to start indexing it.
2) The web front end to access this index will be a python framework (
eg. Django  etc)

I am seriously contemplating signing up with Joyent for this plan:
AMD Opteron x64 multi-core servers with 4GiB RAM per core
1/16 (Burstable up to 95%)
1 TB	- Bandwidth/month, 1 GB RAM, + as such as NAS  storage as I can
afford to pay for.

My QUESTION is - Will this RAM and CPU be sufficient during
development of the search application and building the index, etc. or
is it so abysmal and under-equipped in terms of hardware that the
development version of my application will not work.
I understand that having more RAM is always good, but is 1GB as good as nothing?

This setup is NOT for production but for for development so I can get
my hands dirty with lucene which will require plenty of tweaks as the
project moves along.

What initial configuration would you recommend for a development
version given the corpus size. I am not even sure how large my index
will look like at this point.

I hope to build an my indexes this way and once the search
infrastructure is working and the web-front end complete, I plan to
worry about Redundancy, availability and scalability for the many
users I hope to provide this free service for :-)

Many of you in this forum have built successful products with Lucene.
To name a few I am aware of -  Ken Krugle, James Ryley, Dennis Kubes

Some of you must have started with small machines,test set-ups etc
where you built your initial search apps. I hope  to receive some
advise about my plan and approach to start building an infrastructure
to support my Lucene app.

Thank you.


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message