lucene-java-user mailing list archives

From Grant Ingersoll
Subject Re: Infrastructure Question
Date Tue, 18 Dec 2007 16:43:43 GMT
Hi Venkat,

There is no need to post your question multiple times or cross-post.   
People on this list are distributed all around the world and aren't
always available or able to answer your question.  Having to wait
11 hours for an answer on a free mailing list is not at all  

If you are just looking to get your hands dirty with Lucene, why not  
just start w/ a subset on a machine you already own and work to scale  
up?  This way, you could start with what you have available and get a  
feel for your memory usage, etc.  Then you will be in a better  
position to decide what your needs are.
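
A minimal sketch of how a subset run could be scaled up on paper, assuming index size and indexing time grow roughly linearly with document count (a simplification that real indexes only approximately follow). The function name and the sample measurements are hypothetical; only the 10 million document total comes from the question quoted below.

```python
def extrapolate(sample_docs, sample_index_bytes, sample_seconds, total_docs):
    """Linearly scale measurements from a sample indexing run to the full corpus.

    Assumes index size and indexing time are proportional to document count,
    which is only a rough first approximation.
    """
    if sample_docs <= 0:
        raise ValueError("sample_docs must be positive")
    scale = total_docs / sample_docs
    return {
        "index_bytes": sample_index_bytes * scale,
        "seconds": sample_seconds * scale,
    }


# Hypothetical sample run: 100,000 documents gave a 2 GiB index in 30 minutes.
estimate = extrapolate(
    sample_docs=100_000,
    sample_index_bytes=2 * 1024**3,
    sample_seconds=30 * 60,
    total_docs=10_000_000,  # corpus size stated in the original question
)

print(f"estimated index size: {estimate['index_bytes'] / 1024**3:.0f} GiB")
print(f"estimated indexing time: {estimate['seconds'] / 3600:.0f} hours")
```

Repeating the measurement at two or three subset sizes would show whether the linear assumption holds for a given corpus before committing to a hosting plan.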

If there is one thing that is true about search it is the fact that  
everyone's situation is different.


On Dec 18, 2007, at 11:21 AM, v k wrote:

> Hello,
> I am using Lucene to build an index from roughly 10 million documents.
> The documents are about 4 TB in total.
> After some trial runs, indexing a subset of the documents I am trying
> to figure out a hosting service configuration to create a full index
> from the entire 10 TB of data. As I am still unsure how this project
> will turn out, I am not purchasing hardware/RAM but considering a web
> host for two purposes:
> 1) Downloading the data and starting to index it.
> 2) Hosting the web front end to access this index, which will be a
> Python framework (e.g. Django).
> I am seriously contemplating signing up with Joyent for this plan:
> AMD Opteron x64 multi-core servers with 4GiB RAM per core
> 1/16 (Burstable up to 95%)
> 1 TB bandwidth/month, 1 GB RAM, plus as much NAS storage as I can
> afford to pay for.
> My QUESTION is: will this RAM and CPU be sufficient for developing
> the search application, building the index, etc., or is the plan so
> under-equipped in terms of hardware that the development version of
> my application will not work?
> I understand that having more RAM is always good, but is 1GB as good  
> as nothing?
> This setup is NOT for production but for development, so I can get
> my hands dirty with Lucene, which will require plenty of tweaks as the
> project moves along.
> What initial configuration would you recommend for a development
> version given the corpus size? I am not even sure how large my index
> will be at this point.
> I hope to build my indexes this way, and once the search
> infrastructure is working and the web-front end complete, I plan to
> worry about Redundancy, availability and scalability for the many
> users I hope to provide this free service for :-)
> Many of you in this forum have built successful products with Lucene.
> To name a few I am aware of: Ken Krugle, James Ryley, Dennis Kubes.
> Some of you must have started with small machines, test set-ups, etc.,
> where you built your initial search apps. I hope to receive some
> advice about my plan and approach to start building an infrastructure
> to support my Lucene app.
> Thank you.
> Venkat
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

Grant Ingersoll
