lucene-java-user mailing list archives

From Dennis Kubes <>
Subject Re: Infrastructure Question
Date Sat, 22 Dec 2007 16:22:26 GMT
When we started we had 5 machines that were 800 MHz with maybe 512 MB to
1 GB of RAM.  It was enough to get started and start testing things,
although I wouldn't recommend that setup because, looking back, I don't
think it was enough.  And of course we started getting OutOfMemory
errors pretty quickly as our data grew.

Remember that in search, serving is the hardware-intensive part.  For
getting your hands dirty and processing the data, the hardware you
propose should be more than sufficient.  Amazon's EC2, especially the
large and extra-large instances, would also work very well for this and
give you the opportunity to grow your serving computer if/when needed.
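
One rough way to apply the "start with a subset and measure" advice quoted
below is to extrapolate from a sample run. The following is only a sketch:
the function name and all of the numbers are hypothetical, and it assumes
index size grows roughly linearly with document count, which may not hold
for every corpus or analyzer configuration.

```python
def estimate_full_index_bytes(sample_docs, sample_index_bytes, total_docs):
    """Linearly extrapolate full index size from a sample indexing run.

    Assumption: index size scales linearly with the number of documents.
    """
    if sample_docs <= 0:
        raise ValueError("sample_docs must be positive")
    # Multiply before dividing so integer inputs stay exact until the end.
    return sample_index_bytes * total_docs / sample_docs


if __name__ == "__main__":
    # Hypothetical measurements from a trial run on a subset of the corpus.
    sample_docs = 100_000
    sample_index_bytes = 2 * 1024**3   # a 2 GiB index on disk
    total_docs = 10_000_000            # roughly the corpus size in the question

    estimate = estimate_full_index_bytes(sample_docs, sample_index_bytes, total_docs)
    print(f"Estimated full index size: {estimate / 1024**3:.0f} GiB")
```

Repeating the measurement at two or three subset sizes would show whether
the linear assumption is reasonable before committing to a hosting plan.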

Dennis Kubes

v k wrote:
> Sorry about that. For some reason, my post did not show up in the
> mailing list and I still cannot see it (maybe a settings issue). I
> don't mean to barrage the mailing list with the same question. Thanks
> for the advice.
> On Dec 18, 2007 11:43 AM, Grant Ingersoll <> wrote:
>> Hi Venkat,
>> There is no need to post your question multiple times or cross-post.
>> People are distributed all around the world on this list and aren't
>> always available or able to answer your question.  Having to wait
>> 11 hours for an answer on a free mailing list is not at all
>> unreasonable.
>> If you are just looking to get your hands dirty with Lucene, why not
>> just start w/ a subset on a machine you already own and work to scale
>> up?  This way, you could start with what you have available and get a
>> feel for your memory usage, etc.  Then you will be in a better
>> position to decide what your needs are.
>> If there is one thing that is true about search it is the fact that
>> everyone's situation is different.
>> Cheers,
>> Grant
>> On Dec 18, 2007, at 11:21 AM, v k wrote:
>>> Hello,
>>> I am using Lucene to build an index from roughly 10 million
>>> documents. The documents are about 4 TB in total.
>>> After some trial runs indexing a subset of the documents, I am trying
>>> to figure out a hosting service configuration to create a full index
>>> from the entire 10 TB of data. As I am still unsure how this project
>>> will turn out, I am not purchasing hardware/RAM but am considering a
>>> web host for the purpose of:
>>> 1) Downloading the data and starting to index it.
>>> 2) Hosting the web front end to access this index, which will be a
>>> Python framework (e.g. Django).
>>> I am seriously contemplating signing up with Joyent for this plan:
>>> AMD Opteron x64 multi-core servers with 4GiB RAM per core
>>> 1/16 (Burstable up to 95%)
>>> 1 TB bandwidth/month, 1 GB RAM, plus as much NAS storage as I can
>>> afford to pay for.
>>> My QUESTION is: will this RAM and CPU be sufficient during
>>> development of the search application and building the index, etc., or
>>> is it so abysmal and under-equipped in terms of hardware that the
>>> development version of my application will not work?
>>> I understand that having more RAM is always good, but is 1GB as good
>>> as nothing?
>>> This setup is NOT for production but for development, so I can get
>>> my hands dirty with Lucene, which will require plenty of tweaks as the
>>> project moves along.
>>> What initial configuration would you recommend for a development
>>> version given the corpus size? I am not even sure how large my index
>>> will be at this point.
>>> I hope to build my indexes this way, and once the search
>>> infrastructure is working and the web front end is complete, I plan to
>>> worry about redundancy, availability and scalability for the many
>>> users I hope to provide this free service for :-)
>>> Many of you in this forum have built successful products with Lucene.
>>> To name a few I am aware of: Ken Krugler, James Ryley, Dennis Kubes.
>>> Some of you must have started with small machines, test set-ups, etc.
>>> where you built your initial search apps. I hope to receive some
>>> advice about my plan and approach to start building an infrastructure
>>> to support my Lucene app.
>>> Thank you.
>>> Venkat
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail:
>>> For additional commands, e-mail:
>> --------------------------
>> Grant Ingersoll
>> Lucene Helpful Hints: