hbase-user mailing list archives

From Jonathan Gray <jg...@facebook.com>
Subject RE: About test/production server configuration
Date Tue, 06 Apr 2010 22:19:32 GMT
Or if you have a budget in mind, we can help you determine what would be the best way to allocate
those dollars.

> -----Original Message-----
> From: Jonathan Gray [mailto:jgray@facebook.com]
> Sent: Tuesday, April 06, 2010 3:11 PM
> To: hbase-user@hadoop.apache.org
> Subject: RE: About test/production server configuration
> 
> Imran,
> 
> Have you run Solr atop HDFS?  I doubt this will be performant.
> 
> Also, to properly scope your cluster, you need to come up with actual
> number targets if you want to be able to accurately provision hardware.
> "not much" data now, but "lots" of data later could mean anything.
> Decide what you want to provision for and then you can accurately do
> so.
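The "actual number targets" Jonathan asks for feed straightforward arithmetic. Below is a minimal sketch of that estimate. Every input value is a hypothetical placeholder rather than a figure from this thread, and the overhead factor is an assumed allowance for temporary, MapReduce, and compaction space. Only the replication factor of 3 reflects a real HDFS default.

```python
import math

def nodes_needed(raw_data_tb, growth_factor, replication, overhead,
                 usable_tb_per_node):
    """Return how many DataNodes are needed to hold the projected data.

    raw_data_tb        -- logical data expected today, in TB (hypothetical)
    growth_factor      -- multiplier for the horizon being provisioned for
    replication        -- HDFS replication factor (3 is the HDFS default)
    overhead           -- assumed headroom fraction, e.g. 0.25 for 25%
    usable_tb_per_node -- disk per node actually given to HDFS (hypothetical)
    """
    # Physical bytes on disk = logical data * growth * replicas * headroom.
    stored_tb = raw_data_tb * growth_factor * replication * (1 + overhead)
    # Round up: a fractional node still means buying a whole machine.
    return math.ceil(stored_tb / usable_tb_per_node)

# 0.5 TB today, provisioned for 10x growth, on nodes with 4 TB for HDFS.
print(nodes_needed(0.5, 10, 3, 0.25, 4))  # 5
```

The same function makes the budget question concrete: fix the node count a budget allows and solve for how much growth it covers.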
> 
> > -----Original Message-----
> > From: Imran M Yousuf [mailto:imyousuf@gmail.com]
> > Sent: Monday, April 05, 2010 6:11 PM
> > To: hbase-user@hadoop.apache.org
> > Subject: Re: About test/production server configuration
> >
> > On Mon, Apr 5, 2010 at 11:56 PM, Jonathan Gray <jgray@facebook.com> wrote:
> > > Imran,
> > >
> > > It's impossible to give good advice on cluster size and hardware
> > > configuration without some idea of the requirements.
> > >
> >
> > Sorry, my mistake; I should have elaborated a little more. Please
> > find some requirements inline below.
> >
> > > How much data?
> >
> > To start, we will not have much data, but down the road there will
> > be a lot, plus a lot of user-created content. Initially there will
> > be a lot of log-like entries, plus transactions...
> >
> > > How will the data be queried?
> >
> > We are designing the system to look up data by key only. Solr will
> > be used for all searching, and the data lookup will then be
> > performed in HBase. In addition, we will have both application-layer
> > caching in Ehcache and a web accelerator (Varnish).
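The read path Imran describes (search yields keys, then each key is fetched by direct lookup behind an application cache) can be sketched as below. `SearchIndex`, `RowStore`, and `ReadThroughCache` are hypothetical in-memory stand-ins for Solr, HBase, and Ehcache. They are not those products' APIs and only illustrate the flow of keys under that assumption.

```python
class SearchIndex:
    """Hypothetical stand-in for Solr: maps row keys to searchable text."""
    def __init__(self, docs):
        self.docs = docs

    def search(self, term):
        # Return row keys only; the data itself lives in the row store.
        return [key for key, text in self.docs.items() if term in text]

class RowStore:
    """Hypothetical stand-in for HBase: key -> row lookups."""
    def __init__(self, rows):
        self.rows = rows
        self.gets = 0  # counts lookups that actually reached the store

    def get(self, key):
        self.gets += 1
        return self.rows.get(key)

class ReadThroughCache:
    """Hypothetical stand-in for an application cache such as Ehcache."""
    def __init__(self, store):
        self.store = store
        self.cache = {}

    def get(self, key):
        # On a miss, fetch from the store and remember the result.
        if key not in self.cache:
            self.cache[key] = self.store.get(key)
        return self.cache[key]

def lookup(term, index, cache):
    # Search first, then resolve each key through the cache.
    return [cache.get(key) for key in index.search(term)]

index = SearchIndex({"row1": "daily ledger", "row2": "user post",
                     "row3": "ledger summary"})
store = RowStore({"row1": {"amount": 10}, "row2": {"text": "hi"},
                  "row3": {"amount": 30}})
cache = ReadThroughCache(store)
print(lookup("ledger", index, cache))  # [{'amount': 10}, {'amount': 30}]
lookup("ledger", index, cache)
print(store.gets)  # 2: the repeated search was served from the cache
```

Keeping search results as bare keys is what lets the row store stay a pure key-lookup system, which matches the key-only design stated above.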
> >
> > > What kind of load do you expect?
> >
> > Hard to estimate, but we are planning for a moderate installation so
> > that if we get a good response from the market we can expand. That's
> > one of the two primary reasons we chose HBase: we will be able to
> > scale it on the fly.
> >
> > > You are going to be doing offline batch/MapReduce, online random
> > > access, as well as search all from the same nodes?  This can be
> > > dangerous.
> >
> > Yes, the offline batch jobs and HBase lookups will be on the same
> > machines, but not search as a whole. Solr will use HDFS only to store
> > its index and read it back; no search processing will be done there.
> > That will be on a separate box altogether. But your following
> > statement is tempting me to use RAID+DRBD for Solr-based searching.
> >
> > Another thing to note is that the offline batch work would be
> > generating summary tables. One example from our system would be
> > generating daily balance sheets of ledgers, profit and loss
> > statements, etc. for 100+ journals in a PoS SaaS.
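A daily per-ledger summary like the one described is a classic map/group/reduce shape. This is a minimal single-process sketch of that shape in plain Python. The record layout `(journal, day, ledger, amount)` and the sample rows are hypothetical, and nothing here uses the Hadoop MapReduce API. It only shows which key the job would group on.

```python
from collections import defaultdict

# Hypothetical transaction rows: (journal, day, ledger, amount).
transactions = [
    ("journal-1", "2010-04-05", "cash", 100),
    ("journal-1", "2010-04-05", "cash", -40),
    ("journal-1", "2010-04-05", "sales", 60),
    ("journal-2", "2010-04-05", "cash", 25),
]

def map_phase(rows):
    # Emit one (key, value) pair per transaction, keyed by journal/day/ledger.
    for journal, day, ledger, amount in rows:
        yield (journal, day, ledger), amount

def reduce_phase(pairs):
    # Group pairs by key and sum the amounts (the shuffle + reduce step).
    totals = defaultdict(int)
    for key, amount in pairs:
        totals[key] += amount
    return dict(totals)

daily_balances = reduce_phase(map_phase(transactions))
for key in sorted(daily_balances):
    print(key, daily_balances[key])
```

In a real job the grouped totals would be written back as rows of a summary table, so the online side can serve them with a single key lookup.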
> >
> > > I would strongly recommend against putting Hadoop+HBase on the same
> > > nodes as something like Solr, unless you have dedicated disks for
> > > each. Also, don't forget about ZooKeeper, which you definitely will
> > > need separate nodes/disks for if you will be co-locating so many
> > > other things.
> >
> > Hmm... Based on what I understand from this discussion and Patrick's
> > point on ZK, I would go for:
> >
> >  - 4 separate DNs (each DN with its own dedicated disk, though maybe
> >    not a separate physical server) for Solr only; as a side note,
> >    initially we will have 2 Solr search boxes.
> >  - ZK needs a separate disk for performance, so it would have
> >    dedicated disks too.
> >
> > But what I am confused about is how to spread out ZK, Multi-Master,
> > RS, DN, and TT for HBase. Insights, comments, and suggestions on this
> > would be most welcome.
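One way to reason about the placement question is to write a candidate layout down and check it against the two rules stated earlier in this thread: keep Solr off the disks used by Hadoop+HBase, and give ZooKeeper its own disk when other roles are co-located. The layout below (hosts, disk names, role assignments) is entirely hypothetical, and `disk_conflicts` is an illustrative helper, not part of any Hadoop or HBase tooling.

```python
# Hypothetical layout: host -> {disk -> set of roles using that disk}.
layout = {
    "node1": {"sda": {"DN", "RS", "TT"}, "sdb": {"ZK"}},
    "node2": {"sda": {"DN", "RS", "TT"}, "sdb": {"Solr"}},
    "node3": {"sda": {"DN", "RS", "TT", "Solr"}},
}

# Roles belonging to the Hadoop+HBase stack discussed in the thread.
HADOOP_HBASE = {"DN", "RS", "TT", "Master"}

def disk_conflicts(layout):
    """List disks that break the thread's two co-location rules."""
    problems = []
    for host, disks in layout.items():
        for disk, roles in disks.items():
            # Rule 1: ZooKeeper should not share its disk with anything.
            if "ZK" in roles and len(roles) > 1:
                others = sorted(roles - {"ZK"})
                problems.append(f"{host}:{disk} shares ZK's disk with {others}")
            # Rule 2: Solr should not share a disk with Hadoop+HBase roles.
            mixed = sorted(roles & HADOOP_HBASE)
            if "Solr" in roles and mixed:
                problems.append(f"{host}:{disk} mixes Solr with {mixed}")
    return problems

for problem in disk_conflicts(layout):
    print(problem)
```

Checking at the disk level rather than the host level reflects the "unless you have dedicated disks for each" caveat: node2 passes because Solr sits on its own disk, while node3 is flagged.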
> >
> > Another note from our perspective is that we want to scale
> > horizontally by adding more machines, not vertically (if we wanted
> > that, or could afford it, we would probably have chosen an RDBMS).
> > Being able to scale horizontally as our user base, load, and revenue
> > increase was/is essential to us.
> >
> > Waiting eagerly for some insight, comments and/or suggestions.
> >
> > Thank you.
> >
> > Imran
> >
> > >
> > > JG
> > >
> > >> -----Original Message-----
> > >> From: Imran M Yousuf [mailto:imyousuf@gmail.com]
> > >> Sent: Monday, April 05, 2010 9:52 AM
> > >> To: hbase-user@hadoop.apache.org
> > >> Subject: About test/production server configuration
> > >>
> > >> Hi,
> > >>
> > >> We are a startup that has decided to use HBase purely because we
> > >> want to take advantage of HDFS-based reliability, redundancy,
> > >> MapReduce, and BigTable. For that, we are thinking of a test
> > >> environment with 5 servers and a production environment with 10
> > >> servers; in both cases the Hadoop cluster will be used for HBase +
> > >> MapReduce + Solr index.
> > >>
> > >> Firstly, I would like some opinions on whether 10 servers is a good
> > >> number for all 3 purposes. Secondly, what kind of test environment
> > >> is currently in use at different organizations? Thirdly, I would
> > >> like to learn about some server configurations and purchase prices
> > >> (with purchase locations, if possible).
> > >>
> > >> Waiting eagerly for some feedback.
> > >>
> > >> Thank you,
> > >>
> > >> --
> > >> Imran M Yousuf
> > >> Entrepreneur & Software Engineer
> > >> Smart IT Engineering
> > >> Dhaka, Bangladesh
> > >> Email: imran@smartitengineering.com
> > >> Blog: http://imyousuf-tech.blogs.smartitengineering.com/
> > >> Mobile: +880-1711402557
> > >
> >
