hbase-user mailing list archives

From "Billy Pearson" <sa...@pearsonwholesale.com>
Subject Re: Question about how queries are distributed
Date Mon, 04 Aug 2008 17:42:25 GMT
I have used Thrift with some of my MR jobs in the past. I started a Thrift 
server on every server that was running MR tasks.

You can start it on any server, as long as that server has the config files 
and everything else it needs to run.

Billy
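
A minimal sketch of the setup described above, with several Thrift servers and a client that spreads requests across them. Everything here is an assumption for illustration: the class name ThriftEndpointPool, the host names, and port 9090 do not come from the thread, and the request callable stands in for whatever Thrift client call the application actually makes. It only shows round-robin selection with failover, not a real HBase connection.

```python
import itertools
from typing import Callable, Iterable, List, Tuple, TypeVar

Endpoint = Tuple[str, int]  # (host, port) of one Thrift server
T = TypeVar("T")


class ThriftEndpointPool:
    """Hypothetical helper: rotate over several Thrift servers, skipping failed ones."""

    def __init__(self, endpoints: Iterable[Endpoint]):
        self._endpoints: List[Endpoint] = list(endpoints)
        if not self._endpoints:
            raise ValueError("at least one endpoint is required")
        # Endless rotation over the configured servers.
        self._cycle = itertools.cycle(self._endpoints)

    def call(self, request: Callable[[Endpoint], T]) -> T:
        """Run request against the next server; try each server at most once."""
        last_error = None
        for _ in range(len(self._endpoints)):
            endpoint = next(self._cycle)
            try:
                return request(endpoint)
            except OSError as exc:  # e.g. connection refused or timed out
                last_error = exc
        raise ConnectionError("no Thrift server reachable") from last_error


# Usage: host names and port are placeholders, not values from the thread.
pool = ThriftEndpointPool([("worker1", 9090), ("worker2", 9090)])
chosen = pool.call(lambda endpoint: endpoint)  # a real request would open a Thrift client here
```

A dedicated load balancer in front of the servers would serve the same purpose without any client-side logic; the client-side pool is simply the smaller thing to sketch.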


"Leon Mergen" <leon@solatis.com> wrote in 
message news:5eaaef180808040625n1c343637x1a711c3ca27677ea@mail.gmail.com...
> Hello Jean-Daniel,
>
> Okay, thank you! Hypothetically, let's say that the Thrift server becomes a
> scalability problem -- could more than 1 Thrift server be started, and
> possibly be pooled behind a load balancer?
>
> If the Thrift server is simply an indirection to the actual hadoop API and
> implements the normal hadoop client, I don't see any reason why not, but you
> never know. :-)
>
> Regards,
>
> Leon Mergen
>
> On Mon, Aug 4, 2008 at 3:05 PM, Jean-Daniel Cryans <jdcryans@gmail.com> wrote:
>
>> Ah ok I see what you meant! Yes, the Thrift client communicates with a
>> Thrift server which is bundled with the Master, so the HBase client code
>> doesn't run on the local machine that queries HBase. So yes, there may be a
>> scalability problem if many clients query at the same time. I don't
>> personally use Thrift a lot, but it seems to me that anyone using it in a
>> production environment with a big load should definitely start
>> the Thrift server on another machine (the same way the Master should not be
>> on the same machine as the Hadoop Namenode).
>>
>> Thank you Leon for asking the question, I'm sure others may have learned
>> something.
>>
>> J-D
>>
>> On Mon, Aug 4, 2008 at 6:25 AM, Leon Mergen <leon@solatis.com> wrote:
>>
>> > Hello Jean-Daniel,
>> >
>> > Ok, thank you for your response. I was worried that, when using Thrift,
>> > the client would have to do all of its communication with an HBase
>> > regionserver through the master server. While I still don't quite
>> > understand how it's solved with Thrift, as I understand it, the Thrift
>> > client code (as in, the code that I embed in my application) will not
>> > query the master server "after it learns the location of the ROOT
>> > HRegion", and from then on will talk directly to the RegionServers, since
>> > the Thrift API actually fully implements the regular Java HBase client,
>> > even when working from a language such as C++?
>> >
>> > I always thought Thrift was a simple way to serialize/unserialize data in
>> > an efficient and platform-independent manner, but it sounds like it's more
>> > advanced, which is good. :-)
>> >
>> > Regards,
>> >
>> > Leon Mergen
>> >
>> >
>> > On Mon, Aug 4, 2008 at 3:56 AM, Jean-Daniel Cryans <jdcryans@gmail.com> wrote:
>> >
>> > > Leon,
>> > >
>> > > The HBase Architecture page in the wiki does give this kind of
>> > > information, specifically here:
>> > > http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture#metadata and, since
>> > > HBase is a Bigtable clone, reading the Bigtable paper also gives useful
>> > > information:
>> > > http://labs.google.com/papers/bigtable.html
>> > >
>> > > To make it short, the client queries the .META. table to find the
>> > > regions of the user tables to which it puts and gets data. Thrift only
>> > > acts as a decorator on the Java HBase client.
>> > >
>> > > Until Zookeeper is integrated in HBase (like Chubby for Bigtable), the
>> > > Master is a SPOF but should not have any scalability-related problems.
>> > >
>> > > Hope this helps,
>> > >
>> > > J-D
>> > >
>> > > On Sun, Aug 3, 2008 at 7:22 PM, Leon Mergen <leon@solatis.com> wrote:
>> > >
>> > > > Hello,
>> > > >
>> > > > I'm looking for some information on HBase's architecture (out of pure
>> > > > interest) that I wasn't able to find on the HBase site
>> > > > (including the architecture description).
>> > > >
>> > > > Specifically, I am curious how writes/mutations are distributed amongst
>> > > > the servers, and whether this is different when using an interface like
>> > > > Thrift. Is the server for each mutateRow() operation "asked for" at the
>> > > > master server, or is that cached at some level? If not, how is the
>> > > > problem solved that a client only connects to the master server but
>> > > > actually needs to talk to one of the slave servers? Or is the master
>> > > > server a single weak spot that could introduce scalability problems at
>> > > > large (huge) scale?
>> > > >
>> > > > Thanks in advance for any responses!
>> > > >
>> > > > Regards,
>> > > >
>> > > > Leon Mergen
>> > > >
>> > >
>> >
>> >
>> >
>> > --
>> > Leon Mergen
>> > http://www.solatis.com
>> >
>>
>
>
>
> -- 
> Leon Mergen
> http://www.solatis.com
> 


