I see. 0.89 is still a developer release and I hear that it is not stable. But it sounds
really tempting because it boosts performance by a lot. Can I trust it if my final goal is
to insert about 100TB of data? What could be the possible issues? Also when shall I expect
to see a stable release?
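
For a sense of scale, here is a rough back-of-the-envelope sketch in plain Java. It takes the figure quoted further down in this thread (15 GB in roughly 26 minutes with a single client) and projects it to 100 TB. The class and method names are made up for illustration, binary units are assumed, and the projection assumes the rate stays constant, which it probably will not as the table grows.

```java
class LoadEstimate {
    // Sustained rate in MB/s, given megabytes loaded and elapsed seconds.
    static double rateMBps(double megabytes, double seconds) {
        return megabytes / seconds;
    }

    // Days needed to load totalMB at a constant rate (assumption: no slowdown).
    static double daysToLoad(double totalMB, double mbPerSecond) {
        return totalMB / mbPerSecond / 86400.0;
    }

    public static void main(String[] args) {
        // 15 GB in 26 minutes, as reported for the single-client run.
        double observed = rateMBps(15 * 1024.0, 26 * 60.0);
        // 100 TB expressed in MB (binary units assumed).
        double totalMB = 100.0 * 1024 * 1024;

        System.out.printf("observed rate: %.2f MB/s%n", observed);               // about 9.85 MB/s
        System.out.printf("100 TB at that rate: %.0f days%n",
                daysToLoad(totalMB, observed));                                   // about 123 days
    }
}
```

At the single-client rate the full load would take on the order of four months, so the questions about write throughput and multiple clients matter a lot at this size.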
So in order to do multiple-client insertion I basically just need to create multiple HTable
objects to handle the insertions? Do I need to do multi-threading manually?
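
To make the multi-threading question concrete, here is a minimal sketch of one way to do it by hand: split the input across worker threads and give each thread its own client object and its own buffer. It is plain Java with no HBase dependency. RowSink is a hypothetical stand-in for a per-thread client (for example one HTable per thread with auto-flush off, since as far as I know an HTable should not be shared across threads for writes), and BufferedSink and loadInParallel are illustrative names, not HBase APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

class ParallelLoadSketch {

    /** Hypothetical stand-in for a per-thread client object. */
    interface RowSink {
        void put(String row) throws Exception;
        void flush() throws Exception;
    }

    /** Toy sink: buffers rows and counts them on flush, mimicking a client-side write buffer. */
    static class BufferedSink implements RowSink {
        private final List<String> buffer = new ArrayList<String>();
        private final int maxBuffered;
        private final AtomicLong written;

        BufferedSink(int maxBuffered, AtomicLong written) {
            this.maxBuffered = maxBuffered;
            this.written = written;
        }

        public void put(String row) {
            buffer.add(row);
            if (buffer.size() >= maxBuffered) {
                flush(); // buffer full: send the batch
            }
        }

        public void flush() {
            written.addAndGet(buffer.size());
            buffer.clear();
        }
    }

    /** Splits rows round-robin across worker threads; each thread owns its sink. Returns rows written. */
    static long loadInParallel(final List<String> rows, final int clients, final int maxBuffered) {
        final AtomicLong written = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(clients);
        List<Future<Void>> pending = new ArrayList<Future<Void>>();
        try {
            for (int c = 0; c < clients; c++) {
                final int offset = c;
                Callable<Void> task = () -> {
                    RowSink sink = new BufferedSink(maxBuffered, written); // one sink per thread
                    for (int i = offset; i < rows.size(); i += clients) {
                        sink.put(rows.get(i));
                    }
                    sink.flush(); // push whatever is still buffered
                    return null;
                };
                pending.add(pool.submit(task));
            }
            for (Future<Void> f : pending) {
                f.get(); // wait for the worker and surface any failure
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("worker failed", e);
        } finally {
            pool.shutdown();
        }
        return written.get();
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<String>();
        for (int i = 0; i < 1000; i++) {
            rows.add("row-" + i);
        }
        System.out.println(loadInParallel(rows, 4, 64)); // prints 1000
    }
}
```

The same split could instead be done across separate processes or machines, or with a MapReduce job; the threads here only illustrate the "each client gets its own slice and its own buffer" idea.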
And finally a possibly stupid question: how do I check the total number of regions?
Thanks. :)
On Aug 9, 2010, at 3:51 PM, Jean-Daniel Cryans wrote:
> HBASE-2066 was committed and it takes effect automatically when
> using the write buffer starting with version 0.89, e.g. this release contains it:
> http://hbase.apache.org/docs/r0.89.20100621/
>
> Using more than one client is basically starting more of them, the same
> way you started the first one. Your input data can then be split
> between the clients, using either MapReduce or your homegrown solution
> (we imported the stumbles with a MR job).
>
> J-D
>
> On Mon, Aug 9, 2010 at 12:44 PM, Han Liu <hanl1@andrew.cmu.edu> wrote:
>> Thanks for your reply J-D!
>>
>> Could you explain more about the HBase-2066 schema? For example how did you do the first 3 steps described on that page?
>>
>> Also is there any documentation that describes using multiple clients in HBase?
>>
>>
>> On Aug 9, 2010, at 2:37 PM, Jean-Daniel Cryans wrote:
>>
>>> Those are pretty powerful machines, I would expect more performance. You
>>> could try using the same settings that we do here, check out Ryan's
>>> presentation, page 16:
>>> http://people.apache.org/~jdcryans/HUG8/HUG8-rawson.pdf
>>>
>>> Google "IO wait" to learn about it.
>>>
>>> Multi-clients will be faster unless you are already maxing out the
>>> machines (betting 100$ you're not), it's like asking if doing parallel
>>> processing will be faster than sequential processing.
>>>
>>> J-D
>>>
>>> On Mon, Aug 9, 2010 at 11:21 AM, Han Liu <hanl1@andrew.cmu.edu> wrote:
>>>> Thanks for the reply J-D.
>>>> In-lined. :)
>>>>
>>>>
>>>> On Aug 9, 2010, at 1:57 PM, Jean-Daniel Cryans wrote:
>>>>
>>>>> Hard to tell if it's decent performance. How do you define "decent"?
>>>> I consider it decent if it is roughly the best performance one can get using my schema on my machines.
>>>>> What kind of hardware are we talking about?
>>>> One machine for HBase master and 6 regionservers. Specs of each of these machines:
>>>> 16 GB RAM
>>>> 4x 1 TB 7200 RPM SATA drives
>>>> 10 Gb Network: 1x Qlogic QLE3142-Cu-CK
>>>> CPU: 2x quad-core E5440 (2.83GHz, 12MB L2 cache, 1333 MHz FSB)
>>>>> Which version are you
>>>>> using?
>>>> 0.20.4
>>>>> How much memory was given to HBase?
>>>> 6 GB
>>>>>
>>>>> Also did you set the write buffer on the client side on HTable?
>>>> Yes I set it to be "1024*1024*12" bytes
>>>>> Did
>>>>> you also turn off auto-flushing?
>>>> Yes it's turned off
>>>>> Do you monitor your cluster? If so,
>>>>> do you see lots of IO wait?
>>>> I didn't. What does IO wait indicate?
>>>>>
>>>>> And finally, do you use a single client or multiple ones?
>>>>>
>>>> Single client. Will multiple clients boost performance?
>>>>> :)
>>>>>
>>>> :) :)
>>>>
>>>> Thanks a lot.
>>>>> J-D
>>>>>
>>>>> On Mon, Aug 9, 2010 at 10:46 AM, Han Liu <hanl1@andrew.cmu.edu> wrote:
>>>>>> Hi Guys,
>>>>>>
>>>>>> I know on the client side of HBase there's a configuration "hbase.client.write.buffer". I wonder if there's a similar configuration on the region server side that I can tweak to adjust performance?
>>>>>>
>>>>>> Also as of right now I have managed to insert 15 GB of data into a 6-regionserver HBase database in roughly 26 minutes using the "table.put(Put p)" schema. Generally, is this decent performance?
>>>>>>
>>>>>> Any advice would be appreciated. Thanks a lot in advance.
>>>>>> --
>>>>>> Han Liu
>>>>>> SCS & HCI Institute
>>>>>> Undergrad. Class of 2012
>>>>>> Carnegie Mellon University
>>>>>
>>>>
>>>> --
>>>> Han Liu
>>>> SCS & HCI Institute
>>>> Undergrad. Class of 2012
>>>> Carnegie Mellon University
>>>
>>
>> --
>> Han Liu
>> SCS & HCI Institute
>> Undergrad. Class of 2012
>> Carnegie Mellon University
>
--
Han Liu
SCS & HCI Institute
Undergrad. Class of 2012
Carnegie Mellon University