hbase-user mailing list archives

From Ganesh Viswanathan <gan...@gmail.com>
Subject Re: Dropping a very large table - 75million rows
Date Thu, 09 Feb 2017 22:22:09 GMT
One additional question:

bq. "This wouldn't have affected your system's performance because the
locality for the table didn't change -- just the system-wide locality."

What does "locality for the table" mean, and how is that different from
system-wide locality?
Do you mean that other tables in the system could have lower locality, or
that the drop in locality for the table did not affect system-wide reads and
writes because of replication (I use an HDFS block replication factor of 3)?
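The distinction between table locality and system-wide locality can be illustrated with a small model. The sketch below assumes locality is a byte-weighted ratio: store-file bytes that have an HDFS block replica on the hosting regionserver, divided by total store-file bytes. This is a simplified model for illustration, not HBase's actual metric code. The `RegionStats` class, the table names `old_table` and `metrics`, and all sizes are hypothetical. The model shows how removing one well-localized table can lower the cluster-wide figure while the remaining table's own locality does not change.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LocalitySketch {
    // Hypothetical per-region summary (not an HBase API type): owning table,
    // total store-file bytes, and bytes with a block replica on the hosting server.
    static final class RegionStats {
        final String table;
        final long totalBytes;
        final long localBytes;

        RegionStats(String table, long totalBytes, long localBytes) {
            this.table = table;
            this.totalBytes = totalBytes;
            this.localBytes = localBytes;
        }
    }

    // Byte-weighted locality: sum(localBytes) / sum(totalBytes).
    // Assumption for this sketch: an empty set counts as fully local (avoids 0/0).
    static double locality(List<RegionStats> regions) {
        long total = 0;
        long local = 0;
        for (RegionStats r : regions) {
            total += r.totalBytes;
            local += r.localBytes;
        }
        return total == 0 ? 1.0 : (double) local / total;
    }

    // Keep only the regions that belong to one table.
    static List<RegionStats> forTable(List<RegionStats> regions, String table) {
        List<RegionStats> out = new ArrayList<>();
        for (RegionStats r : regions) {
            if (r.table.equals(table)) {
                out.add(r);
            }
        }
        return out;
    }

    // Remove every region of one table, modeling the effect of disable + drop.
    static List<RegionStats> withoutTable(List<RegionStats> regions, String table) {
        List<RegionStats> out = new ArrayList<>();
        for (RegionStats r : regions) {
            if (!r.table.equals(table)) {
                out.add(r);
            }
        }
        return out;
    }

    // Made-up sizes in GB: a large, well-localized table and a small, poorly localized one.
    static List<RegionStats> sampleCluster() {
        return Arrays.asList(
                new RegionStats("old_table", 50, 45),
                new RegionStats("old_table", 50, 50),
                new RegionStats("metrics", 20, 10),
                new RegionStats("metrics", 20, 10));
    }

    public static void main(String[] args) {
        List<RegionStats> before = sampleCluster();
        List<RegionStats> after = withoutTable(before, "old_table");
        System.out.printf("cluster before: %.3f%n", locality(before));
        System.out.printf("cluster after:  %.3f%n", locality(after));
        System.out.printf("metrics before: %.3f%n", locality(forTable(before, "metrics")));
        System.out.printf("metrics after:  %.3f%n", locality(forTable(after, "metrics")));
    }
}
```

With these made-up numbers, the cluster-wide value falls from 115/140 (about 0.82) to 0.50 after the drop, while the `metrics` table reads 0.50 both before and after. Averaging per region instead of per byte can give a different picture, because many small low-locality regions then count as much as a few large, well-localized ones.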



On Thu, Feb 9, 2017 at 2:17 PM, Ganesh Viswanathan <gansvv@gmail.com> wrote:

> Thanks Ted/Josh.
>
> Ted-
> I store historical metrics on the locality of regions on each regionserver
> in the cluster. Before the drop, the old table had many regions with low
> locality, while the newer tables had very few low-locality regions. After
> the drop, the newer tables' regions showed a large drop in locality.
>
> Josh-
> I did go from 2,200 regions before the drop to about 540 regions. So,
> relatively speaking, yes, it could be that the old table had more regions
> with higher overall locality (even though it showed up as more regions on
> each regionserver with lower locality).
>
> At what point (in terms of put/get load and row count/storage size in
> HBase) does it make sense to dive into the data-locality settings and tune
> them so that compaction and locality changes are more predictable? Would
> controlling how regions are created (instead of relying on auto-sharding)
> provide a greater benefit?
>
> Thanks!
> Ganesh
>
>
> On Thu, Feb 9, 2017 at 11:51 AM, Josh Elser <elserj@apache.org> wrote:
>
>> It could be that the table you dropped had very good locality while the
>> other tables had less. So, your overall locality went down (when the "good"
>> locality regions were no longer included). This wouldn't have affected your
>> system's performance because the locality for the table didn't change --
>> just the system-wide locality.
>>
>>
>> Ted Yu wrote:
>>
>>> bq. The locality of regions for OTHER tables on the same regionserver
>>> also fell drastically
>>>
>>> Can you be a bit more specific on how you came to the above conclusion?
>>> Dropping one table shouldn't affect locality of other tables, unless the
>>> number of regions on each server becomes unbalanced, which triggers
>>> balancer activity.
>>>
>>> Thanks
>>>
>>> On Thu, Feb 9, 2017 at 7:34 AM, Ganesh Viswanathan <gansvv@gmail.com> wrote:
>>>
>>>> So here is what I observed.
>>>> Dropping this large table had an immediate effect on average locality for
>>>> the entire cluster. The locality of regions for OTHER tables on the same
>>>> regionserver also fell drastically. This was unexpected (I thought only
>>>> the locality of regions for the dropped table would be impacted).
>>>> Is this because of compaction? Does the locality computation use the size
>>>> of other regions on each regionserver?
>>>>
>>>> The large drop in locality, however, did not cause latency issues on reads
>>>> or writes for the other tables. Why is that? Is it because I did not try
>>>> to hit all the low-locality regions?
>>>>
>>>> (On another note, I was able to test and perform deletions on a per-region
>>>> basis, but that requires hbck -repair and it seemed more invasive to the
>>>> health of the entire cluster.)
>>>>
>>>> Thanks,
>>>> Ganesh
>>>>
>>>> On Sat, Feb 4, 2017 at 11:20 AM Josh Elser <elserj@apache.org> wrote:
>>>>
>>>>> Ganesh,
>>>>>
>>>>> Just drop the table. You are worried about nothing.
>>>>>
>>>>> On Feb 3, 2017 16:51, "Ganesh Viswanathan" <gansvv@gmail.com> wrote:
>>>>>
>>>>>> Hello Josh-
>>>>>>
>>>>>> I am trying to delete the entire table and recover the disk space. I do
>>>>>> not need to pick specific contents of the table (if that's what you are
>>>>>> asking with #2).
>>>>>> My question is: would disabling and dropping such a large table affect
>>>>>> data locality in a bad way, or slow down the cluster when
>>>>>> major_compaction (or whatever cleans up the tombstoned rows) happens? I
>>>>>> also read in another post that it can spawn ZooKeeper transactions and
>>>>>> even lock the ZooKeeper nodes. Is there any concern around ZooKeeper
>>>>>> functionality when dropping large HBase tables?
>>>>>>
>>>>>> Thanks again for taking the time to respond to my questions!
>>>>>>
>>>>>> Ganesh
>>>>>>
>>>>>> On Fri, Feb 3, 2017 at 1:12 PM, Josh Elser <elserj@apache.org> wrote:
>>>>>>
>>>>>>> Ganesh -- I was trying to get at whether there is a terminology issue
>>>>>>> here.
>>>>>>>
>>>>>>> If you disable+drop the table, this is an operation on the order of the
>>>>>>> number of Regions you have. The number of rows/entries is irrelevant.
>>>>>>> Closing and deleting a region is a relatively fast operation.
>>>>>>>
>>>>>>> Can you please confirm: are you trying to delete the entire table, or
>>>>>>> are you trying to delete the *contents* of a table?
>>>>>>>
>>>>>>> If it is the former, I stand by my "you're worried about nothing"
>>>>>>> comment :)
>>>>>>>
>>>>>>>
>>>>>>> Ganesh Viswanathan wrote:
>>>>>>>
>>>>>>>> Thanks Josh.
>>>>>>>>
>>>>>>>> Also, I realized I didn't give the full size of the table. It takes
>>>>>>>> in ~75 million rows per minute and stores them for 15 days. So around
>>>>>>>> 1.125 billion rows total.
>>>>>>>>
>>>>>>>> On Fri, Feb 3, 2017 at 12:52 PM, Josh Elser <elserj@apache.org> wrote:
>>>>>>>>
>>>>>>>>> I think you are worried about nothing, Ganesh.
>>>>>>>>>
>>>>>>>>> If you want to drop (delete) the entire table, just disable and drop
>>>>>>>>> it from the shell. This operation is not going to have a significant
>>>>>>>>> impact on your cluster (save a few flushes). Those would only happen
>>>>>>>>> if you have had recent writes to this table (which seems unlikely if
>>>>>>>>> you want to drop it).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Ganesh Viswanathan wrote:
>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> I need to drop an old HBase table that is quite large. It has
>>>>>>>>>> anywhere between 2 million and 70 million datapoints. I turned off
>>>>>>>>>> the count after it ran on the HBase shell for half a day. I have 4
>>>>>>>>>> other tables that have around 75 million rows in total and also take
>>>>>>>>>> heavy PUT and GET traffic.
>>>>>>>>>> What is the best practice for disabling and dropping such a large
>>>>>>>>>> table in HBase so that I have minimal impact on the rest of the
>>>>>>>>>> cluster?
>>>>>>>>>> 1) I hear there are ways to disable (and drop?) specific regions.
>>>>>>>>>>    Would that work?
>>>>>>>>>> 2) Should I scan and delete a few rows at a time until the size
>>>>>>>>>>    becomes manageable and then disable/drop the table?
>>>>>>>>>>    If so, what is a good number of rows to delete at a time? Should
>>>>>>>>>>    I run a major compaction on specific regions after these row
>>>>>>>>>>    deletes? And what is a good table size that can be easily dropped
>>>>>>>>>>    (and has been validated) without causing issues on the larger
>>>>>>>>>>    cluster?
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>> Ganesh
>>>
>
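The table-size figures quoted in the thread can be checked with quick arithmetic. Taking the stated ~75 million rows and 15-day retention at face value, the ~1.125 billion total corresponds to a daily ingest rate; the same rate per minute over 15 days would be larger by a factor of 1,440 (minutes per day):

```latex
\begin{aligned}
75\times10^{6}\ \tfrac{\text{rows}}{\text{day}} \times 15\ \text{days}
  &= 1.125\times10^{9}\ \text{rows}\\
75\times10^{6}\ \tfrac{\text{rows}}{\text{min}} \times 1440\ \tfrac{\text{min}}{\text{day}} \times 15\ \text{days}
  &= 1.62\times10^{12}\ \text{rows}
\end{aligned}
```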
