hbase-user mailing list archives

From "Brush,Ryan" <RBR...@CERNER.COM>
Subject Re: Allowed upper limit to HColumnDescriptor.setMaxVersion(..)?
Date Thu, 06 Oct 2011 15:49:40 GMT
No. If you have only 10 versions of a cell, there is no additional overhead
whether maxVersions is 20 or 10,000. There shouldn't be a penalty for
setting max versions arbitrarily large, as long as the number of versions
actually stored for a cell is less than that.


Max versions on the column family is used during compactions to restrict
the number of copies carried forward during the compaction.  So if your
value is greater than the number of actual versions, the behavior is
unchanged no matter how big it is.  Similarly with get or scan operations,
if the number of versions you request is much larger than the number
actually stored, there is no additional overhead.

Of course, if your number of actual versions exceeds some maxVersion
value, that has the semantics you'd expect. But if you really want to keep
all versions, there is no cost to setting the value arbitrarily high.
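
To make the point above concrete, here is a minimal sketch in plain Java.
It is a simplified model, not HBase's actual compaction code: the class
VersionRetentionSketch and the helper retain() are hypothetical names made
up for illustration. It only shows that the number of versions carried
forward is bounded by the smaller of "versions actually stored" and
maxVersions, so an oversized maxVersions changes nothing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class VersionRetentionSketch {

    // Hypothetical helper (not an HBase API): models which timestamps of a
    // single cell would be carried forward, assuming the newest versions
    // are kept and anything beyond maxVersions is dropped.
    static List<Long> retain(List<Long> timestamps, int maxVersions) {
        List<Long> sorted = new ArrayList<>(timestamps);
        sorted.sort(Collections.reverseOrder()); // newest first
        int keep = Math.min(sorted.size(), maxVersions);
        return new ArrayList<>(sorted.subList(0, keep));
    }

    public static void main(String[] args) {
        List<Long> versions = Arrays.asList(100L, 300L, 200L);

        // maxVersions far above the actual count: all three versions survive.
        System.out.println(retain(versions, 10_000));            // [300, 200, 100]
        System.out.println(retain(versions, Integer.MAX_VALUE)); // [300, 200, 100]

        // maxVersions below the actual count: only the newest two survive.
        System.out.println(retain(versions, 2));                 // [300, 200]
    }
}
```

In this model the work done depends on how many versions exist, not on how
large the limit is, which matches the behavior described above.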

(There is a catch in that a single row can never span beyond a single
region, so if you have a _lot_ of versions in a row this could have
implications, but this same issue applies if you have a single "wide" row
with a huge amount of data in the columns.)

On 10/6/11 8:52 AM, "Micah Whitacre" <mkwhitacre@gmail.com> wrote:

>Are there any negative performance aspects to setting the max versions
>to a large value if those extra stored versions are not used?  If I
>set the max to 10k but really only store 100, there is no extra
>disk space/memory being consumed by the potential of having more
>versions, is there?  Also, what about the inverse of writing, gets?  If
>my "Gets" all call get.setMaxVersion(), does setting that value to
>something extremely large, despite there not being that many versions,
>cause performance problems?
>
>Thanks for the help,
>Micah
>
>On Tue, Oct 4, 2011 at 10:52 PM, lars hofhansl <lhofhansl@yahoo.com>
>wrote:
>> MaxVersions and MinVersions are different features.
>> MaxVersion identifies the max number of versions you want to keep (just
>>to state the obvious).
>> MinVersion is used together with TTL (and soon with deletes - see
>>HBASE-4536), to indicate the minimum number of versions you want to keep
>>around even when they should be expired or were deleted.
>>
>>
>> There is no way to disable MaxVersions, just set it to a very large
>>number.
>>
>> MinVersions is by default disabled (setting is 0), which means rows
>>past their TTL and deleted rows will be removed during compaction.
>>
>>
>> I am thinking about how to state that more clearly in the documentation.
>>
>>
>> -- Lars
>>
>>
>>
>> ________________________________
>> From: Doug Meil <doug.meil@explorysmedical.com>
>> To: "user@hbase.apache.org" <user@hbase.apache.org>
>> Sent: Tuesday, October 4, 2011 4:32 PM
>> Subject: Re: Allowed upper limit to HColumnDescriptor.setMaxVersion(..)?
>>
>>
>> The default for versioning is 3. Unfortunately, the sub-section also
>> cites (incorrectly) that the min is 0.  That sub-section is trying to
>> indicate the minimum legal value.  I am working on clearing that entry
>> up with another developer.
>>
>>
>>
>>
>>
>> On 10/4/11 6:04 PM, "Micah Whitacre" <mkwhitacre@gmail.com> wrote:
>>
>>>Are you surmising that from the description of setting a minimum
>>>version?
>>>
>>>On Tue, Oct 4, 2011 at 2:31 PM, Doug Meil
>>><doug.meil@explorysmedical.com>
>>>wrote:
>>>>
>>>> http://hbase.apache.org/book.html#schema.versions
>>>>
>>>>
>>>> I believe if you set that to 0 it should disable the versioning.
>>>>
>>>>
>>>>
>>>> On 10/4/11 2:21 PM, "Micah Whitacre" <mkwhitacre@gmail.com> wrote:
>>>>
>>>>>I guess what I'm asking is there a way to set "infinite" or no max
>>>>>bounds on versions (e.g. setMaxVersion(-1) possibly)?  Or do I have to
>>>>>call setMaxVersion(Integer.MAX_VALUE) or setMaxVersion(<some large
>>>>>guess>)?  If a large guess is the way to go, what sort of overhead
>>>>>costs might we need to consider when finding the right balance point
>>>>>between room to grow and the maintenance support cost of needing to
>>>>>expand later?
>>>>>
>>>>>We plan on building MapReduce jobs to clean up versions based on some
>>>>>conditions so the value shouldn't get that large but the conditions
>>>>>for cleaning up those versions might be decided by other consumers of
>>>>>the service.  So having room to grow is ideal.
>>>>>
>>>>>On Tue, Oct 4, 2011 at 11:36 AM, Doug Meil
>>>>><doug.meil@explorysmedical.com> wrote:
>>>>>>
>>>>>> Hi there-
>>>>>>
>>>>>> re:  "i don't care store them all"
>>>>>>
>>>>>>
>>>>>> What do you mean?
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 10/4/11 12:20 PM, "Micah Whitacre" <mkwhitacre@gmail.com> wrote:
>>>>>>
>>>>>>>In reading the documentation, all I've seen are suggestions on how
>>>>>>>to set the value and the default value.  However, I haven't seen any
>>>>>>>indication of how to set the value to "i don't care store them all"
>>>>>>>or if there is a maximum bound aside from Integer.MAX_VALUE.  Does
>>>>>>>anyone know?
>>>>>>>
>>>>>>>Thanks,
>>>>>>>Micah
>>>>>>>
>>>>>>>[1] - http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html#setMaxVersions(int)
>>>>>>
>>>>>>
>>>>
>>>>

