hbase-user mailing list archives

From "Billy Pearson" <sa...@pearsonwholesale.com>
Subject Re: Map File index bug?
Date Thu, 06 Nov 2008 15:55:31 GMT
There is no method to change the compression of the index; it is always
block compressed.
I hacked the code and changed it to non-compressed so I could get the size
of the index without compression.
Opening all 80 mapfiles took ~4x the memory of the uncompressed size of all
the index files.
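
For reference, the change amounts to roughly this in MapFile.Writer (a
sketch paraphrased from the Hadoop source of the time; the exact
createWriter overload may differ):

    // The index writer is hard-coded to BLOCK compression; changing it
    // to NONE makes the index file get written uncompressed.
    this.index =
        SequenceFile.createWriter(fs, conf, new Path(dir, INDEX_FILE_NAME),
            keyClass, LongWritable.class,
            SequenceFile.CompressionType.NONE);  // was CompressionType.BLOCK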


"stack" <stack@duboce.net> wrote in message 
news:7c962aed0811060026j660e4d87hfe3fc0ce7895ff7e@mail.gmail.com...
> On Wed, Nov 5, 2008 at 11:52 PM, Billy Pearson
> <sales@pearsonwholesale.com>wrote:
>>
>>
>> I ran a job on the 80 mapfiles to write 80 new files with non-compressed
>> indexes, and it still took ~4X the memory of the sizes of the uncompressed
>> index files to load them into memory
>
>
> Sorry Billy, how did you specify non-compressed indices?  What took 4X
> memory?  The non-compressed index?
>
>
>> Could have to do with the way they grow the arrays storing the positions
>> of the keys, starting on line 333.
>> Looks like they are copying arrays and making a new one 150% bigger than
>> the last as needed.
>> Not sure about java and how long before the old array will be recovered
>> from memory.
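>> For illustration, the growth pattern looks roughly like this (a
>> paraphrase, not the verbatim source):
>>
>>   // grow the in-memory positions array by 150% when it fills up
>>   if (count == positions.length) {
>>     long[] newPositions = new long[positions.length * 3 / 2];
>>     System.arraycopy(positions, 0, newPositions, 0, count);
>>     positions = newPositions;  // the old array is garbage until collected
>>   }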
>>
>> I have seen it recover down to about ~2x the size of the uncompressed
>> index files, but only twice.
>>
>
>
> Unreferenced java objects will be let go at various times, depending on
> your JVM configuration.  Usually they'll be let go when the JVM needs the
> memory (links like this may be of help:
> http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html#par_gc.oom
> )
>
>
>
>> I am testing by creating the files with a MR job and then loading the map
>> files in a simple program that opens the files and finds the midkey, so
>> the index gets read into memory, while watching the top command.
>> I also added -Xloggc:/tmp/gc.log and watched the memory usage go up; it
>> matches for the most part with top.
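>>
>> The test program is essentially this (the paths and the 80-file loop are
>> just how I happen to run it):
>>
>>   Configuration conf = new Configuration();
>>   FileSystem fs = FileSystem.get(conf);
>>   MapFile.Reader[] readers = new MapFile.Reader[80];
>>   for (int i = 0; i < 80; i++) {
>>     readers[i] = new MapFile.Reader(fs, "/test/mapfile-" + i, conf);
>>     System.out.println(readers[i].midKey());  // forces the index to load
>>   }
>>   // keep the readers open and watch resident memory in top / gc.log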
>>
>> I tried running System.gc() to force a cleanup of the memory but it did
>> not seem to help any.
>>
>
> Yeah, it's just a suggestion.  The gc.log should give you a better clue of
> what's going on.  What's it saying?  Lots of small gcs and then a Full gc
> every so often?  Is the heap discernibly growing?  You could enable JMX
> for the JVM and connect with jconsole.  This can give you a more detailed
> picture of the heap.
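>
> For example, start the JVM with the standard management flags (the port
> and auth settings here are just an example):
>
>   -Dcom.sun.management.jmxremote.port=10101
>   -Dcom.sun.management.jmxremote.authenticate=false
>   -Dcom.sun.management.jmxremote.ssl=false
>
> then point jconsole at host:10101 to watch heap usage over time.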
>
> St.Ack
> P.S. Check out HBASE-722 if you have a sec.
>
>
>
>> Billy
>>
>>
>> "Billy Pearson" <sales@pearsonwholesale.com> 
>> wrote in message
>> news:ger0jq$800$1@ger.gmane.org...
>>
>>> I have been looking over the MapFile class in hadoop for memory problems
>>> and think I might have found an index bug.
>>>
>>> org.apache.hadoop.io.MapFile
>>> line 202
>>> if (size % indexInterval == 0) {            // add an index entry
>>>
>>> This is where it's writing the index, adding an entry only once every
>>> indexInterval rows.
>>>
>>> then on the loading of the index
>>> line 335
>>>
>>>         if (skip > 0) {
>>>           skip--;
>>>           continue;                             // skip this entry
>>>         }
>>>
>>> we are only reading in one of every skip entries
>>>
>>> So with the default of 32, I think in hbase we are only writing an index
>>> entry to the index file every 32 rows, and then only reading back every
>>> 32nd of those entries,
>>>
>>> so we only get an index entry every 1024 rows.
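>>>
>>> A toy loop shows the combined effect (an illustration only, not the real
>>> class; it assumes the 32/32 defaults above):
>>>
>>>   int indexInterval = 32, skipInterval = 32;
>>>   int written = 0, kept = 0;
>>>   for (long row = 0; row < 1000000; row++) {
>>>     if (row % indexInterval == 0) {      // writer: one entry per 32 rows
>>>       written++;
>>>       if (written % skipInterval == 0) { // reader: keep every 32nd entry
>>>         kept++;
>>>       }
>>>     }
>>>   }
>>>   // kept ~= rows / (indexInterval * skipInterval), i.e. 1 per 1024 rows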
>>>
>>> Take a look and confirm, and we can open a bug on hadoop about it.
>>>
>>> Billy
>>>
>>>
>>>
>>
>>
> 


