hbase-user mailing list archives

From stack <st...@duboce.net>
Subject Re: Map File index bug?
Date Thu, 06 Nov 2008 08:26:39 GMT
On Wed, Nov 5, 2008 at 11:52 PM, Billy Pearson wrote:
> I ran a job on 80 mapfiles to write 80 new files with non-compressed indexes,
> and it still took ~4X the size of the uncompressed index files in memory to
> load them in.

Sorry Billy, how did you specify non-compressed indices?  What took 4X
memory?  The non-compressed index?

> It could have to do with the way they grow the arrays storing the positions
> of the keys, starting on line 333.
> Looks like they are copying arrays and making a new one 150% bigger than
> the last as needed.
> I'm not sure about Java and how long it takes before the old array is
> reclaimed from memory.
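
For what it's worth, the growth pattern you describe would look roughly like
this (a minimal sketch, not the actual MapFile code; names are made up).  The
copy itself is transient, but the old array lingers until collected:

public class GrowDemo {
  private long[] positions = new long[1024];
  private int count = 0;

  void add(long pos) {
    if (count == positions.length) {
      // Grow by 50%.  While copying, the old and new arrays are both
      // live, so peak usage is ~2.5x the old array's size.
      long[] grown = new long[(positions.length * 3) / 2];
      System.arraycopy(positions, 0, grown, 0, count);
      positions = grown;   // old array now unreferenced; it is only
                           // reclaimed whenever the collector next runs
    }
    positions[count++] = pos;
  }

  public static void main(String[] args) {
    GrowDemo demo = new GrowDemo();
    for (long i = 0; i < 100000; i++) {
      demo.add(i);
    }
  }
}

That transient doubling could account for some of the overhead you are seeing.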

> I have seen it recover down to about ~2x the size of the uncompressed
> index files, but only twice.

Unreferenced Java objects will be let go at various times; it depends on your
JVM configuration.  Usually they'll be let go when the JVM needs the memory
(links like this may be of help:

> I am testing by creating the files with a MR job and then loading the map
> files in a simple program that opens the files and finds the midkey, so the
> index gets read into memory, while watching the top command.
> I also added -Xloggc:/tmp/gc.log and watched the memory usage go up; it
> matches top for the most part.
> I tried running System.gc() to force a cleanup of the memory, but it did
> not seem to help any.
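
For reference, a test program along the lines you describe might look
something like this (a sketch; the class name is made up, and it assumes the
MapFile directories are passed as arguments on the default filesystem):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.WritableComparable;

public class IndexMemTest {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    MapFile.Reader[] readers = new MapFile.Reader[args.length];
    for (int i = 0; i < args.length; i++) {
      readers[i] = new MapFile.Reader(fs, args[i], conf);
      // midKey() forces the whole index to be read into memory.
      WritableComparable mid = readers[i].midKey();
      System.out.println(args[i] + " midkey=" + mid);
    }
    // Hold the readers open so the loaded indexes stay resident
    // while the process is watched in top.
    Thread.sleep(Long.MAX_VALUE);
  }
}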

Yeah, System.gc() is just a suggestion.  The gc.log should give you a better
clue of what's going on.  What's it saying?  Lots of small GCs and then a full
GC every so often?  Is the heap discernibly growing?  You could enable JMX
for the JVM and connect with jconsole.  This can give you a more detailed
picture of the heap.
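
If jconsole is not handy, the same heap numbers are also available from
inside the process via java.lang.management (a small sketch; the class name
is made up):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapPeek {
  public static void main(String[] args) {
    // The same numbers jconsole shows, printed from inside the process.
    MemoryUsage heap =
        ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    System.out.println("heap used=" + heap.getUsed()
        + " committed=" + heap.getCommitted()
        + " max=" + heap.getMax());
  }
}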

P.S. Check out HBASE-722 if you have a sec.

> Billy
> "Billy Pearson" <sales@pearsonwholesale.com> wrote in message
> news:ger0jq$800$1@ger.gmane.org...
>> I've been looking over the MapFile class in Hadoop for memory problems and
>> think I might have found an index bug.
>> org.apache.hadoop.io.MapFile
>> line 202
>> if (size % indexInterval == 0) {            // add an index entry
>> this is where it's writing the index, adding an entry only every
>> indexInterval rows. Then, on the loading of the index,
>> line 335
>>         if (skip > 0) {
>>           skip--;
>>           continue;                             // skip this entry
>> we are only reading back every skip-th entry.
>> So with the default of 32, I think in HBase we are only writing an index
>> entry to the index file every 32 rows, and then only reading back every
>> 32nd of those, so we only get an index entry every 1024 rows.
>> Take a look and confirm, and we can open a bug against Hadoop about it.
>> Billy
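
To make the arithmetic in the quoted message concrete, assuming the reader's
skip really is 32 as described, the effective index density is the product of
the two intervals (an illustrative sketch, not the actual MapFile code):

public class IndexDensity {
  public static void main(String[] args) {
    int indexInterval = 32;  // writer: one index entry per 32 appends (line 202)
    int readSkip = 32;       // reader: keep only every 32nd entry (line 335)
    // Net effect as described above: one in-memory index entry per 1024 rows.
    System.out.println("one in-memory index entry every "
        + (indexInterval * readSkip) + " rows");
  }
}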
