hbase-user mailing list archives

From "Billy Pearson" <sa...@pearsonwholesale.com>
Subject Re: Map File index bug?
Date Thu, 06 Nov 2008 07:52:37 GMT
Looks like the skip and indexInterval are set up to read correctly; I did not 
understand that there were two conf options, one for write and one for read.
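For reference, a rough sketch of the two options as I understand them (the 
conf key names below are from Hadoop's MapFile as I recall them, with vanilla 
defaults of 128 and 0; double-check against your version):

    import org.apache.hadoop.conf.Configuration;

    public class MapFileIndexConf {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Write side: MapFile.Writer records one index entry per this many
        // appended keys.
        conf.setInt("io.map.index.interval", 32);
        // Read side: MapFile.Reader drops this many index entries between the
        // ones it keeps when loading the index (0 = keep every entry written).
        conf.setInt("io.map.index.skip", 0);
        System.out.println("interval=" + conf.getInt("io.map.index.interval", 128)
            + " skip=" + conf.getInt("io.map.index.skip", 0));
      }
    }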

The index files are block compressed. Stack was not sure, but I found in the 
code that they are always compressed.
I ran a job on 80 mapfiles to write 80 new files with non-compressed indexes, 
and loading them into memory still took ~4x the size of the uncompressed 
index files.
It could have to do with the way they grow the arrays storing the positions 
of the keys, starting on line 333: looks like they are copying the arrays and 
making a new one 150% bigger than the last as needed.
Not sure how long Java holds on to the old array before it is reclaimed; I 
have seen usage recover to about ~2x the size of the uncompressed index 
files, but only twice.
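To illustrate the growth pattern I mean, here is a toy sketch (not the actual 
MapFile code, just the copy-and-grow-by-150% idea) showing why the footprint 
spikes while a copy is in flight:

    public class GrowBy150 {
      public static void main(String[] args) {
        long[] positions = new long[1024];  // stand-in for the position array
        int count = 0;
        for (long pos = 0; pos < 10_000_000L; pos++) {
          if (count == positions.length) {
            // Grow by 50%: while the copy runs, the old and new arrays are
            // both live, so the transient footprint is ~2.5x the data stored.
            long[] grown = new long[positions.length * 3 / 2];
            System.arraycopy(positions, 0, grown, 0, count);
            positions = grown;  // old array is garbage now, but it is only
                                // reclaimed whenever the GC gets around to it
          }
          positions[count++] = pos;
        }
        System.out.println("entries=" + count + " capacity=" + positions.length);
      }
    }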

I am testing by creating the files with a MR job and then loading the map 
files in a simple program that opens the files and finds the midkey, so the 
index gets read into memory, while watching the top command.
I also added -Xloggc:/tmp/gc.log and watched the memory usage go up; it 
matches top for the most part.

I tried running System.gc() to force a cleanup of the memory, but it did not 
seem to help.
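Roughly, the simple program looks like the sketch below (paths are whatever 
the MR job wrote; I am assuming the Reader(fs, dirName, conf) constructor and 
midKey(), so adjust for your Hadoop version). Run it with 
-Xloggc:/tmp/gc.log and watch top:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.io.MapFile;

    public class IndexLoadTest {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        MapFile.Reader[] readers = new MapFile.Reader[args.length];
        for (int i = 0; i < args.length; i++) {
          readers[i] = new MapFile.Reader(fs, args[i], conf);
          // midKey() forces the whole index to be read into memory.
          System.out.println(args[i] + " midkey=" + readers[i].midKey());
        }
        System.gc();                   // did not seem to free much here either
        Thread.sleep(Long.MAX_VALUE);  // park so top and gc.log can be watched
      }
    }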


"Billy Pearson" <sales@pearsonwholesale.com> 
wrote in message news:ger0jq$800$1@ger.gmane.org...
>I have been looking over the MapFile class in Hadoop for memory problems and 
> think I might have found an index bug
> org.apache.hadoop.io.MapFile
> line 202
> if (size % indexInterval == 0) {            // add an index entry
> this is where it writes the index, adding one entry every indexInterval rows
> then on loading of the index
> line 335
>          if (skip > 0) {
>            skip--;
>            continue;                             // skip this entry
> we only keep one index entry out of every skip entries we read back
> so with the default of 32, I think in hbase we are only writing an index 
> entry to the index file every 32 rows, and then only reading back every 
> 32nd of those,
> so we only get an index entry for roughly every 1024 rows (see the toy 
> simulation after this quote).
> Take a look and confirm, and we can open a bug on hadoop about it.
> Billy
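To make the compounding in the quoted message concrete, here is a toy 
simulation (assuming both values are 32 as stated; per the top of this 
message the read-side skip turned out to be a separate conf option, so this 
is the worry, not confirmed behavior). Note that, if I read it right, the 
real readIndex resets skip after it keeps an entry, which the quoted snippet 
does not show:

    public class IndexSkipMath {
      public static void main(String[] args) {
        int indexInterval = 32;  // write side: one index entry per 32 rows
        int skip = 32;           // read side: entries dropped per kept entry
        int rows = 100_000;

        int written = 0, kept = 0;
        int pending = skip;      // counts down, then resets after a keep
        for (int size = 0; size < rows; size++) {
          if (size % indexInterval == 0) {             // write-side check (line 202)
            written++;
            if (pending > 0) { pending--; continue; }  // read-side skip (line 335)
            pending = skip;
            kept++;
          }
        }
        // Prints: written=3125 kept=94 -> one usable index entry per ~1063
        // rows, in the same ballpark as the "every 1024 rows" estimate above.
        System.out.println("written=" + written + " kept=" + kept
            + " -> one usable index entry per ~" + (rows / kept) + " rows");
      }
    }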
