crunch-dev mailing list archives

From "Chao Shi (JIRA)" <>
Subject [jira] [Created] (CRUNCH-267) Fix several HFileUtils#scanHFiles related problems
Date Wed, 18 Sep 2013 06:00:57 GMT
Chao Shi created CRUNCH-267:

             Summary: Fix several HFileUtils#scanHFiles related problems
                 Key: CRUNCH-267
             Project: Crunch
          Issue Type: Bug
            Reporter: Chao Shi

This patch fixes several problems with HFileUtils#scanHFiles that we discovered in production:

1. The usage of "" is wrong

A return value of -1 indicates that all KVs in the HFile are greater than the given key, so
we should continue to scan. I replaced the call with seekAtOrAfter, which is copied from the
HBase code, and added a few tests (testScanFiles_startRow{IsTooSmall, IsTooLarge, DoesNotExist})
to cover this.

2. The default implementation of HFileSource#getSize does not correctly estimate the input
size if the input HFiles are in a subdirectory (e.g. input/family/hfile).

3. There are some tricky cases involving Delete/DeleteColumn. I added some test cases and fixed
the related code. (Hopefully my test cases cover this.)
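
To illustrate item 1, here is a minimal, self-contained sketch of "seek at or after" behavior. It is a toy model, not the patch itself: ToyScanner, firstAtOrAfter, and the String keys are hypothetical stand-ins, and the -1/0/1 return convention of the toy seekTo is assumed to mirror the scanner being wrapped (-1 when the key sorts before the first entry, 0 on an exact match, 1 when the scanner stops on the last entry smaller than the key).

```java
import java.util.Arrays;
import java.util.List;

class SeekAtOrAfterSketch {
    /** Toy stand-in for an HFile scanner over sorted row keys (hypothetical, not the HBase API). */
    static final class ToyScanner {
        private final List<String> keys; // must be sorted ascending
        private int pos = -1;            // -1 means "not positioned"

        ToyScanner(List<String> sortedKeys) { this.keys = sortedKeys; }

        /** -1: key < first entry; 0: exact match; 1: positioned on the last entry < key. */
        int seekTo(String key) {
            if (keys.isEmpty() || key.compareTo(keys.get(0)) < 0) {
                pos = -1;
                return -1;
            }
            int i = 0;
            while (i + 1 < keys.size() && keys.get(i + 1).compareTo(key) <= 0) {
                i++;
            }
            pos = i;
            return keys.get(i).equals(key) ? 0 : 1;
        }

        boolean seekToFirst() {
            if (keys.isEmpty()) return false;
            pos = 0;
            return true;
        }

        boolean next() {
            if (pos + 1 >= keys.size()) return false;
            pos++;
            return true;
        }

        String current() { return keys.get(pos); }
    }

    /** Positions the scanner on the first entry >= key; false if no such entry exists. */
    static boolean seekAtOrAfter(ToyScanner s, String key) {
        int result = s.seekTo(key);
        if (result < 0) {
            // Key sorts before everything in the file: start from the first entry
            // instead of treating -1 as "nothing to scan".
            return s.seekToFirst();
        }
        if (result > 0) {
            // Scanner sits on the entry just before the key: the next one, if any, is "after".
            return s.next();
        }
        return true; // exact match
    }

    /** Convenience wrapper: the first key at or after startRow, or null if none. */
    static String firstAtOrAfter(List<String> sortedKeys, String startRow) {
        ToyScanner s = new ToyScanner(sortedKeys);
        return seekAtOrAfter(s, startRow) ? s.current() : null;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("b", "d", "f");
        System.out.println(firstAtOrAfter(keys, "a")); // start row too small
        System.out.println(firstAtOrAfter(keys, "c")); // start row does not exist
        System.out.println(firstAtOrAfter(keys, "z")); // start row too large
    }
}
```

The three calls in main loosely correspond to the IsTooSmall, DoesNotExist, and IsTooLarge test cases named in item 1.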
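
For item 2, the difference between a shallow and a recursive size estimate can be shown on a local directory tree. This sketch uses java.nio.file as a stand-in for the Hadoop FileSystem API, so it is only an analogy to what HFileSource#getSize would need to do; the class and method names (RecursiveSizeSketch, shallowSize, recursiveSize, demoSizes) are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

class RecursiveSizeSketch {
    /** Sums only regular files directly under dir, so input/family/hfile is missed. */
    static long shallowSize(Path dir) {
        try (Stream<Path> children = Files.list(dir)) {
            return children.filter(Files::isRegularFile)
                           .mapToLong(RecursiveSizeSketch::sizeOf)
                           .sum();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Walks the whole tree, so files in subdirectories are counted too. */
    static long recursiveSize(Path dir) {
        try (Stream<Path> all = Files.walk(dir)) {
            return all.filter(Files::isRegularFile)
                      .mapToLong(RecursiveSizeSketch::sizeOf)
                      .sum();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static long sizeOf(Path p) {
        try {
            return Files.size(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds input/family/hfile (1024 bytes) in a temp dir; returns {shallow, recursive}. */
    static long[] demoSizes() {
        try {
            Path input = Files.createTempDirectory("input");
            Path family = Files.createDirectory(input.resolve("family"));
            Files.write(family.resolve("hfile"), new byte[1024]);
            return new long[] { shallowSize(input), recursiveSize(input) };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] sizes = demoSizes();
        System.out.println("shallow   = " + sizes[0]);
        System.out.println("recursive = " + sizes[1]);
    }
}
```

With the layout from item 2, the shallow estimate sees no files at all, while the recursive one reports the full size of the HFile.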
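
For item 3, the message does not say which cases were tricky, so the following is only a sketch of the general distinction as HBase defines it: a Delete marker masks the single version at exactly its timestamp, while a DeleteColumn marker masks every version at or before its timestamp. The Cell and Type classes are simplified, hypothetical stand-ins for a single row and column, not the Crunch or HBase types.

```java
import java.util.ArrayList;
import java.util.List;

class DeleteMarkerSketch {
    enum Type { PUT, DELETE, DELETE_COLUMN }

    /** Simplified cell for one row/column; timestamps are assumed to be non-negative. */
    static final class Cell {
        final Type type;
        final long ts;
        final String value;

        Cell(Type type, long ts, String value) {
            this.type = type;
            this.ts = ts;
            this.value = value;
        }
    }

    /** Returns the values of the PUTs that survive the delete markers, newest first. */
    static List<String> visibleValues(List<Cell> cells) {
        long deleteColumnTs = -1L;                    // newest DeleteColumn marker seen
        List<Long> deletedVersions = new ArrayList<>(); // exact timestamps hit by Delete

        for (Cell c : cells) {
            if (c.type == Type.DELETE_COLUMN) {
                deleteColumnTs = Math.max(deleteColumnTs, c.ts);
            } else if (c.type == Type.DELETE) {
                deletedVersions.add(c.ts);
            }
        }

        List<Cell> puts = new ArrayList<>();
        for (Cell c : cells) {
            if (c.type != Type.PUT) continue;
            if (c.ts <= deleteColumnTs) continue;          // masked by DeleteColumn
            if (deletedVersions.contains(c.ts)) continue;  // masked by Delete
            puts.add(c);
        }
        puts.sort((a, b) -> Long.compare(b.ts, a.ts));

        List<String> out = new ArrayList<>();
        for (Cell c : puts) {
            out.add(c.value);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Cell> cells = new ArrayList<>();
        for (long ts = 1; ts <= 5; ts++) {
            cells.add(new Cell(Type.PUT, ts, "v" + ts));
        }
        cells.add(new Cell(Type.DELETE, 3, null));        // masks only v3
        cells.add(new Cell(Type.DELETE_COLUMN, 2, null)); // masks v1 and v2
        System.out.println(visibleValues(cells));
    }
}
```

In this example only v5 and v4 remain visible: the Delete at timestamp 3 removes v3 alone, and the DeleteColumn at timestamp 2 removes v1 and v2.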
