crunch-dev mailing list archives

From "Tom White (JIRA)" <>
Subject [jira] [Commented] (CRUNCH-619) Run on HBase 2
Date Fri, 09 Sep 2016 15:42:20 GMT


Tom White commented on CRUNCH-619:

Thanks for taking a look, [~jmhsieh].

There seem to be some APIs that don't exist in both HBase 1 and 2, e.g. CellUtil#createFirstOnRow
and CellComparator#COMPARATOR. Are these going to be backported to HBase 1 to make the transition
easier?
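
For illustration only: one way to tell whether a given member exists in the HBase version on the classpath is to probe for it by reflection. This is a minimal sketch and not part of the CRUNCH-619 patch. The class ApiProbe and its two helper methods are hypothetical names, and the sketch only reports presence, it does not say how Crunch should call either API.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical helper: reports whether a class exposes a public static member.
public class ApiProbe {

    // True if the class can be loaded and has a public static method with this name
    // (any parameter list).
    public static boolean hasStaticMethod(String className, String methodName) {
        try {
            for (Method m : Class.forName(className).getMethods()) {
                if (m.getName().equals(methodName) && Modifier.isStatic(m.getModifiers())) {
                    return true;
                }
            }
        } catch (ClassNotFoundException | LinkageError e) {
            // Class is absent or cannot be linked: treat the API as unavailable.
        }
        return false;
    }

    // True if the class can be loaded and has a public static field with this name.
    public static boolean hasStaticField(String className, String fieldName) {
        try {
            Field f = Class.forName(className).getField(fieldName);
            return Modifier.isStatic(f.getModifiers());
        } catch (ClassNotFoundException | NoSuchFieldException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The results depend entirely on which HBase jars (if any) are on the classpath.
        System.out.println(hasStaticMethod("org.apache.hadoop.hbase.CellUtil", "createFirstOnRow"));
        System.out.println(hasStaticField("org.apache.hadoop.hbase.CellComparator", "COMPARATOR"));
    }
}
```

A probe like this only helps with diagnosis; code that must compile against both versions would still need a shim or a backport.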

There's a comment in HFileOutputFormatForCrunch that explains why the HBase equivalent is
not used. I guess that still applies.

HBase's official HFileOutputFormat is not used, because it shuffles on row-key only and
does in-memory sort at reducer side (so the size of output HFile is limited to reducer's memory).
As crunch supports more complex and flexible MapReduce pipeline, we would prefer thin and
pure OutputFormat here.

No reviewboard for Crunch, I'm afraid :(

> Run on HBase 2
> --------------
>                 Key: CRUNCH-619
>                 URL:
>             Project: Crunch
>          Issue Type: Improvement
>    Affects Versions: 0.14.0
>            Reporter: Tom White
>            Assignee: Tom White
>         Attachments: CRUNCH-619.patch

This message was sent by Atlassian JIRA
