Maybe we need something based on this?
https://issues.apache.org/jira/browse/HBASE-3996
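
A rough sketch of one piece of this, not a tested implementation: if the idea is one Scan per wanted key range (that JIRA is about feeding multiple scans into a single MapReduce job), it probably helps to first collapse overlapping or touching ranges so no rows are read twice. The ScanRanges and Range names below are hypothetical and are not part of Crunch or HBase. Row keys are modeled as plain ASCII Strings for brevity; real HBase row keys are byte[] compared lexicographically, so an actual version would compare bytes instead.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper (not a Crunch or HBase class): plans the key ranges
// that would each back one Scan.
final class ScanRanges {

    // Half-open row-key range [start, stop), mirroring a Scan's start and stop rows.
    static final class Range {
        final String start;
        final String stop;

        Range(String start, String stop) {
            if (start.compareTo(stop) >= 0) {
                throw new IllegalArgumentException("start must sort before stop");
            }
            this.start = start;
            this.stop = stop;
        }
    }

    // Sorts ranges by start key and merges any that overlap or touch,
    // returning the smallest set of disjoint ranges covering the same rows.
    static List<Range> coalesce(List<Range> ranges) {
        List<Range> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparing((Range r) -> r.start));

        List<Range> merged = new ArrayList<>();
        for (Range r : sorted) {
            if (!merged.isEmpty()) {
                Range last = merged.get(merged.size() - 1);
                if (r.start.compareTo(last.stop) <= 0) {
                    // Overlapping or adjacent: extend the previous range.
                    String stop = last.stop.compareTo(r.stop) >= 0 ? last.stop : r.stop;
                    merged.set(merged.size() - 1, new Range(last.start, stop));
                    continue;
                }
            }
            merged.add(r);
        }
        return merged;
    }
}
```

Each Range returned by coalesce would then become the start and stop row of its own Scan; how those Scans get wired into a single Crunch source is the part that would still need to be written.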
On Mon, Apr 8, 2013 at 1:41 PM, Chad Urso McDaniel <chadum@gmail.com> wrote:
> This may be a core hadoop question.
>
> We are using Crunch with HBase.
> We typically set up the input PTable like so:
> ---
> Scan scan = ...
> HBaseSourceTarget source = new HBaseSourceTarget(tableName, scan);
> PTable<ImmutableBytesWritable, Result> data = pipeline.read(source);
> ---
>
> To speed up processing with Crunch, we would like to read multiple Scans
> into one PTable.
>
> We know which sections of the HBase table we want and they are not
> contiguous.
>
> We have tried unioning the PTables but that turns out to be incredibly
> slow.
> Currently we are using a filter that results in many unnecessary reads.
>
> How do others solve this?
>
> I'm tempted to write a TableSource that can do this.
>
> thanks
>
--
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>