Like, would Crunch support 0.94.5? I'm not really sure: our HBase dependencies are pretty minimal, which makes me think that a MultiTableInputFormat-based Source would be easy to write, but HBase has a tendency to change out from underneath us in ways that I have a hard time diagnosing without help from folks who know it better than I do.

On Mon, Apr 8, 2013 at 1:52 PM, Micah Whitacre wrote:

> What's the minimum version of HBase that Crunch will support? We
> have the exact same need, but because the fix for HBASE-3996 requires
> region server changes, it wasn't as easy to patch back to
> 0.92 or 0.94.2 (CDH 4.2).
>
> On Mon, Apr 8, 2013 at 3:47 PM, Josh Wills wrote:
>
>> Maybe we need something based on this?
>>
>> https://issues.apache.org/jira/browse/HBASE-3996
>>
>> On Mon, Apr 8, 2013 at 1:41 PM, Chad Urso McDaniel wrote:
>>
>>> This may be a core Hadoop question.
>>>
>>> We are using Crunch with HBase.
>>> We typically set up the input PTable like so:
>>> ---
>>> Scan scan = ...
>>> HBaseSourceTarget source = new HBaseSourceTarget(tableName, scan);
>>> PTable<ImmutableBytesWritable, Result> data = pipeline.read(source);
>>> ---
>>>
>>> To speed up processing with Crunch, we want to read multiple Scans
>>> into one PTable.
>>>
>>> We know which sections of the HBase table we want, and they are not
>>> contiguous.
>>>
>>> We have tried unioning the PTables, but that turns out to be incredibly
>>> slow.
>>> Currently we are using a filter that results in many unnecessary reads.
>>>
>>> How do others solve this?
>>>
>>> I'm tempted to write a TableSource that can do this.
>>>
>>> thanks
>>
>> --
>> Director of Data Science
>> Cloudera
>> Twitter: @josh_wills
>

--
Director of Data Science
Cloudera
Twitter: @josh_wills
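One way to picture a multi-Scan source like the one HBASE-3996 describes: instead of one Scan plus a filter, each requested row range is intersected with each region's key range, and only the non-empty overlaps become input splits, so rows nobody asked for are never read. The following is a minimal, hypothetical Java sketch of that split computation, not Crunch's or HBase's code. `MultiScanSplits`, `KeyRange`, `Split`, and `computeSplits` are illustrative names; region boundaries are passed in by hand rather than looked up from a live table; and row keys are simplified to `String`s (HBase compares row keys as unsigned bytes, which matches `String.compareTo` only for ASCII keys).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch (not HBase's or Crunch's actual code): one input split per
// non-empty overlap between a requested scan range and a region's key range.
// Simplification: keys are Strings; HBase compares row keys as unsigned byte[].
// For ASCII keys, String.compareTo gives the same order.
class MultiScanSplits {

    /** Half-open row-key range [start, stop); "" means unbounded on that side. */
    static final class KeyRange {
        final String start;
        final String stop;

        KeyRange(String start, String stop) {
            this.start = start;
            this.stop = stop;
        }
    }

    /** One unit of map-side work: a key range served by a single region. */
    static final class Split {
        final int regionIndex;
        final String start;
        final String stop;

        Split(int regionIndex, String start, String stop) {
            this.regionIndex = regionIndex;
            this.start = start;
            this.stop = stop;
        }

        @Override
        public String toString() {
            return "region" + regionIndex + " [" + start + ", "
                    + (stop.isEmpty() ? "<end>" : stop) + ")";
        }
    }

    // Later of two start keys; an empty start means "from the first row".
    static String laterStart(String a, String b) {
        if (a.isEmpty()) return b;
        if (b.isEmpty()) return a;
        return a.compareTo(b) >= 0 ? a : b;
    }

    // Earlier of two stop keys; an empty stop means "through the last row".
    static String earlierStop(String a, String b) {
        if (a.isEmpty()) return b;
        if (b.isEmpty()) return a;
        return a.compareTo(b) <= 0 ? a : b;
    }

    /** Intersects every scan with every region and keeps the non-empty overlaps. */
    static List<Split> computeSplits(List<KeyRange> scans, List<KeyRange> regions) {
        List<Split> splits = new ArrayList<>();
        for (KeyRange scan : scans) {
            for (int i = 0; i < regions.size(); i++) {
                KeyRange region = regions.get(i);
                String start = laterStart(scan.start, region.start);
                String stop = earlierStop(scan.stop, region.stop);
                // Non-empty when the stop is unbounded or strictly after the start.
                if (stop.isEmpty() || start.compareTo(stop) < 0) {
                    splits.add(new Split(i, start, stop));
                }
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        // Three hypothetical regions covering the whole table, split at "g" and "p".
        List<KeyRange> regions = Arrays.asList(
                new KeyRange("", "g"), new KeyRange("g", "p"), new KeyRange("p", ""));
        // Two non-contiguous scans; rows outside [b, d) and [h, r) are in no split.
        List<KeyRange> scans = Arrays.asList(new KeyRange("b", "d"), new KeyRange("h", "r"));
        for (Split split : computeSplits(scans, regions)) {
            System.out.println(split);
        }
    }
}
```

With the sample ranges in `main`, the scan `[b, d)` falls inside one region and yields one split, while `[h, r)` straddles the region boundary at `p` and yields two. Every split still maps to exactly one region, which keeps the data-locality property a single-Scan table input already has.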