hbase-user mailing list archives

From Dave Latham <lat...@davelink.net>
Subject Re: Different time ranges for different cfs when using TableInputFormat
Date Wed, 04 Mar 2015 14:05:52 GMT
That's not possible with HBase today.  The simplest thing may be to set
your Scan time range wide enough to include both today's and yesterday's
data and then filter down to only the data you want inside your map task.
Other possibilities would be creating a custom filter to do the filtering
on the server side, or changing your input format or map task to run two
concurrent scans with different families/time ranges and merge the results.
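
As a rough illustration of the first option (filtering inside the map task), here is a minimal sketch in plain Java. It does not use the HBase API: VersionedCell, PerFamilyTimeRange, Range and newestPerColumn are hypothetical names invented for this example, standing in for the cells a mapper would receive. It assumes the scan returned more than one version per column (otherwise yesterday's cf3 value would already be hidden behind today's), and it treats a range as min inclusive, max exclusive, the way an HBase time range is.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one versioned cell handed to a map task.
// This is NOT the HBase Cell type, only an illustration.
final class VersionedCell {
    final String family;
    final String qualifier;
    final long timestamp;
    final String value;

    VersionedCell(String family, String qualifier, long timestamp, String value) {
        this.family = family;
        this.qualifier = qualifier;
        this.timestamp = timestamp;
        this.value = value;
    }
}

final class PerFamilyTimeRange {

    // Half-open range [min, max): min inclusive, max exclusive.
    static final class Range {
        final long min;
        final long max;

        Range(long min, long max) {
            this.min = min;
            this.max = max;
        }

        boolean contains(long ts) {
            return ts >= min && ts < max;
        }
    }

    // Keeps the newest version of each column. Families that have an entry
    // in rangeByFamily only consider versions inside that range; families
    // without an entry are unrestricted.
    static List<VersionedCell> newestPerColumn(List<VersionedCell> cells,
                                               Map<String, Range> rangeByFamily) {
        Map<String, VersionedCell> newest = new LinkedHashMap<>();
        for (VersionedCell cell : cells) {
            Range range = rangeByFamily.get(cell.family);
            if (range != null && !range.contains(cell.timestamp)) {
                continue; // outside the range requested for this family
            }
            String column = cell.family + ":" + cell.qualifier;
            VersionedCell current = newest.get(column);
            if (current == null || cell.timestamp > current.timestamp) {
                newest.put(column, cell);
            }
        }
        return new ArrayList<>(newest.values());
    }
}
```

For the case in the question, rangeByFamily would hold a single entry for cf3 covering yesterday, while cf1 and cf2 are left unrestricted so their most recent versions win.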

Being able to specify different time ranges for different column families
is something I'd like to do as well.  Perhaps we'll get that into HBase at
some point.


On Tue, Mar 3, 2015 at 5:23 PM, Felipe Sodré Silva <fsodre@gmail.com> wrote:

> When using TableInputFormat to make HBase data available to map/reduce
> jobs we can use the settings SCAN_TIMERANGE_START and
> SCAN_TIMERANGE_END to specify a time range during scan.
> Is it possible to somehow have different time ranges for different
> column families?
> This is my problem:
> I have table X with column families cf1, cf2 and cf3. I want to run a
> map/reduce job on it using the most recent versions of columns in cf1
> and cf2, but I want to use yesterday's data from cf3. Is this
> possible?
> Felipe
