hbase-user mailing list archives

From Marko Dinic <hacker.ma...@gmail.com>
Subject Write intermediate results of MR job to HBase or to HDFS
Date Fri, 11 Dec 2015 13:50:43 GMT

I have a sequence of MR jobs that produce intermediate results: the
output of one job is the input to the next.

Also, some data is always used as input to the MR jobs. That data is stored in
HBase.

I would like to know which of the following is more performant:

1) Write intermediate results to HBase in one job and read from HBase in
the next job
2) Write intermediate results to HDFS in one job and read from HDFS in the
next job

Also, regarding the data that is always used as input to the MR jobs:

1) Read the same data from HBase each time (which involves scanning by rowkey)
2) Read the data from HBase only the first time, store it in HDFS, and read it
from HDFS every subsequent time (to avoid querying the database each time)

Please explain why you would choose one option over the other.

Best regards,
Marko Dinic
