One big reason is that the MemStore on each region server will hold updates
that haven't yet been flushed to HFiles. A job that reads only the HFiles in
HDFS will miss these.
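
To make that concrete, here is a rough toy model in Python. It is not HBase code: the ToyRegion class and its method names are made up for illustration, and it only assumes the general pattern that writes land in memory first and reach files on a flush.

```python
class ToyRegion:
    """Hypothetical stand-in for a region: edits sit in memory until flushed."""

    def __init__(self):
        self.memstore = {}  # recent edits, held in memory only
        self.hfiles = []    # flushed, immutable snapshots (stand-ins for HFiles)

    def put(self, row, value):
        # New writes go to the in-memory store first.
        self.memstore[row] = value

    def flush(self):
        # Persist the in-memory edits as a new file and clear the memory store.
        if self.memstore:
            self.hfiles.append(dict(self.memstore))
            self.memstore = {}

    def scan_via_server(self):
        # The server merges flushed files (oldest first) with unflushed edits.
        merged = {}
        for hfile in self.hfiles:
            merged.update(hfile)
        merged.update(self.memstore)
        return merged

    def scan_files_only(self):
        # A job reading the files directly never sees the in-memory edits.
        merged = {}
        for hfile in self.hfiles:
            merged.update(hfile)
        return merged


region = ToyRegion()
region.put("row1", "a")
region.flush()           # row1 is now in a file
region.put("row2", "b")  # row2 exists only in memory

server_view = region.scan_via_server()
file_view = region.scan_files_only()
```

In this sketch server_view contains both rows while file_view contains only row1, which is the gap a file-only MR job would have unless everything were flushed first and no new writes arrived during the job.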
On Fri, May 6, 2011 at 12:27 PM, Jason Rutherglen <jason.rutherglen@gmail.com> wrote:
> Is there an issue open or any particular reason that an MR job needs to
> access the HBase data directly from the region server? It seems possible
> to also provide functionality such that MR can execute over the HFile(s)
> stored in HDFS, thereby giving similar performance characteristics
> comparable to typical MR jobs that execute against files in HDFS.
>
> Jason
>