hbase-user mailing list archives

From "Ning Li" <ning.li...@gmail.com>
Subject Re: Multi get/put
Date Thu, 07 Aug 2008 14:47:02 GMT
> In hbase, on split, daughters hold a reference to either the top or bottom
> half of their parent region.  References are undone by compactions; as part
> of compaction, the part of the parent referenced by the daughter gets
> written out to store files under the daughter.  Daughters try to undo
> references as promptly as possible because regions with references are not
> splittable (references to references, and so on, would soon become
> unmanageable).
> In your description, you mentioned that daughter regions reference their
> parents' index.  When I said, 'a rewrite of the lucene index', I was asking,
> as per hbase regions, if you followed the model and wrote a new lucene index
> comprised of daughter-only content at compaction time.  Or do you just
> 'optimize' and let the references build up so the daughter of a daughter
> points all the way up to the parent?

As in HBase, a split is not allowed while there are references to
parent files, whether they are store files or index files.
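A minimal sketch of that rule, assuming a simplified model in which each
file is either owned by the region or is a reference into a parent file.
The names RegionFile, RegionSketch, canSplit and compact are hypothetical
and chosen only for illustration; they are not the actual HBase classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a file in a region: either real data owned by the
// region, or a reference to the top/bottom half of a parent's file.
class RegionFile {
    final String name;
    final boolean isReference;

    RegionFile(String name, boolean isReference) {
        this.name = name;
        this.isReference = isReference;
    }
}

// Hypothetical region holding both store files and index files.
class RegionSketch {
    private final List<RegionFile> storeFiles = new ArrayList<>();
    private final List<RegionFile> indexFiles = new ArrayList<>();

    void addStoreFile(RegionFile f) { storeFiles.add(f); }
    void addIndexFile(RegionFile f) { indexFiles.add(f); }

    private static boolean hasReferences(List<RegionFile> files) {
        for (RegionFile f : files) {
            if (f.isReference) {
                return true;
            }
        }
        return false;
    }

    // A split is allowed only when neither kind of file references a parent.
    boolean canSplit() {
        return !hasReferences(storeFiles) && !hasReferences(indexFiles);
    }

    // Stand-in for compaction: every reference is replaced by a file that
    // the region owns outright (the real data copy is not modelled here).
    void compact() {
        rewrite(storeFiles);
        rewrite(indexFiles);
    }

    private static void rewrite(List<RegionFile> files) {
        for (int i = 0; i < files.size(); i++) {
            RegionFile f = files.get(i);
            if (f.isReference) {
                files.set(i, new RegionFile(f.name + ".rewritten", false));
            }
        }
    }
}
```

In this model a freshly split daughter reports canSplit() == false until
compact() has rewritten both its store files and its index files.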

> So, why do you think it is so slow going via HDFS FileSystem when the data is
> local?  Is it the block-oriented access or is there just a high tax going
> via the HDFS FS interface?

Because of how DFSClient.DFSInputStream is implemented, a socket
connection is opened and closed for almost every random read. We'll
experiment with reusing socket connections in DFSInputStream.
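To illustrate the general idea of connection reuse (this is not the
DFSInputStream code, and not necessarily how the experiment will be done),
here is a small sketch that keeps one idle connection per address and hands
it back out instead of opening a new one. ConnectionFactory and
ReusableConnections are hypothetical names; the connection type is left
generic so that a java.net.Socket or anything else Closeable could be used.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical factory: opens a new connection to the given address.
interface ConnectionFactory<C extends Closeable> {
    C open(String address) throws IOException;
}

// Keeps at most one idle connection per address so that consecutive reads
// against the same address can share it instead of reconnecting each time.
class ReusableConnections<C extends Closeable> implements Closeable {
    private final ConnectionFactory<C> factory;
    private final Map<String, C> idle = new HashMap<>();
    private int opened = 0;

    ReusableConnections(ConnectionFactory<C> factory) {
        this.factory = factory;
    }

    // Returns an idle connection if one exists, otherwise opens a new one.
    synchronized C acquire(String address) throws IOException {
        C cached = idle.remove(address);
        if (cached != null) {
            return cached;
        }
        C fresh = factory.open(address);
        opened++;
        return fresh;
    }

    // Hands a connection back for later reuse. If another connection is
    // already idle for this address, the older one is closed.
    synchronized void release(String address, C connection) throws IOException {
        C previous = idle.put(address, connection);
        if (previous != null && previous != connection) {
            previous.close();
        }
    }

    // Number of connections actually opened through the factory.
    synchronized int openedCount() {
        return opened;
    }

    // Closes all idle connections.
    @Override
    public synchronized void close() throws IOException {
        for (C connection : idle.values()) {
            connection.close();
        }
        idle.clear();
    }
}
```

With a factory that creates a java.net.Socket for the given address, a
sequence of acquire/read/release calls against the same address would open
one socket rather than one per read. A real implementation would also need
to deal with stale or half-closed connections and idle timeouts, which this
sketch leaves out.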

