hbase-user mailing list archives

From Jianshi Huang <jianshi.hu...@gmail.com>
Subject Best practice for writing to HFileOutputFormat(2) with multiple Column Families
Date Thu, 31 Jul 2014 03:01:13 GMT
I need to generate HFiles from a 2TB dataset, exploding it into 4 column families.

The resulting dataset is likely to be 20TB or more. I'm currently using Spark,
so I sort by (rk, cf, cq), i.e. row key, column family, and column qualifier,
myself. The dataset is huge, and I'm considering how to optimize this step.

My question is:
Should I sort and write each column family one by one, or should I put them
all together and then sort and write?
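
To make the two options concrete, here is a minimal plain-Python sketch. It is
not the actual Spark job: the function names and the sample cells are
hypothetical, and it only illustrates the ordering, not the cost. It relies on
Python comparing bytes objects lexicographically by unsigned byte value.

```python
from collections import defaultdict

# Each cell is a tuple (rk, cf, cq, value) of bytes objects.

def sort_all_together(cells):
    """Option 2: one global sort over all families by (rk, cf, cq)."""
    return sorted(cells, key=lambda c: (c[0], c[1], c[2]))

def sort_per_family(cells):
    """Option 1: split by column family, then sort each family by (rk, cq)."""
    by_family = defaultdict(list)
    for cell in cells:
        by_family[cell[1]].append(cell)
    return {
        cf: sorted(group, key=lambda c: (c[0], c[2]))
        for cf, group in by_family.items()
    }

# Hypothetical sample data, not taken from the real dataset.
cells = [
    (b"row2", b"cf1", b"a", b"v3"),
    (b"row1", b"cf2", b"b", b"v2"),
    (b"row1", b"cf1", b"z", b"v1"),
    (b"row1", b"cf1", b"a", b"v0"),
]

together = sort_all_together(cells)
per_family = sort_per_family(cells)

for cell in together:
    print(cell)
```

In this sketch, restricting the global sort to a single family gives the same
sequence as sorting that family on its own, so the two options differ in how
much data each sort handles rather than in the per-family order they produce.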

Does my question make sense?

Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
