hbase-user mailing list archives

From kun yan <yankunhad...@gmail.com>
Subject Re: hdfs data into Hbase
Date Tue, 10 Sep 2013 01:27:27 GMT
Sorry, my earlier message may not have been clear. Here is the situation:
before the import, HDFS disk usage was DFS Used: 54.19 GB; after importing
the data from HDFS into HBase, it is DFS Used: 57.16 GB. The source data
stored in HDFS is 69 MB, and the HDFS replication factor is 3.
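
To spell out the arithmetic: DFS Used grew by 57.16 GB - 54.19 GB = 2.97 GB.
With replication factor 3, the raw 69 MB of source data accounts for only
about 69 MB x 3 = 207 MB of that; the remaining ~2.7 GB presumably comes
from per-cell overhead (the row key, column family, qualifier, and
timestamp repeated in every cell), the write-ahead log, and store files
that have not yet been compacted.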


2013/9/9 Shahab Yunus <shahab.yunus@gmail.com>

> Some quick thoughts: your size is bound to increase, because the row key
> is stored in every cell. So if your CSV has, let us say, 5 columns and
> you import them into HBase using the first column as the row key, you end
> up with essentially 9 stored values: 1 for the row key itself, and then 2
> (row key plus column value) for each of the remaining 4 'rowkey-column'
> pairs. I know this is a very crude, high-level estimation.
>
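> As a rough sketch of that estimation in code (the 12-byte per-cell
> framing constant below is an illustrative assumption, not HBase's exact
> KeyValue layout):
>
> public class CellSizeEstimate {
>     // Back-of-the-envelope size of one CSV row once stored in HBase:
>     // every cell repeats the row key, column family, and qualifier,
>     // and adds a timestamp plus some fixed bookkeeping bytes.
>     static long estimateRowBytes(int rowKeyLen, int familyLen,
>             int[] qualifierLens, int[] valueLens) {
>         final int TIMESTAMP = 8;  // per-cell timestamp
>         final int FRAMING = 12;   // assumed fixed per-cell overhead
>         long total = 0;
>         for (int i = 0; i < qualifierLens.length; i++) {
>             total += rowKeyLen + familyLen + qualifierLens[i]
>                     + valueLens[i] + TIMESTAMP + FRAMING;
>         }
>         return total;
>     }
>
>     public static void main(String[] args) {
>         // 10-byte row key, 1-byte family, six 5-byte qualifiers,
>         // 4-byte values: prints 240 bytes for ~24 bytes of payload.
>         System.out.println(estimateRowBytes(10, 1,
>                 new int[] {5, 5, 5, 5, 5, 5},
>                 new int[] {4, 4, 4, 4, 4, 4}));
>     }
> }
>
> Under those assumed numbers, each cell comes to about 40 bytes for 4
> bytes of payload -- a ~10x blow-up before replication even enters the
> picture.
>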
> Also, how are you measuring the size in HDFS after the import to HBase?
> Are you excluding the replication of the data?
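>
> One way to check is a quick program like the following (a minimal
> sketch; "/hbase" is the default HBase root directory, adjust it if your
> cluster differs):
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.ContentSummary;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> public class HBaseDirSize {
>     public static void main(String[] args) throws Exception {
>         FileSystem fs = FileSystem.get(new Configuration());
>         ContentSummary cs = fs.getContentSummary(new Path("/hbase"));
>         // getLength() counts each byte once; getSpaceConsumed()
>         // includes replication, which is what "DFS Used" reflects.
>         System.out.println("logical bytes: " + cs.getLength());
>         System.out.println("raw bytes incl. replication: "
>                 + cs.getSpaceConsumed());
>     }
> }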
>
> Regards,
> Shahab
>
>
> On Mon, Sep 9, 2013 at 5:06 AM, kun yan <yankunhadoop@gmail.com> wrote:
>
> > Hello everyone. I wrote a MapReduce program to import data from HDFS
> > into HBase, but after the import the disk usage increased a lot. My
> > original data size is 69 MB (in HDFS); after importing into HBase, my
> > HDFS usage grew by about 3 GB. What is wrong with the program I wrote?
> >
> > thanks
> >
> > // Imports needed to compile:
> > import java.io.IOException;
> > import java.util.ArrayList;
> >
> > import org.apache.hadoop.conf.Configuration;
> > import org.apache.hadoop.fs.Path;
> > import org.apache.hadoop.hbase.client.Put;
> > import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
> > import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
> > import org.apache.hadoop.hbase.util.Bytes;
> > import org.apache.hadoop.io.LongWritable;
> > import org.apache.hadoop.io.Text;
> > import org.apache.hadoop.mapreduce.Job;
> > import org.apache.hadoop.mapreduce.Mapper;
> > import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
> > import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
> >
> > import com.google.common.base.Splitter;
> > import com.google.common.collect.Lists;
> >
> > public class MRImportHBaseCsv {
> >
> >     public static void main(String[] args) throws IOException,
> >             InterruptedException, ClassNotFoundException {
> >         Configuration conf = new Configuration();
> >         conf.set("fs.defaultFS", "hdfs://hydra0001:8020");
> >         conf.set("yarn.resourcemanager.address", "hydra0001:8032");
> >         Job job = createSubmitTableJob(conf, args);
> >         job.submit();
> >     }
> >
> >     public static Job createSubmitTableJob(Configuration conf,
> >             String[] args) throws IOException {
> >         String tableName = args[0];
> >         Path inputDir = new Path(args[1]);
> >         Job job = new Job(conf, "HDFS_TO_HBase");
> >         job.setJarByClass(HourlyImporter.class);
> >         FileInputFormat.setInputPaths(job, inputDir);
> >         job.setInputFormatClass(TextInputFormat.class);
> >         job.setMapperClass(HourlyImporter.class);
> >         // ++++ insert into table directly using TableOutputFormat ++++
> >         // initTableReducerJob wires up TableOutputFormat; with zero
> >         // reduce tasks the mapper's Puts go straight to the table.
> >         TableMapReduceUtil.initTableReducerJob(tableName, null, job);
> >         job.setNumReduceTasks(0);
> >         TableMapReduceUtil.addDependencyJars(job);
> >         return job;
> >     }
> >
> >     static class HourlyImporter extends
> >             Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
> >
> >         // column family
> >         static final byte[] family = Bytes.toBytes("s");
> >         static final String columns =
> >                 "HBASE_ROW_KEY,STATION,YEAR,MONTH,DAY,HOUR,MINUTE";
> >         // Parse the column names once rather than on every map() call.
> >         static final ArrayList<String> columnsList = Lists
> >                 .newArrayList(Splitter.on(',').trimResults()
> >                         .split(columns));
> >
> >         private long ts;
> >
> >         @Override
> >         protected void cleanup(Context context) throws IOException,
> >                 InterruptedException {
> >             // Note: ts is assigned but never used; each Put therefore
> >             // gets the region server's current time as its timestamp.
> >             ts = System.currentTimeMillis();
> >         }
> >
> >         @Override
> >         protected void map(LongWritable key, Text value, Context context)
> >                 throws IOException, InterruptedException {
> >             String line = value.toString();
> >             ArrayList<String> columnValues = Lists.newArrayList(
> >                     Splitter.on(',').trimResults().split(line));
> >
> >             // The first CSV field becomes the row key.
> >             byte[] bRowKey = Bytes.toBytes(columnValues.get(0));
> >             ImmutableBytesWritable rowKey =
> >                     new ImmutableBytesWritable(bRowKey);
> >
> >             Put p = new Put(bRowKey);
> >             // Each remaining field is written as one cell; on disk
> >             // every cell stores a full copy of the row key, family,
> >             // qualifier, and timestamp alongside the value.
> >             for (int i = 1; i < columnValues.size(); i++) {
> >                 p.add(family, Bytes.toBytes(columnsList.get(i)),
> >                         Bytes.toBytes(columnValues.get(i)));
> >             }
> >             context.write(rowKey, p);
> >         }
> >     }
> > }
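> >
> > For reference, assuming the class is packaged into a jar (the jar name
> > and input path below are placeholders), the job would be launched along
> > these lines:
> >
> > hadoop jar import-hbase.jar MRImportHBaseCsv myTable /user/kunyan/input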
> >
> >
> > --
> >
> > In the Hadoop world, I am just a novice exploring the entire Hadoop
> > ecosystem. I hope one day I can contribute my own code.
> >
> > YanBit
> > yankunhadoop@gmail.com
> >
>



-- 

In the Hadoop world, I am just a novice exploring the entire Hadoop
ecosystem. I hope one day I can contribute my own code.

YanBit
yankunhadoop@gmail.com
