hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Best way to write to multiple tables in one map-only job
Date Mon, 03 Oct 2011 17:20:19 GMT
Options a) and b) are effectively the same, since MultiTableOutputFormat
internally uses multiple HTables. See for yourself:

https://github.com/apache/hbase/blob/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/MultiTableOutputFormat.java

Also, you can set the write buffer size by setting
hbase.client.write.buffer on the configuration that you pass in during
job setup.
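
To make the two points above concrete, here is a rough, HBase-free model
in plain Java of a writer that keeps one buffered handle per table and
flushes a table's buffer once it grows past a configured size. It is only
a sketch under assumptions: MultiTableBufferModel, BufferedTable and the
string-length size estimate are hypothetical names and simplifications,
not HBase classes or HBase code. The real behaviour is in the
MultiTableOutputFormat source linked above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, simplified model (NOT HBase code) of a record writer that
 * fans writes out to several tables, each with its own write buffer.
 */
class MultiTableBufferModel {

    /** Stand-in for one table handle plus its client-side write buffer. */
    static final class BufferedTable {
        final List<String> pending = new ArrayList<>();   // buffered, not yet sent
        final List<String> committed = new ArrayList<>(); // "sent" to the table
        long pendingBytes = 0;
        int flushes = 0;

        /** Moves everything buffered so far into the committed list. */
        void flush() {
            if (pending.isEmpty()) {
                return; // nothing buffered, so this does not count as a flush
            }
            committed.addAll(pending);
            pending.clear();
            pendingBytes = 0;
            flushes++;
        }
    }

    // Plays the role of the configured write buffer size (assumption: bytes).
    private final long writeBufferBytes;
    private final Map<String, BufferedTable> tables = new HashMap<>();

    MultiTableBufferModel(long writeBufferBytes) {
        this.writeBufferBytes = writeBufferBytes;
    }

    /** Buffers one row for the named table, creating its handle on first use. */
    void write(String tableName, String row) {
        BufferedTable table = tables.computeIfAbsent(tableName, n -> new BufferedTable());
        table.pending.add(row);
        table.pendingBytes += row.length(); // crude size estimate for the model
        if (table.pendingBytes > writeBufferBytes) {
            table.flush();
        }
    }

    /** Flushes whatever is still buffered in every table, as on task cleanup. */
    void close() {
        for (BufferedTable table : tables.values()) {
            table.flush();
        }
    }

    int tableCount() {
        return tables.size();
    }

    int flushes(String tableName) {
        BufferedTable table = tables.get(tableName);
        return table == null ? 0 : table.flushes;
    }

    int committed(String tableName) {
        BufferedTable table = tables.get(tableName);
        return table == null ? 0 : table.committed.size();
    }

    public static void main(String[] args) {
        MultiTableBufferModel writer = new MultiTableBufferModel(10);
        writer.write("spo", "aaaaaa");
        writer.write("spo", "bbbbbb");
        writer.write("pos", "cc");
        writer.close();
        System.out.println("tables=" + writer.tableCount()
                + " spo=" + writer.committed("spo")
                + " pos=" + writer.committed("pos"));
    }
}
```

In this model, writing to six index tables just means six entries in the
map, which is the same bookkeeping you would do by hand with one HTable
per table in the mapper.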

Using HTablePool in a single-threaded application doesn't offer anything
more than storage for your HTables.

Hope that helps,

J-D

On Sat, Oct 1, 2011 at 4:05 AM, Christopher Dorner
<christopher.dorner@gmail.com> wrote:
> Hello,
>
> I am building an RDF store using HBase and experimenting with different index
> tables and schema designs.
>
> For the input, I have a file where each line is an RDF triple in N3 format.
>
> I need to write to multiple tables since I need to build several index
> tables. To reduce I/O and avoid reading the file several times, I
> want to do that in one map-only job. Later the file will contain a few
> million triples.
>
> I am experimenting in pseudo-distributed mode so far but will be able to run
> it on our cluster soon.
> Storing the data in the tables does not need to be speed-optimized at any
> cost, but I just want to do it as simply and quickly as possible.
>
>
> What is the best way to write to more than 1 table in one Map-Task?
>
> a)
> I can either use "MultiTableOutputFormat.class" and write in map() using:
> Put put = new Put(key);
> put.add(kv);
> context.write(tableName, put);
>
> Can I write to e.g. 6 tables in this way by creating a new Put for each
> table?
>
> But how can I turn off autoFlush and set writeBufferSize in this case?
> I think autoFlush is not a good fit here, since I am putting lots of
> values.
>
>
> b)
> I can use an instance of HTable in the Mapper class. Then I can set
> autoFlush and writeBufferSize and write to the table using:
> HTable table = new HTable(config, tableName);
> table.put(put);
>
> But it is recommended to use only one instance of HTable, so I would need to
> do
> table = new HTable(config, tableName);
> for each table I want to write to. Is that still fine with 6 tables?
> I stumbled upon HTablePool. Is it meant for scenarios like this?
>
>
> Thank You and Regards,
> Christopher
>
