hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Hbase inserts very slow
Date Fri, 18 Feb 2011 00:30:25 GMT
Good to know, and yeah cluster performance will definitely be
different. Optimizing on a pseudo-distributed setup only gets you so
far.

To answer your other question, use
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/KeyOnlyFilter.html

J-D
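[For reference, a minimal sketch of attaching that filter to a scan. It pairs KeyOnlyFilter with FirstKeyOnlyFilter, a common combination when only row keys are needed; the class and method names here are assumptions against the 0.90-era client API, not code from the thread.]

```java
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
import org.apache.hadoop.hbase.filter.KeyOnlyFilter;

public class RowKeyOnlyScan {
    public static Scan build() {
        Scan scan = new Scan();
        // Default operator is MUST_PASS_ALL, so both filters apply.
        FilterList filters = new FilterList();
        // Strip cell values server-side; only key metadata comes back.
        filters.addFilter(new KeyOnlyFilter());
        // Return just the first KeyValue per row, since only the row
        // key itself is of interest.
        filters.addFilter(new FirstKeyOnlyFilter());
        scan.setFilter(filters);
        // Full-table scan; keep it out of the block cache.
        scan.setCacheBlocks(false);
        return scan;
    }
}
```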

On Thu, Feb 17, 2011 at 4:26 PM, Vishal Kapoor
<vishal.kapoor.in@gmail.com> wrote:
> J-D,
>
> setCacheBlocks(false) improved the variance in the time it took for my
> explode; now it's consistently reporting similar times.
>
> I have also tested inserting the master data to the other two families with
> separate map reduce jobs and I like the results so far.
> since I am still on pseudo-distributed, it freaks out my Intel i5, but my
> gut says anything processor-intensive will work much better on the cluster.
> I could be wrong!
>
> I know I should ask this in a separate email thread so that others can
> benefit from it as well,
> but can I add some sort of filter to the scan so that I only see the row
> ids? I don't care about the data in the table; to attach master data to
> this table I only care about the composite row key.
>
>
> thanks for all your help...
>
> Vishal
>
> On Thu, Feb 17, 2011 at 12:44 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>
>> Bummer. Well, have you tried the other thing about setCacheBlocks? That
>> would at least get rid of the block cache churning and give us a
>> better picture of what's going on in the logs.
>>
>> J-D
>>
>> On Thu, Feb 17, 2011 at 9:20 AM, Vishal Kapoor
>> <vishal.kapoor.in@gmail.com> wrote:
>> > J-D,
>> > I do not see any significant improvement from combining the data into a
>> > single family;
>> > maybe it's because my data in the families is spread as 1:1:1 (single
>> > dimension for all CFs).
>> > The next iteration I am planning is to write to these three families from
>> > three different map/reduce jobs.
>> > will keep all posted on my findings...
>> > Vishal
>> > On Wed, Feb 16, 2011 at 8:00 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>> >>
>> >> It's best to have different families for data of different nature and
>> >> when you usually don't read/write them together. For sure it shouldn't
>> >> slow you down as much as it does (because of HBASE-3149), but given
>> >> the current situation it's hard to recommend multiple families.
>> >>
>> >> J-D
>> >>
>> >> On Wed, Feb 16, 2011 at 4:32 PM, Vishal Kapoor
>> >> <vishal.kapoor.in@gmail.com> wrote:
>> >> > Thanks, J-D, for all your help. I will combine the three families and
>> >> > re-baseline the performance.
>> >> > But I was just wondering whether I was using the families as they were
>> >> > supposed to be used or not.
>> >> > The data in these three families is different: one of them is a live
>> >> > feed and the other two are master (static kind) data, and it made a
>> >> > lot of logical sense to separate them into different families.
>> >> > Maybe if updating a family in a different map/reduce operation works
>> >> > fine, then I will go that route.
>> >> > But the critical-to-quality factor here is speed of inserts, and I am
>> >> > definitely going to give the single-family approach a try.
>> >> >
>> >> > Vishal
>> >> >
>> >> >
>> >> > On Wed, Feb 16, 2011 at 6:53 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>> >> >
>> >> >> I don't understand... is having the same qualifier a hard requirement?
>> >> >> Worst case you could have a prefix.
>> >> >>
>> >> >> J-D
>> >> >>
>> >> >> On Wed, Feb 16, 2011 at 3:29 PM, Vishal Kapoor
>> >> >> <vishal.kapoor.in@gmail.com> wrote:
>> >> >> > J-D,
>> >> >> > I should also mention that my data distribution in the three
>> >> >> > families is 1:1:1.
>> >> >> > I have three families so that I can have the same qualifiers in
>> >> >> > them, and also the data in those families is LIVE:MasterA:MasterB.
>> >> >> >
>> >> >> > Vishal
>> >> >> >
>> >> >> > On Wed, Feb 16, 2011 at 6:22 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>> >> >> >
>> >> >> >> Very often there's no need for more than 1 family; I would suggest
>> >> >> >> you explore that possibility first.
>> >> >> >>
>> >> >> >> J-D
>> >> >> >>
>> >> >> >> On Wed, Feb 16, 2011 at 3:13 PM, Vishal Kapoor
>> >> >> >> <vishal.kapoor.in@gmail.com> wrote:
>> >> >> >> > does that mean I am only left with the choice of writing to the
>> >> >> >> > three families in three different map jobs?
>> >> >> >> > or can I do it any other way?
>> >> >> >> > thanks,
>> >> >> >> > Vishal
>> >> >> >> >
>> >> >> >> > On Wed, Feb 16, 2011 at 12:56 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>> >> >> >> >>
>> >> >> >> >> First, loading into 3 families is currently a bad idea and is
>> >> >> >> >> bound to be inefficient; here's the reason why:
>> >> >> >> >> https://issues.apache.org/jira/browse/HBASE-3149
>> >> >> >> >>
>> >> >> >> >> Those log lines mean that your scanning of the first table is
>> >> >> >> >> generating a lot of block cache churn. When setting up the Map,
>> >> >> >> >> set your scanner to setCacheBlocks(false) before passing it to
>> >> >> >> >> TableMapReduceUtil.initTableMapperJob.
>> >> >> >> >>
>> >> >> >> >> Finally, you may want to give more memory to the region server.
>> >> >> >> >>
>> >> >> >> >> J-D
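[The job setup described above can be sketched roughly as follows. The mapper class is a placeholder, and the method signature reflects the 0.90-era `TableMapReduceUtil` API; treat the details as assumptions rather than code from the thread.]

```java
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;

public class ExplodeJobSetup {

    // Placeholder mapper; the real one would parse each row of
    // LIVE_RAW_TABLE and emit the exploded rows.
    static class ExplodeMapper
            extends TableMapper<ImmutableBytesWritable, ImmutableBytesWritable> {
        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context ctx) {
            // ... parse `value`, emit exploded rows ...
        }
    }

    public static void configure(Job job) throws Exception {
        Scan scan = new Scan();
        // Keep this one-off full scan from churning the region
        // server's LRU block cache.
        scan.setCacheBlocks(false);
        TableMapReduceUtil.initTableMapperJob(
            "LIVE_RAW_TABLE",             // source table from the thread
            scan,                          // scan handed to the mapper
            ExplodeMapper.class,
            ImmutableBytesWritable.class,  // map output key class
            ImmutableBytesWritable.class,  // map output value class
            job);
    }
}
```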
>> >> >> >> >>
>> >> >> >> >> On Wed, Feb 16, 2011 at 7:35 AM, Vishal Kapoor
>> >> >> >> >> <vishal.kapoor.in@gmail.com> wrote:
>> >> >> >> >> > Lars,
>> >> >> >> >> >
>> >> >> >> >> > I am still working on pseudo-distributed:
>> >> >> >> >> > hadoop-0.20.2+737/
>> >> >> >> >> > and hbase-0.90.0 with the hadoop jar from the hadoop install.
>> >> >> >> >> >
>> >> >> >> >> > I have a LIVE_RAW_TABLE table, which gets values from a live
>> >> >> >> >> > system.
>> >> >> >> >> > I go through each row of that table and get the row ids of two
>> >> >> >> >> > reference tables, TABLE_A and TABLE_B, from it. Then I explode
>> >> >> >> >> > this to a new table, LIVE_TABLE.
>> >> >> >> >> > I use
>> >> >> >> >> > TableMapReduceUtil.initTableReducerJob("LIVE_TABLE", null, job);
>> >> >> >> >> >
>> >> >> >> >> > LIVE_TABLE has three families, LIVE, A, and B, and the row id
>> >> >> >> >> > is a composite key:
>> >> >> >> >> > reverseTimeStamp/rowidA/rowIdB
>> >> >> >> >> > After that I run a bunch of map/reduce jobs to consolidate the
>> >> >> >> >> > data; to start with I have around 15,000 rows in
>> >> >> >> >> > LIVE_RAW_TABLE.
>> >> >> >> >> >
>> >> >> >> >> > When I start my job, I see it running quite well until I am
>> >> >> >> >> > almost done with 5,000 rows;
>> >> >> >> >> > then it starts printing the messages in the logs, which I did
>> >> >> >> >> > not see before.
>> >> >> >> >> > The job used to run for around 900 sec (I have a lot of data
>> >> >> >> >> > parsing while exploding);
>> >> >> >> >> > 15,000 rows from LIVE_RAW_TABLE explode to around 500,000 rows
>> >> >> >> >> > in LIVE_TABLE.
>> >> >> >> >> >
>> >> >> >> >> > After those debug messages, the job runs for around 2500 sec.
>> >> >> >> >> > I have not changed anything, including the table design.
>> >> >> >> >> >
>> >> >> >> >> > here is my table description.
>> >> >> >> >> >
>> >> >> >> >> > {NAME => 'LIVE_TABLE', FAMILIES => [
>> >> >> >> >> >   {NAME => 'LIVE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0',
>> >> >> >> >> >    VERSIONS => '1', COMPRESSION => 'NONE', TTL => '2147483647',
>> >> >> >> >> >    BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'},
>> >> >> >> >> >   {NAME => 'A', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0',
>> >> >> >> >> >    VERSIONS => '1', COMPRESSION => 'NONE', TTL => '2147483647',
>> >> >> >> >> >    BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'},
>> >> >> >> >> >   {NAME => 'B', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0',
>> >> >> >> >> >    VERSIONS => '1', COMPRESSION => 'NONE', TTL => '2147483647',
>> >> >> >> >> >    BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
>> >> >> >> >> >
>> >> >> >> >> > thanks for all your help.
>> >> >> >> >> >
>> >> >> >> >> > Vishal
>> >> >> >> >> >
>> >> >> >> >> > On Wed, Feb 16, 2011 at 4:26 AM, Lars George <lars.george@gmail.com> wrote:
>> >> >> >> >> >
>> >> >> >> >> >> Hi Vishal,
>> >> >> >> >> >>
>> >> >> >> >> >> These are DEBUG level messages and are from the block cache;
>> >> >> >> >> >> there is nothing wrong with that. Can you explain more about
>> >> >> >> >> >> what you do and see?
>> >> >> >> >> >>
>> >> >> >> >> >> Lars
>> >> >> >> >> >>
>> >> >> >> >> >> On Wed, Feb 16, 2011 at 4:24 AM, Vishal Kapoor
>> >> >> >> >> >> <vishal.kapoor.in@gmail.com> wrote:
>> >> >> >> >> >> > All was working fine, and suddenly I see a lot of log lines
>> >> >> >> >> >> > like the ones below:
>> >> >> >> >> >> >
>> >> >> >> >> >> > 2011-02-15 22:19:04,023 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 19.88 MB of total=168.64 MB
>> >> >> >> >> >> > 2011-02-15 22:19:04,025 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=19.91 MB, total=148.73 MB, single=74.47 MB, multi=92.37 MB, memory=166.09 KB
>> >> >> >> >> >> > 2011-02-15 22:19:11,207 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 19.88 MB of total=168.64 MB
>> >> >> >> >> >> > 2011-02-15 22:19:11,444 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=19.93 MB, total=149.09 MB, single=73.91 MB, multi=93.32 MB, memory=166.09 KB
>> >> >> >> >> >> > 2011-02-15 22:19:21,494 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 19.87 MB of total=168.62 MB
>> >> >> >> >> >> > 2011-02-15 22:19:21,760 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=19.91 MB, total=148.84 MB, single=74.22 MB, multi=92.73 MB, memory=166.09 KB
>> >> >> >> >> >> > 2011-02-15 22:19:39,838 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 19.87 MB of total=168.62 MB
>> >> >> >> >> >> > 2011-02-15 22:19:39,852 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=19.91 MB, total=148.71 MB, single=75.35 MB, multi=91.48 MB, memory=166.09 KB
>> >> >> >> >> >> > 2011-02-15 22:19:49,768 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 19.87 MB of total=168.62 MB
>> >> >> >> >> >> > 2011-02-15 22:19:49,770 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=19.91 MB, total=148.71 MB, single=76.48 MB, multi=90.35 MB, memory=166.09 KB
>> >> >> >> >> >> >
>> >> >> >> >> >> >
>> >> >> >> >> >> > I haven't changed anything, including the table definitions.
>> >> >> >> >> >> > please let me know where to look...
>> >> >> >> >> >> >
>> >> >> >> >> >> > thanks,
>> >> >> >> >> >> > Vishal Kapoor
>> >> >> >> >> >> >
>> >> >> >> >> >>
>> >> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >>
>> >> >> >
>> >> >>
>> >> >
>> >
>> >
>>
>
