hbase-user mailing list archives

From Solvannan R M <solvanna...@zoho.com.INVALID>
Subject Re: HBase Scan consumes high cpu
Date Wed, 18 Sep 2019 16:38:10 GMT
Hi Ram,

Thanks for your support! We will explore alternative schema designs.


Regards,

Solvannan R M


On 2019/09/17 05:21:39, ramkrishna vasudevan wrote:
 > Hi Solvannan,
 >
 > Currently there is no easy way to overcome this case, because deletes
 > and their tracking take precedence before the filter is even applied.
 >
 > I get your case: you really don't know which columns could have been
 > previously deleted, and hence you specify the entire range of columns
 > in the filter. When this Put/Delete combination keeps increasing, you
 > end up with these issues.
 >
 > I am not aware of the use case here, but is there any better way to
 > handle your schema for these cases?
 >
 > Regards
 > Ram
 >
 > On Mon, Sep 16, 2019 at 10:54 PM Solvannan R M wrote:
 >
 > > Hi Ramkrishna,
 > >
 > > Thank you for your inputs! Unfortunately, we would not know the
 > > column names beforehand. We generated the above scenario for
 > > illustration purposes.
 > >
 > > The intent of our query is: given a single row key, a start column
 > > key and an end column key, scan for the columns that lie between the
 > > two column keys. We have been achieving that with ColumnRangeFilter.
 > > Our write pattern is a Put followed immediately by a Delete
 > > (KEEP_DELETED_CELLS is set to false). As more Deletes accumulate,
 > > the scan time becomes very long and the CPU shoots up to 100% on one
 > > core during every scan. While debugging, we observed the following
 > > behavior:
 > >
 > > At any instant, the cells of the particular row would be roughly
 > > organized like
 > >
 > > D1 P1 D2 P2 D3 P3 ............ Dn-1 Pn-1 Dn Pn Pn+1 Pn+2 Pn+3 Pn+4....
 > >
 > > where D and P are a Delete and its corresponding Put. The newer values
 > > from Pn haven't been deleted yet.
 > >>
 > > As the scan initiates, inside the StoreScanner, the
 > > NormalUserScanQueryMatcher matches the first cell (D1). It is added
 > > to the DeleteTracker and a MatchCode of SKIP is returned. For the
 > > next cell (P1), the matcher checks with the DeleteTracker and returns
 > > a code of SEEK_NEXT_COL. The next cell is D2, and the two codes keep
 > > alternating. No filter is applied. This goes on until the matcher
 > > encounters Pn, where the filter is applied, SEEK_NEXT_USING_HINT is
 > > returned, and a reseek positions the scanner near the desired range.
 > > The result is returned quickly after that.
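
A minimal sketch of the loop described above, written as a toy model in plain Ruby rather than as HBase code. The MatchCode names in the comments are the ones quoted in this thread; `Cell`, `build_row` and `scan` are hypothetical names invented for the illustration. The row is scaled down to 15,000 qualifiers, SEEK_NEXT_COL is simplified to stepping one cell forward, and the delete tracker is reduced to remembering a single qualifier.

```ruby
# Toy model of the scan loop; not HBase code.
Cell = Struct.new(:qualifier, :type) # type is :delete or :put

# One row with qualifiers 0...total. Every qualifier below deleted_upto
# gets a Delete marker placed in front of its Put (D1 P1 D2 P2 ...).
def build_row(total, deleted_upto)
  (0...total).flat_map do |q|
    q < deleted_upto ? [Cell.new(q, :delete), Cell.new(q, :put)] : [Cell.new(q, :put)]
  end
end

# Scans one row for qualifiers in [min, max).
# Returns [matching qualifiers, number of matcher calls].
def scan(cells, min, max)
  deleted = nil # stand-in for the DeleteTracker
  calls = 0
  results = []
  i = 0
  while i < cells.length
    cell = cells[i]
    calls += 1
    if cell.type == :delete
      deleted = cell.qualifier # SKIP: remember the delete, move on
      i += 1
    elsif deleted == cell.qualifier
      i += 1                   # SEEK_NEXT_COL: this Put is masked by the delete
    elsif cell.qualifier < min
      # SEEK_NEXT_USING_HINT: the range filter runs only now and jumps to min
      i = cells.bsearch_index { |c| c.qualifier >= min } || cells.length
    elsif cell.qualifier < max
      results << cell.qualifier # INCLUDE
      i += 1
    else
      break                     # past the range, nothing more to return
    end
  end
  [results, calls]
end

# Case 1: the row starts with a long run of deleted cells.
row = build_row(15_000, 14_000)
hits, calls = scan(row, 14_990, 14_995)
puts "case 1: #{hits.length} hits after #{calls} matcher calls"

# Case 2: same row, but the very first qualifier holds a live Put.
row_with_live_first = row.reject { |c| c.type == :delete && c.qualifier.zero? }
hits, calls = scan(row_with_live_first, 14_990, 14_995)
puts "case 2: #{hits.length} hits after #{calls} matcher calls"
```

In this model the first scan makes 28,007 matcher calls and the second makes 7, with identical results. The difference comes only from how early the first non-deleted Put appears, which is the asymmetry reported in the thread.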
 > >>
 > > The SKIP iterations happen a lot because our pattern has very few
 > > active cells, and only towards the latest column qualifiers (ordered
 > > high lexicographically). We were wondering if the query could be
 > > modified so that the filter is applied first, or if there is some
 > > other way to seek to the desired range directly.
 > >
 > > Regards,
 > > Solvannan R M
 > >>
 > >>
 > > On 2019/09/13 15:53:51, ramkrishna vasudevan wrote:
 > > > Hi
 > > > Generally, if you can form the column names like you did in the
 > > > above case, it is always better to add them using
 > > > scan#addColumn(family, qual). I am not sure of the shell syntax to
 > > > add multiple columns, but I am sure there is a provision to do it.
 > > >
 > > > This will ensure that the scan starts from the given column and
 > > > fetches the required column only. In your case you probably need to
 > > > pass a set of qualifiers (instead of just 1).
 > > >
 > > > Regards
 > > > Ram
 > > >
 > > > On Fri, Sep 13, 2019 at 8:45 PM Solvannan R M wrote:
 > > >>
 > > > > Hi Anoop,
 > > > >
 > > > > We have executed the query with the qualifier set as you advised,
 > > > > but we don't get the results for the range; only the specified
 > > > > qualifier cell is returned.
 > > > >
 > > > > Query & Result:
 > > > >
 > > > > hbase(main):008:0> get 'mytable', 'MY_ROW',
 > > > > {COLUMN=>["pcf:\x00\x16\xDFx"],
 > > > > FILTER=>ColumnRangeFilter.new(Bytes.toBytes(1499000.to_java(:int)),
 > > > > true, Bytes.toBytes(1499010.to_java(:int)), false)}
 > > > > COLUMN CELL
 > > > > pcf:\x00\x16\xDFx timestamp=1568380663616, value=\x00\x16\xDFx
 > > > > 1 row(s) in 0.0080 seconds
 > > > >
 > > > > hbase(main):009:0>
 > > > >
 > > > >
 > > > > Is there any other way to get around this?
 > > > >
 > > > >
 > > > > Regards,
 > > > >
 > > > > Solvannan R M
 > > > >>>
 > > > >>>
 > > > > On 2019/09/13 04:53:45, Anoop John wrote:
 > > > > > Hi
 > > > > > When you did a put with a lower qualifier int (put 'mytable',
 > > > > > 'MY_ROW', "pcf:\x0A", "\x00"), the system flow gets a valid cell
 > > > > > at the 1st step itself, and that cell is passed to the Filter.
 > > > > > The Filter does a seek, which avoids processing all the
 > > > > > in-between deletes and puts. In the 1st case the Filter won't
 > > > > > get into action at all until the scan flow sees a valid cell.
 > > > > > The delete processing happens as the 1st step, before the
 > > > > > filter processing step.
 > > > > >
 > > > > > In this case I am wondering why you cannot add the specific 1st
 > > > > > qualifier in the get part itself, along with the column range
 > > > > > filter. I mean
 > > > > >
 > > > > > get 'mytable', 'MY_ROW', {COLUMN=>['pcf: *1499000 * '],
 > > > > > FILTER=>ColumnRangeFilter.new(Bytes.toBytes(1499000.to_java(:int)),
 > > > > > true, Bytes.toBytes(1499010.to_java(:int)), false)}
 > > > > >
 > > > > > Pardon the syntax, it might not be proper for the shell. Can this
 > > > > > be done? This will make the scan seek to the given qualifier at
 > > > > > the 1st step itself.
 > > > > >
 > > > > > Anoop
 > > > > >>>
 > > > > > On Thu, Sep 12, 2019 at 10:18 PM Udai Bhan Kashyap (BLOOMBERG/
 > > > > > PRINCETON) <ukashyap1@bloomberg.net> wrote:
 > > > > >
 > > > > > > Are you keeping the deleted cells? Check 'VERSIONS' for the
 > > > > > > column family and set it to 1 if you don't want to keep the
 > > > > > > deleted cells.
 > > > > > >
 > > > > > > From: user@hbase.apache.org At: 09/12/19 12:40:01 To:
 > > > > > > user@hbase.apache.org
 > > > > > > Subject: Re: HBase Scan consumes high cpu
 > > > > > >
 > > > > > > Hi,
 > > > > > >
 > > > > > > As said earlier, we have populated the rowkey "MY_ROW" with
 > > > > > > integers from 0 to 1500000 as column qualifiers. Then we have
 > > > > > > deleted the qualifiers from 0 to 1499000.
 > > > > > >
 > > > > > > We executed the following query. It took 15.3750 seconds to
 > > > > > > execute.
 > > > > > >>>>
 > > > > > > hbase(main):057:0> get 'mytable', 'MY_ROW', {COLUMN=>['pcf'],
 > > > > > > FILTER=>ColumnRangeFilter.new(Bytes.toBytes(1499000.to_java(:int)),
 > > > > > > true, Bytes.toBytes(1499010.to_java(:int)), false)}
 > > > > > > COLUMN CELL
 > > > > > > pcf:\x00\x16\xDFx timestamp=1568123881899, value=\x00\x16\xDFx
 > > > > > > pcf:\x00\x16\xDFy timestamp=1568123881899, value=\x00\x16\xDFy
 > > > > > > pcf:\x00\x16\xDFz timestamp=1568123881899, value=\x00\x16\xDFz
 > > > > > > pcf:\x00\x16\xDF{ timestamp=1568123881899, value=\x00\x16\xDF{
 > > > > > > pcf:\x00\x16\xDF| timestamp=1568123881899, value=\x00\x16\xDF|
 > > > > > > pcf:\x00\x16\xDF} timestamp=1568123881899, value=\x00\x16\xDF}
 > > > > > > pcf:\x00\x16\xDF~ timestamp=1568123881899, value=\x00\x16\xDF~
 > > > > > > pcf:\x00\x16\xDF\x7F timestamp=1568123881899, value=\x00\x16\xDF\x7F
 > > > > > > pcf:\x00\x16\xDF\x80 timestamp=1568123881899, value=\x00\x16\xDF\x80
 > > > > > > pcf:\x00\x16\xDF\x81 timestamp=1568123881899, value=\x00\x16\xDF\x81
 > > > > > > 1 row(s) in 15.3750 seconds
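
A small sketch, in plain Ruby rather than the HBase shell, of how the integer bounds in the query above relate to the qualifiers printed in the output. It assumes that `Bytes.toBytes` on a Java int produces a 4-byte big-endian value, which is consistent with the `\x00\x16\xDFx` shown for 1499000. `encode_int` and `shell_style` are hypothetical helper names, and the escaping only approximates how the shell prints binary data.

```ruby
# Encode an integer as 4 bytes, most significant byte first.
# Assumption: this matches the layout of Bytes.toBytes(int).
def encode_int(n)
  [n].pack('N') # 'N' = 32-bit unsigned, big-endian
end

# Print bytes roughly the way the shell output does:
# printable ASCII as-is, everything else as \xNN.
def shell_style(bytes)
  bytes.each_byte.map { |b| (0x20..0x7E).cover?(b) ? b.chr : format('\\x%02X', b) }.join
end

# The ten qualifiers covered by the range [1499000, 1499010).
range = (1_499_000...1_499_010)
range.each { |n| puts "#{n} -> #{shell_style(encode_int(n))}" }

# Fixed-width big-endian encoding keeps numeric order under
# byte-wise comparison, which is what a column range relies on.
encoded = range.map { |n| encode_int(n) }
puts encoded.sort_by(&:bytes) == encoded
```

The first line printed is `1499000 -> \x00\x16\xDFx` and the last qualifier is `\x00\x16\xDF\x81`, matching the first and last cells returned by the query.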
 > > > > > >>>>
 > > > > > >>>>
 > > > > > > Now we inserted a new column with qualifier 10 (\x0A), such
 > > > > > > that it comes earlier in lexicographical order. Now we executed
 > > > > > > the same query. It only took 0.0240 seconds.
 > > > > > >>>>
 > > > > > > hbase(main):058:0> put 'mytable', 'MY_ROW', "pcf:\x0A", "\x00"
 > > > > > > 0 row(s) in 0.0150 seconds
 > > > > > > hbase(main):059:0> get 'mytable', 'MY_ROW', {COLUMN=>['pcf'],
 > > > > > > FILTER=>ColumnRangeFilter.new(Bytes.toBytes(1499000.to_java(:int)),
 > > > > > > true, Bytes.toBytes(1499010.to_java(:int)), false)}
 > > > > > > COLUMN CELL
 > > > > > > pcf:\x00\x16\xDFx timestamp=1568123881899, value=\x00\x16\xDFx
 > > > > > > pcf:\x00\x16\xDFy timestamp=1568123881899, value=\x00\x16\xDFy
 > > > > > > pcf:\x00\x16\xDFz timestamp=1568123881899, value=\x00\x16\xDFz
 > > > > > > pcf:\x00\x16\xDF{ timestamp=1568123881899, value=\x00\x16\xDF{
 > > > > > > pcf:\x00\x16\xDF| timestamp=1568123881899, value=\x00\x16\xDF|
 > > > > > > pcf:\x00\x16\xDF} timestamp=1568123881899, value=\x00\x16\xDF}
 > > > > > > pcf:\x00\x16\xDF~ timestamp=1568123881899, value=\x00\x16\xDF~
 > > > > > > pcf:\x00\x16\xDF\x7F timestamp=1568123881899, value=\x00\x16\xDF\x7F
 > > > > > > pcf:\x00\x16\xDF\x80 timestamp=1568123881899, value=\x00\x16\xDF\x80
 > > > > > > pcf:\x00\x16\xDF\x81 timestamp=1568123881899, value=\x00\x16\xDF\x81
 > > > > > > 1 row(s) in 0.0240 seconds
 > > > > > > hbase(main):060:0>
 > > > > > >>>>
 > > > > > >>>>
 > > > > > > We were able to reproduce the result consistently, the pattern
 > > > > > > being a bulk insert followed by a bulk delete of most of the
 > > > > > > earlier columns.
 > > > > > >
 > > > > > >
 > > > > > > We observed the following behaviour while debugging the
 > > > > > > StoreScanner (regionserver).
 > > > > > >>>>
 > > > > > > Case 1:
 > > > > > >
 > > > > > > 1. When StoreScanner.next() is called, it starts to iterate
 > > > > > > over the cells from the start of the rowkey.
 > > > > > >
 > > > > > > 2. As all the cells from 0 to 1499000 are deleted, we see
 > > > > > > alternating Delete and Put type cells. The
 > > > > > > NormalUserScanQueryMatcher.match() returns
 > > > > > > ScanQueryMatcher.MatchCode.SKIP for each Delete type cell and
 > > > > > > ScanQueryMatcher.MatchCode.SEEK_NEXT_COL for each Put type
 > > > > > > cell. This iteration happens throughout the range of 0 to
 > > > > > > 1499000.
 > > > > > >
 > > > > > > 3. This happens until a valid Put type cell is encountered,
 > > > > > > where the matcher applies the ColumnRangeFilter to the cell,
 > > > > > > which in turn returns
 > > > > > > ScanQueryMatcher.MatchCode.SEEK_NEXT_USING_HINT. In the next
 > > > > > > iteration it seeks directly to the desired column.
 > > > > > >
 > > > > > >
 > > > > > > Case 2:
 > > > > > >
 > > > > > > 1. When StoreScanner.next() is called, it starts to iterate
 > > > > > > over the cells from the start of the rowkey.
 > > > > > >
 > > > > > > 2. When the Put cell of qualifier 10 (\x0A) is encountered, the
 > > > > > > matcher returns
 > > > > > > ScanQueryMatcher.MatchCode.SEEK_NEXT_USING_HINT. In the next
 > > > > > > iteration it seeks directly to the desired column.
 > > > > > >
 > > > > > >
 > > > > > > Please let us know whether this behaviour is intentional or
 > > > > > > whether it could be avoided.
 > > > > > >
 > > > > > > Regards,
 > > > > > >
 > > > > > > Solvannan R M
 > > > > > >>>>
 > > > > > >>>>
 > > > > > > On 2019/09/10 17:12:36, Josh Elser wrote:
 > > > > > > > Deletes are held in memory. They represent data you have to
 > > > > > > > traverse until that data is flushed out to disk. When you
 > > > > > > > write a new cell with a qualifier of 10, that sorts,
 > > > > > > > lexicographically, "early" with respect to the other
 > > > > > > > qualifiers you've written.
 > > > > > > >
 > > > > > > > By that measure, if you are only scanning for the first
 > > > > > > > column in this row which you've loaded with deletes, it would
 > > > > > > > make total sense to me that the first case is slow and the
 > > > > > > > second is fast.
 > > > > > > >
 > > > > > > > Can you please share exactly how you execute your "query" for
 > > > > > > > both (all) scenarios?
 > > > > > > >>>>
 > > > > > > > On 9/10/19 11:35 AM, Solvannan R M wrote:
 > > > > > > > > Hi,
 > > > > > > > >
 > > > > > > > > We have been using HBase (1.4.9) for a case where
 > > > > > > > > timeseries data is continuously inserted and deleted
 > > > > > > > > (high churn) against a single rowkey. The column keys
 > > > > > > > > represent timestamps, more or less. When we scan this
 > > > > > > > > data using ColumnRangeFilter for a recent time range, the
 > > > > > > > > scanner for the stores (memstore & storefiles) has to go
 > > > > > > > > through contiguous deletes before it reaches the
 > > > > > > > > requested time-range data. During this scan, we noticed
 > > > > > > > > 100% CPU usage on a single core by the regionserver
 > > > > > > > > process.
 > > > > > > > >
 > > > > > > > > So, for our case, most of the cells with older timestamps
 > > > > > > > > will be in a deleted state. While traversing these
 > > > > > > > > deleted cells, the regionserver process causes 100% CPU
 > > > > > > > > usage on a single core.
 > > > > > > > >
 > > > > > > > > We tried to trace the code for the scan and observed the
 > > > > > > > > following behaviour.
 > > > > > > > >
 > > > > > > > > 1. When the scanner is initialized, it seeks all the
 > > > > > > > > store-scanners to the start of the rowkey.
 > > > > > > > > 2. Then it traverses the deleted cells and discards them
 > > > > > > > > (as they were deleted) one by one.
 > > > > > > > > 3. When it encounters a valid cell (put type), it applies
 > > > > > > > > the filter, which returns SEEK_NEXT_USING_HINT.
 > > > > > > > > 4. Now the scanner seeks to the required key directly and
 > > > > > > > > returns the results quickly.
 > > > > > > > >
 > > > > > > > > To confirm this behaviour, we ran a test:
 > > > > > > > > 1. We populated a single rowkey with column qualifiers as
 > > > > > > > > a range of integers from 0 to 1500000, with random data.
 > > > > > > > > 2. We then deleted the column qualifier range of 0 to
 > > > > > > > > 1499000.
 > > > > > > > > 3. Now the data is only in the memstore. No store file
 > > > > > > > > exists.
 > > > > > > > > 4. Now we scanned the rowkey with ColumnRangeFilter
 > > > > > > > > [1499000, 1499010).
 > > > > > > > > 5. The query took 12 seconds to execute. During this
 > > > > > > > > query, a single core is completely used.
 > > > > > > > > 6. Then we put a new cell with qualifier 10.
 > > > > > > > > 7. Executed the same query; it took 0.018 seconds to
 > > > > > > > > execute.
 > > > > > > > >
 > > > > > > > > Kindly check this and advise!
 > > > > > > > >
 > > > > > > > > Regards,
 > > > > > > > > Solvannan R M
 > > > > > > > > >>>>
 > > > > > > >>>>
 > > > > > >>>>
 > > > > > >>>>
 > > >
[message truncated...]
