gora-dev mailing list archives

From Sheriffo Ceesay <sneceesa...@gmail.com>
Subject Re: Week 2 Report and A Question
Date Tue, 11 Jun 2019 14:12:31 GMT
Hello,

I have taken a proper look at the recommendations from @Alfonso and @Renato,
and the outcomes are below.

Failed Attempts
1. Optimisation: for the insert operation, to avoid the string concatenation
issue, I took the quickest route and called the setters directly, without
reflection. The calls are below. Note: I have moved all reusable code to the
init method.

  public int insert(String table, String key, HashMap<String, ByteIterator> values) {
    try {
      // Direct setter calls, no reflection and no per-call name concatenation.
      user.setField0(values.get("field0").toString());
      user.setField1(values.get("field1").toString());
      user.setField2(values.get("field2").toString());
      user.setField3(values.get("field3").toString());
      user.setField4(values.get("field4").toString());
      user.setField5(values.get("field5").toString());
      user.setField6(values.get("field6").toString());
      user.setField7(values.get("field7").toString());
      user.setField8(values.get("field8").toString());
      user.setField9(values.get("field9").toString());
      dataStore.put(user.getUserId().toString(), user);
    } catch (Exception e) {
      return FAILED;
    }
    return SUCCESS;
  }

If the above had worked, I would have changed the code as Alfonso
suggested. Also, I may be wrong, but as I understand the YCSB framework, it
executes one insert operation per user object, so I thought it was right to
create a user object within the insert method.
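
For reference, the suggestions quoted further down this thread (precompute the field names once, reuse one record instance, address fields by index) could be sketched roughly as follows. This is only an illustration under assumptions: IndexedRecord is a hypothetical stand-in for an Avro-generated class that mimics the positional put(int, Object) / get(int) access of SpecificRecordBase, the values are plain Strings rather than YCSB ByteIterator objects, and reusing one instance is only safe if inserts are single-threaded and the store serialises the record during put().

```java
import java.util.Map;

// Hypothetical stand-in for an Avro-generated record: values are addressed by
// position, in the way SpecificRecordBase.put(int, Object) addresses them.
class IndexedRecord {
    private final Object[] values;

    IndexedRecord(int fieldCount) {
        values = new Object[fieldCount];
    }

    void put(int index, Object value) {
        values[index] = value;
    }

    Object get(int index) {
        return values[index];
    }
}

class PrecomputedInsert {
    static final int FIELD_COUNT = 10;

    // Built once at class load, so no "field" + i concatenation happens per insert.
    static final String[] FIELD_NAMES = new String[FIELD_COUNT];
    static {
        for (int i = 0; i < FIELD_COUNT; i++) {
            FIELD_NAMES[i] = "field" + i;
        }
    }

    // A single record instance reused for every insert (assumption: single-threaded
    // use, and the store does not keep a reference to the record after put()).
    private final IndexedRecord record = new IndexedRecord(FIELD_COUNT);

    // Copies the incoming values into the reused record by index and returns it.
    IndexedRecord fill(Map<String, String> values) {
        for (int i = 0; i < FIELD_COUNT; i++) {
            record.put(i, values.get(FIELD_NAMES[i]));
        }
        return record;
    }
}
```

In a real client the returned record would then be handed to dataStore.put(key, record); that call is left out here so the sketch stays self-contained.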


2. I tried different values for *-Xmx (256MB, 512MB, 1GB, 2GB)* and even
disabled the GC overhead limit check using *-XX:-UseGCOverheadLimit*, but
every run failed with the same GC error.

Successful Attempt -- There may be room for improvement
The configuration below worked, but I do not think it is the best for
write performance.

First, I read in [1], which relates to [2], that the following one-liner
should be executed for better HBase performance when using YCSB. It
pre-splits the table so that a single region server is not overloaded.

hbase(main):001:0> n_splits = 200 # HBase recommends (10 * number of regionservers)
hbase(main):002:0> create 'users', 'info', {SPLITS => (1..n_splits).map {|i| "user#{1000+i*(9999-1000)/n_splits}"}}
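
To make the split points concrete, here is a small Java sketch that reproduces the arithmetic of the shell one-liner above. The class and method names are hypothetical; only the formula and the "user" prefix come from the command itself.

```java
import java.util.ArrayList;
import java.util.List;

class SplitKeys {
    // Mirrors (1..n_splits).map {|i| "user#{1000+i*(9999-1000)/n_splits}"}
    // using the same integer division as the Ruby expression.
    static List<String> splitKeys(int nSplits) {
        List<String> keys = new ArrayList<>(nSplits);
        for (int i = 1; i <= nSplits; i++) {
            keys.add("user" + (1000 + i * (9999 - 1000) / nSplits));
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String> keys = splitKeys(200);
        // First, middle, and last split points
        System.out.println(keys.get(0) + " " + keys.get(99) + " " + keys.get(199));
    }
}
```

With n_splits = 200 the boundaries run from user1044 to user9999 in steps of roughly 45, so keys of the form user<number> are spread across the pre-created regions.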

Second, as suggested by @Renato Marroquín Mogrovejo
<renatoj.marroquin@gmail.com>, it only works when I set

*hbase.client.autoflush.default=true*

However, from [3], I found "HBase autoflushing. Enabling autoflush
decreases write performance. Available since Gora 0.2. Defaults to
disabled.". So I am of the opinion that the problem is not entirely solved.
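
One possible middle ground, not yet tried in this thread, would be to leave autoflush off and flush explicitly every N inserts, so that buffered writes cannot pile up without bound. The sketch below is an assumption-laden illustration: BufferingStore is a hypothetical stand-in exposing only put() and flush(), and whether unflushed client-side buffering is really what triggers the GC error has not been confirmed.

```java
// Hypothetical stand-in for a buffering data store; only put() and flush() matter here.
interface BufferingStore {
    void put(String key, String value);

    void flush();
}

class PeriodicFlusher {
    private final BufferingStore store;
    private final int flushEvery;
    private int pending = 0;

    PeriodicFlusher(BufferingStore store, int flushEvery) {
        if (flushEvery <= 0) {
            throw new IllegalArgumentException("flushEvery must be positive");
        }
        this.store = store;
        this.flushEvery = flushEvery;
    }

    // Forwards the write and flushes once flushEvery writes have accumulated.
    void put(String key, String value) {
        store.put(key, value);
        pending++;
        if (pending >= flushEvery) {
            store.flush();
            pending = 0;
        }
    }

    // Flushes whatever is still pending, for example at the end of the load phase.
    void close() {
        if (pending > 0) {
            store.flush();
            pending = 0;
        }
    }
}
```

The batch size would need to be tuned by measurement; a small value behaves like autoflush, a very large one like no flushing at all.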

I ran the following test, inserting 1M records into MongoDB and HBase. The
result may not be bad after all, but more benchmarks may be required to
validate this. HBase through Gora performs almost the same as HBase
benchmarked with vanilla YCSB.

*Backend          Avg Time Taken (sec)*
MongoDB          ~90
HBase in Gora    ~160
HBase YCSB       ~160
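
Expressed as throughput, these approximate timings for 1M records work out to roughly 11,100 inserts per second for MongoDB and 6,250 for both HBase runs. A minimal sketch of that arithmetic (the class name is hypothetical, and the inputs are the rounded averages from the table, not new measurements):

```java
class Throughput {
    // Operations per second for a load of `records` inserts taking `seconds`.
    static double opsPerSecond(long records, double seconds) {
        if (seconds <= 0) {
            throw new IllegalArgumentException("seconds must be positive");
        }
        return records / seconds;
    }

    public static void main(String[] args) {
        long records = 1_000_000L;
        System.out.printf("MongoDB:       %.0f ops/s%n", opsPerSecond(records, 90));
        System.out.printf("HBase in Gora: %.0f ops/s%n", opsPerSecond(records, 160));
        System.out.printf("HBase YCSB:    %.0f ops/s%n", opsPerSecond(records, 160));
    }
}
```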


[1] https://github.com/brianfrankcooper/YCSB/tree/master/hbase098
[2] https://issues.apache.org/jira/browse/HBASE-4163
[3] https://gora.apache.org/current/gora-hbase.html

Comments are welcome.

Thank you.

**Sheriffo Ceesay**


On Tue, Jun 11, 2019 at 12:04 AM Sheriffo Ceesay <sneceesay77@gmail.com>
wrote:

> Hello Alfonso and Renato,
>
> Thank you for getting in touch and thanks for the detailed replies.
>
> I will have a proper look at this tomorrow morning. I did some
> troubleshooting yesterday (mostly playing with Xmx and ZooKeeper timeout
> settings), which improved the conditions but did not entirely solve the
> problem. Preliminarily, it seems the problem has to do with configuration or
> with how HBaseStore is implemented (this may not be entirely true).
>
> I will keep you all posted whenever I thoroughly have a look at your
> suggestions.
>
> Thanks again.
>
>
> **Sheriffo Ceesay**
>
>
> On Mon, Jun 10, 2019 at 11:14 PM Alfonso Nishikawa <
> alfonso.nishikawa@gmail.com> wrote:
>
>> Hi!
>>
>> My hypothesis is that the difference between MongoDB and HBase is that
>> HBase puts more stress on serializing with Avro. It could also matter that,
>> if the HBase test is performed after the MongoDB ones, the GC starts from
>> a "bad" situation.
>>
>> From [A] linked by @Renato, if the error was OutOfMemoryException I would
>> have recommended lowering gora.hbasestore.scanner.caching to 100, 10 or
>> even 1, but with a GC error I am not so sure. In any case, @Sheriffo:
>> you can try this if it still doesn't work with the optimizations :)
>>
>> @Renato: Thx for the links!
>>
>> Regards,
>>
>> Alfonso Nishikawa
>>
>>
>>
>> On Mon, Jun 10, 2019 at 22:02, Renato Marroquín Mogrovejo
>> (<renatoj.marroquin@gmail.com>) wrote:
>>
>> > @Alfonso,
>> > Thank you very much for the suggestions! You are totally right about
>> > all of your points! Sheriffo, please benefit from them ;)
>> >
>> > What is also strange (although it can be optimized, as Alfonso
>> > pointed out) is that it works for the MongoDB backend. So I would also
>> > suspect the configuration of the Gora-HBase client. Have you taken
>> > a look at [A], for example? Or at other assumed Gora-HBase configurations
>> > [B]? Maybe there you can specify some Xmx / Xms config.
>> >
>> >
>> > Best,
>> >
>> > Renato M.
>> >
>> > [A] https://github.com/sneceesay77/gora/blob/master/gora-hbase/src/test/conf/gora.properties
>> > [B] https://github.com/sneceesay77/gora/blob/master/gora-hbase/src/test/conf/hbase-site.xml
>> >
>> > On Mon, Jun 10, 2019 at 23:39, Alfonso Nishikawa
>> > (<alfonso.nishikawa@gmail.com>) wrote:
>> > >
>> > > Hi again, Sheriffo.
>> > >
>> > > More improvements to [1] on top of the last email:
>> > >
>> > > - fields.toArray() doesn't need a full-size array like in [6]. You should
>> > > just do fields.toArray(new String[0]), and better still, create the
>> > > String[0] array once and reuse it. That call only needs the type.
>> > > - I guess the class at [2] will always be the same, so you don't need to
>> > > set it on every insert call.
>> > > - The string concatenation at [3], and likewise at [4], is overkill for
>> > > the JVM across 1M calls * N fields. Precalculate the names in a list or
>> > > array and reuse them for the 1M*N calls.
>> > > - Another optimization for [3]: given that PersistentBase [5] extends
>> > > SpecificRecordBase, you can access the fields by index with
>> > > SpecificRecordBase.get(int) and SpecificRecordBase.put(int, Object).
>> > >
>> > > [1] - https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L127
>> > > [2] - https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L134
>> > > [3] - https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L136
>> > > [4] - https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L139
>> > > [5] - https://github.com/sneceesay77/gora/blob/GORA-532/gora-core/src/main/java/org/apache/gora/persistency/impl/PersistentBase.java#L3
>> > > [6] - https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L163
>> > >
>> > > Let's see whether these optimizations relieve the JVM's memory
>> > > management of much of the stress.
>> > >
>> > > Regards,
>> > >
>> > > Alfonso Nishikawa
>> > >
>> > > On Mon, Jun 10, 2019 at 21:18, Alfonso Nishikawa
>> > > (<alfonso.nishikawa@gmail.com>) wrote:
>> > >
>> > > > Hi, Sheriffo.
>> > > >
>> > > > You can try reusing the Persistent instances [1] to insert the data. I
>> > > > don't know all the backends, but they should be reusable, at least in
>> > > > MongoDB and HBase.
>> > > >
>> > > > [1] - https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L130
>> > > >
>> > > > Regards,
>> > > >
>> > > > Alfonso Nishikawa
>> > > >
>> > > > On Mon, Jun 10, 2019 at 21:14, Alfonso Nishikawa
>> > > > (<alfonso.nishikawa@gmail.com>) wrote:
>> > > >
>> > > >> Hi, Sheriffo.
>> > > >>
>> > > >> I really don't know how to solve it, but are you setting any Xmx / Xms
>> > > >> configuration values?
>> > > >>
>> > > >> Regards,
>> > > >>
>> > > >> Alfonso Nishikawa
>> > > >>
>> > > >>
>> > > >> On Sat, Jun 8, 2019 at 16:02, Sheriffo Ceesay
>> > > >> (<sneceesay77@gmail.com>) wrote:
>> > > >>
>> > > >>> Hi All,
>> > > >>>
>> > > >>> Week 2 progress update is available at
>> > > >>>
>> > > >>> https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report
>> > > >>>
>> > > >>> I have one question that I would like my mentors to advise on. I am
>> > > >>> still working on it, but thought it would be good to report it
>> > > >>> because it is HBase specific.
>> > > >>>
>> > > >>> The problem is an OutOfMemory error when inserting 1M+ records into
>> > > >>> HBase. It happens when I try to run the actual benchmark by first
>> > > >>> loading HBase with 1 million plus records. It works perfectly for
>> > > >>> MongoDB but not for HBase.
>> > > >>>
>> > > >>> So I am assuming this problem is specific to HBase. The stack trace
>> > > >>> is given below.
>> > > >>>
>> > > >>> Exception in thread "Thread-1" java.lang.OutOfMemoryError: GC overhead limit exceeded
>> > > >>>         at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:300)
>> > > >>>         at java.lang.StringCoding.encode(StringCoding.java:344)
>> > > >>>         at java.lang.String.getBytes(String.java:918)
>> > > >>>         at org.apache.hadoop.hbase.util.Bytes.toBytes(Bytes.java:733)
>> > > >>>         at org.apache.gora.hbase.util.HBaseByteInterface.toBytes(HBaseByteInterface.java:225)
>> > > >>>         at org.apache.gora.hbase.store.HBaseStore.addPutsAndDeletes(HBaseStore.java:383)
>> > > >>>         at org.apache.gora.hbase.store.HBaseStore.addPutsAndDeletes(HBaseStore.java:348)
>> > > >>>         at org.apache.gora.hbase.store.HBaseStore.put(HBaseStore.java:319)
>> > > >>>         at org.apache.gora.hbase.store.HBaseStore.put(HBaseStore.java:84)
>> > > >>>         at org.apache.gora.benchmark.GoraBenchmarkClient.insert(GoraBenchmarkClient.java:141)
>> > > >>>         at com.yahoo.ycsb.DBWrapper.insert(DBWrapper.java:148)
>> > > >>>         at com.yahoo.ycsb.workloads.CoreWorkload.doInsert(CoreWorkload.java:461)
>> > > >>>         at com.yahoo.ycsb.ClientThread.run(Client.java:269)
>> > > >>>
>> > > >>> The insert implementation of the module, available at
>> > > >>> https://github.com/sneceesay77/gora/tree/GORA-532/gora-benchmark in
>> > > >>> GoraBenchmarkClient.java, is very straightforward. I have had a brief
>> > > >>> look at the put() implementation in HBaseStore.java but could not
>> > > >>> find an issue with it.
>> > > >>>
>> > > >>> If I solve this problem, I will run more workloads to verify that
>> > > >>> the module is stable for the basic implementation. Then I will go
>> > > >>> ahead and work on the suggestions Renato made last week.
>> > > >>>
>> > > >>> Please let me know what your thoughts are.
>> > > >>>
>> > > >>>
>> > > >>> Thank you.
>> > > >>>
>> > > >>>
>> > > >>>
>> > > >>> **Sheriffo Ceesay**
>> > > >>>