gora-dev mailing list archives

From Alfonso Nishikawa <alfonso.nishik...@gmail.com>
Subject Re: Week 2 Report and A Question
Date Mon, 10 Jun 2019 21:39:02 GMT
Hi again, Sheriffo.

More improvements to [1] over the last email:

- fields.toArray() doesn't need a full-size array as in [6]. You can just
call fields.toArray(new String[0]), or better, create the zero-length array
once and reuse it. That call only needs the array for its type.
- I guess the class at [2] will always be the same, so you don't need to
set it on every insert call.
- The string concatenation at [3], and likewise at [4], is too costly for
the JVM over 1M calls * N fields. Precalculate the names in a list or array
and reuse them for the 1M*N calls.
- Another optimization for [3]: given that PersistentBase [5] extends
SpecificRecordBase, you can access the fields by index with
SpecificRecordBase.get(int) and SpecificRecordBase.put(int, Object).
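
A minimal sketch of the first and third points, using only the JDK. The
class name FieldNameCache, the "field" prefix and the helper method names
are hypothetical and are not taken from GoraBenchmarkClient; the idea is
only to build the names once and to share one zero-length array for
toArray():

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class FieldNameCache {
    // Shared zero-length array: toArray() only uses it to learn the element type
    // (for a non-empty list it allocates a correctly sized String[] itself).
    private static final String[] EMPTY_STRINGS = new String[0];

    // Hypothetical prefix; the real benchmark field naming may differ.
    private static final String FIELD_PREFIX = "field";

    // Build the names once (e.g. at client init), not on each of the 1M * N calls.
    static String[] precomputeFieldNames(int fieldCount) {
        String[] names = new String[fieldCount];
        for (int i = 0; i < fieldCount; i++) {
            names[i] = FIELD_PREFIX + i;
        }
        return names;
    }

    // Convert a collection of field names without sizing an array by hand.
    static String[] toFieldArray(List<String> fields) {
        return fields.toArray(EMPTY_STRINGS);
    }

    public static void main(String[] args) {
        String[] names = precomputeFieldNames(3);
        List<String> fields = new ArrayList<>(Arrays.asList(names));
        System.out.println(Arrays.toString(toFieldArray(fields)));
    }
}
```

Inside insert(), the loop would then index into the precomputed array
instead of concatenating a prefix and a counter on every call.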

[1] -
https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L127
[2] -
https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L134
[3] -
https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L136
[4] -
https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L139
[5] -
https://github.com/sneceesay77/gora/blob/GORA-532/gora-core/src/main/java/org/apache/gora/persistency/impl/PersistentBase.java#L3
[6] -
https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L163

Let's see if those optimizations relieve the JVM's memory management of
much of the stress.

Regards,

Alfonso Nishikawa

On Mon, 10 Jun 2019 at 21:18, Alfonso Nishikawa (<
alfonso.nishikawa@gmail.com>) wrote:

> Hi, Sheriffo.
>
> You can try reusing the Persistent instances [1] to insert the data. I
> don't know all the backends, but they should be reusable, at least in
> MongoDB and HBase.
>
> [1] -
> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L130
>
> Regards,
>
> Alfonso Nishikawa
>
> On Mon, 10 Jun 2019 at 21:14, Alfonso Nishikawa (<
> alfonso.nishikawa@gmail.com>) wrote:
>
>> Hi, Sheriffo.
>>
>> I really don't know how to solve it, but are you setting any Xmx / Xms
>> configuration values?
>>
>> Regards,
>>
>> Alfonso Nishikawa
>>
>>
>> On Sat, 8 Jun 2019 at 16:02, Sheriffo Ceesay (<sneceesay77@gmail.com>)
>> wrote:
>>
>>> Hi All,
>>>
>>> Week 2 progress update is available at
>>>
>>> https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report
>>>
>>> I have one question that I would like my mentors to advise on. I am still
>>> working on it, but thought it would be good to report it because it
>>> appears to be HBase specific.
>>>
>>> The problem is an OutOfMemoryError when inserting 1M+ records into
>>> HBase. It happens when I try to run the actual benchmark by first
>>> loading HBase with 1 million plus records. It works perfectly for
>>> MongoDB but not for HBase.
>>>
>>> So I am assuming this problem is specific to HBase. The stack trace is
>>> given below.
>>>
>>> Exception in thread "Thread-1" java.lang.OutOfMemoryError: GC overhead limit exceeded
>>>         at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:300)
>>>         at java.lang.StringCoding.encode(StringCoding.java:344)
>>>         at java.lang.String.getBytes(String.java:918)
>>>         at org.apache.hadoop.hbase.util.Bytes.toBytes(Bytes.java:733)
>>>         at org.apache.gora.hbase.util.HBaseByteInterface.toBytes(HBaseByteInterface.java:225)
>>>         at org.apache.gora.hbase.store.HBaseStore.addPutsAndDeletes(HBaseStore.java:383)
>>>         at org.apache.gora.hbase.store.HBaseStore.addPutsAndDeletes(HBaseStore.java:348)
>>>         at org.apache.gora.hbase.store.HBaseStore.put(HBaseStore.java:319)
>>>         at org.apache.gora.hbase.store.HBaseStore.put(HBaseStore.java:84)
>>>         at org.apache.gora.benchmark.GoraBenchmarkClient.insert(GoraBenchmarkClient.java:141)
>>>         at com.yahoo.ycsb.DBWrapper.insert(DBWrapper.java:148)
>>>         at com.yahoo.ycsb.workloads.CoreWorkload.doInsert(CoreWorkload.java:461)
>>>         at com.yahoo.ycsb.ClientThread.run(Client.java:269)
>>>
>>> The insert implementation of the module, available at
>>> https://github.com/sneceesay77/gora/tree/GORA-532/gora-benchmark in
>>> GoraBenchmarkClient.java, is very straightforward. I have had a brief
>>> look at the put() implementation in HBaseStore.java but could not find
>>> an issue with it.
>>>
>>> Once I solve this problem, I will run more workloads to verify that
>>> the module is stable for the basic implementation. Then I will go ahead
>>> and work on the suggestions made by Renato last week.
>>>
>>> Please let me know what your thoughts are.
>>>
>>>
>>> Thank you.
>>>
>>>
>>>
>>> **Sheriffo Ceesay**
>>>
>>
