gora-dev mailing list archives

From Renato Marroquín Mogrovejo <renatoj.marroq...@gmail.com>
Subject Re: Week 1 Report and Some Questions
Date Sun, 02 Jun 2019 18:01:08 GMT
Hi Sheriffo,

Some opinions about your questions, but others are more than welcome
to suggest other things as well.

Q1: Are we going to consider arbitrary field length, e.g. if we set
the fieldcount to 100 then we have to create the respective Avro and
mapping files? Currently, I don't think this process is automated and
may be tedious for large field counts.
I think for the first code iteration, we should use whatever
fieldcount you have already generated the classes for. Ideally, we should be able to
invoke the Gora bean generator and generate as many fields as required
by the benchmark configuration.
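
As a rough illustration of what automating this could look like, the sketch below builds an Avro record schema with a configurable number of string fields, which could then be handed to the Gora compiler. The class name SchemaSketch, the record name "User" and the field0..fieldN naming are assumptions made for the example, not something the benchmark module defines.

```java
public class SchemaSketch {

    // Builds an Avro record schema (as JSON text) with fieldCount nullable
    // string fields named field0 .. field{fieldCount-1}.
    // The record name and the field naming are hypothetical.
    public static String buildSchema(String recordName, int fieldCount) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"type\":\"record\",\"name\":\"")
          .append(recordName)
          .append("\",\"fields\":[");
        for (int i = 0; i < fieldCount; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"name\":\"field").append(i)
              .append("\",\"type\":[\"null\",\"string\"],\"default\":null}");
        }
        return sb.append("]}").toString();
    }

    public static void main(String[] args) {
        // Print a schema with 4 fields; the output could be saved as an .avsc file.
        System.out.println(buildSchema("User", 4));
    }
}
```

The datastore mapping file would need to be generated in a similar way; that part is not shown here.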

Q2: Second: The second problem has to do with the first one, if we
allow arbitrary field counts, then there has to be a mechanism to call
each of the set or get methods during CRUD operations. So to avoid
this I used Java Reflection. See the sample code below.
We have some options to deal with having an arbitrary number of fields.
1) Use reflection as you have which might be ok for the first code
iteration, but if we want to have some decent performance against
using datastores natively (no Gora), we should move away from it.
2) Do Gora class generation (and also generate the method used to
insert data through Gora) in a step before the benchmark starts.
Something like this:
# pass config parameters to generate Gora beans with the required number of fields
# this should output the generated class and the method that does the insertion
$ gora_compiler.sh --benchmark --fields_required 4
The output path containing the result of this should then be included
(or passed) as a runtime dependency to the benchmark class.
3) Because Gora uses Avro, we can use complex data types, e.g.,
arrays, maps. So we could represent the number of fields as the number
of elements inside an array. I would think that this option gives us
the best performance.
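
To make the difference between options (1) and (3) concrete, here is a minimal sketch in plain Java. FlatBean and ArrayBean are hypothetical stand-ins for Gora-generated beans (they are not real Gora classes), and the setField<i> naming is an assumption.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

public class FieldAccessSketch {

    // Hypothetical stand-in for a generated bean with one setter per field.
    public static class FlatBean {
        private String field0;
        private String field1;
        public void setField0(String v) { field0 = v; }
        public void setField1(String v) { field1 = v; }
        public String getField0() { return field0; }
        public String getField1() { return field1; }
    }

    // Hypothetical stand-in for a bean that keeps all fields in one Avro array.
    public static class ArrayBean {
        private final List<String> fields = new ArrayList<>();
        public List<String> getFields() { return fields; }
    }

    // Option (1): look up setField<i> by name and invoke it reflectively.
    public static void fillByReflection(Object bean, String[] values) {
        try {
            for (int i = 0; i < values.length; i++) {
                Method setter = bean.getClass().getMethod("setField" + i, String.class);
                setter.invoke(bean, values[i]);
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("bean has no matching setter", e);
        }
    }

    // Option (3): no per-field methods; the field count is just the list size.
    public static void fillArray(ArrayBean bean, String[] values) {
        for (String v : values) {
            bean.getFields().add(v);
        }
    }
}
```

With option (1) every write pays for a method lookup plus a reflective call, while option (3) needs neither reflection nor regenerated classes when the field count changes.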
I think we should continue with option (1) until we have the entire
pipeline working, and we understand how every piece fits together with
each other (YCSB, Gora, Gora compiler, benchmark setup steps). Then we
should do (2) which is the most general and the one that reflects how
people usually use Gora, and then we test with (3). I think all of
these steps are totally doable in our time frame as we build upon
previous steps.
The other thing that we should decide is which backend to use as there
are backends that are more mature than others. I'd say to use the
HBase backend as it is the most stable one and the one with more
features, and if we feel brave we can try other backends (and fix them
if necessary!)


Renato M.

On Sun, Jun 2, 2019 at 19:10, Sheriffo Ceesay
(<sneceesay77@gmail.com>) wrote:
> Dear Mentors,
> My week one report is available at
> https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report
> I have also included a detailed question, and I will need your guidance
> on that.
> Please let me know what your thoughts are.
> Thank you.
> **Sheriffo Ceesay**
