gora-dev mailing list archives

From Alfonso Nishikawa <alfonso.nishik...@gmail.com>
Subject Re: Kudu datastore reports
Date Tue, 23 Jul 2019 17:36:03 GMT
Hi, John.

I checked out your code and it looks good :)
I found that you use javafx, which is not present in OpenJDK, so the code
fails to compile there. Since we don't restrict ourselves to the Oracle JVM,
I would suggest replacing it.

Good job, keep it going :)

Regards,

Alfonso Nishikawa





On Sat, 20 Jul 2019 at 22:25, John Mora (<jhnmora000@gmail.com>)
wrote:

> Hi.
>
> I updated my report in the Wiki [1]. Also, I pushed my latest commits to my
> branch [2]. Please take a look if you have time.
>
> This week, I will take a look at the MapReduce tests for DataStores.
>
> Please let me know if you have suggestions.
>
> [1]
> https://cwiki.apache.org/confluence/display/GORA/GORA-485+Apache+Kudu+datastore+for+Gora+Reports
> [2] https://github.com/jhnmora000/gora/tree/GORA-485
>
> Thanks,
> John
>
> On Sat, 13 Jul 2019 at 19:31, John Mora (<jhnmora000@gmail.com>)
> wrote:
>
>> Hi all
>>
>> I updated my report in the Wiki [1]. Also, I pushed my latest commits to my
>> branch [2]. Please take a look if you have time.
>>
>> This week, I will be working on the getPartitions and deleteByQuery
>> methods and running the remaining tests in the DataStoreTestBase class.
>>
>> Please let me know if you have suggestions.
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/GORA/GORA-485+Apache+Kudu+datastore+for+Gora+Reports
>> [2] https://github.com/jhnmora000/gora/tree/GORA-485
>>
>> Best,
>> John.
>>
>> On Wed, 10 Jul 2019 at 16:17, John Mora (<jhnmora000@gmail.com>)
>> wrote:
>>
>>> Hi Alfonso,
>>>
>>> Thanks so much for your time and support for this project. I will work
>>> on your comments. Responses inline :)
>>>
>>>
>>> On Tue, 9 Jul 2019 at 16:38, Alfonso Nishikawa (<
>>> alfonso.nishikawa@gmail.com>) wrote:
>>>
>>>> Hi, John.
>>>>
>>>> Sorry for the delay, I am changing work and I have been very busy :( I
>>>> will try to answer your questions :)
>>>>
>>>> *> In the Employee example there is a field called 'dateOfBirth'. I
>>>> tried to map that field with the UNIXTIME_MICROS datatype of Kudu (I
>>>> intuitively assumed this is a date). However, in the java world the
>>>> Employee field is a Long value and the kudu datatype is a Timestamp. So, I
>>>> was wondering whether I should force the usage of the UNIXTIME_MICROS
>>>> datatype for this field or just use a LONG datatype in Kudu.*
>>>>
>>>> Avro 1.8 introduced "Logical Types", so there is a "date" type with an
>>>> underlying "int" [1]. This is the first time I have read about them,
>>>> because they were not there until the last Avro version upgrade. I would
>>>> suggest ignoring "dates" and mapping dateOfBirth as a long, since in any
>>>> case (in Avro) the value is based on the Unix epoch. After this first
>>>> approach, a design improvement would be great, though :)
>>>>
>>>> - Would it be good to have a "timestamp" type in the mapping, so that
>>>> KuduStore converts between the Entity long field <-> Kudu timestamp storage?
>>>> - Is there any other approach?
>>>>
>>>
>>> I think that the Entity long field <-> Kudu timestamp conversion is the
>>> best alternative right now, because it would add more compatible datatypes
>>> to the mapping parameters that users can use. This conversion should not
>>> be difficult to implement, in my opinion. Also, the new Date datatype of
>>> Avro could be implemented in newer versions, because it would need further
>>> analysis in other datastores too. I will work on that.
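A minimal sketch of the Entity long <-> Kudu timestamp conversion discussed above. It assumes the Entity field holds epoch milliseconds (the thread does not state the unit) and that the Kudu column stores microseconds since the Unix epoch, as the UNIXTIME_MICROS name suggests. `KuduTimestampSketch` and its methods are hypothetical names, not part of Gora or the Kudu client.

```java
import java.time.Instant;

/** Hypothetical helper; not part of Gora or the Kudu client. */
final class KuduTimestampSketch {
  private KuduTimestampSketch() {}

  /** Entity long (assumed epoch milliseconds) to microseconds since the epoch. */
  static long millisToMicros(long epochMillis) {
    // multiplyExact throws ArithmeticException instead of silently overflowing.
    return Math.multiplyExact(epochMillis, 1000L);
  }

  /** Microseconds since the epoch back to the Entity long (epoch milliseconds). */
  static long microsToMillis(long epochMicros) {
    // floorDiv rounds towards negative infinity, so pre-1970 values stay consistent.
    return Math.floorDiv(epochMicros, 1000L);
  }

  /** Microseconds since the epoch as a java.time.Instant, for readability in logs or tests. */
  static Instant microsToInstant(long epochMicros) {
    long seconds = Math.floorDiv(epochMicros, 1_000_000L);
    long nanos = Math.floorMod(epochMicros, 1_000_000L) * 1000L;
    return Instant.ofEpochSecond(seconds, nanos);
  }

  public static void main(String[] args) {
    long dateOfBirthMillis = 1500L; // sample Entity value
    long micros = millisToMicros(dateOfBirthMillis);
    System.out.println(micros + " -> " + microsToInstant(micros));
  }
}
```

Converting back from microseconds drops any sub-millisecond part, which is harmless as long as the values only ever originate from the Entity long.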
>>>
>>>
>>>>
>>>>
>>>> *> What is Gora's policy regarding flush()?*
>>>> *> KuduClient has multiple flushing modes
>>>> <https://kudu.apache.org/apidocs/org/apache/kudu/client/SessionConfiguration.FlushMode.html>
>>>> and can also set a time interval
>>>> <https://kudu.apache.org/releases/1.2.0/apidocs/org/apache/kudu/client/KuduSession.html#setFlushInterval-int->
>>>> for automatic flush.*
>>>> *> Should these behaviors be configurable using the gora.properties file,
>>>> or should I just use the default configuration?*
>>>>
>>>> What we do in HBase is configure an autoflush option in gora.properties
>>>> [2], which is used when the Table is instantiated, but at the same time we
>>>> implement the flush() method to force the flush [3]. I would suggest
>>>> following that example, but adding the flushing options of Kudu. What flushing
>>>> mode (and time interval, if it applies) do you suggest?
>>>>
>>>
>>> Well, IMHO the default flush mode (auto flush sync) will do the job for
>>> most use cases. But I will add a configuration in gora.properties for
>>> selecting the other modes and specifying an autoflush interval if the user
>>> needs it.
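A minimal sketch of how such a gora.properties setting could be read. The property keys `gora.datastore.kudu.flushMode` and `gora.datastore.kudu.flushInterval` are hypothetical, as is the class name. The local enum only mirrors the constant names of Kudu's SessionConfiguration.FlushMode so that the example runs without the kudu-client dependency.

```java
import java.util.Locale;
import java.util.Properties;

/** Hypothetical helper; the property keys below are assumptions, not Gora's actual keys. */
final class KuduFlushConfigSketch {
  private KuduFlushConfigSketch() {}

  /** Local mirror of the constant names in Kudu's SessionConfiguration.FlushMode. */
  enum FlushMode { AUTO_FLUSH_SYNC, AUTO_FLUSH_BACKGROUND, MANUAL_FLUSH }

  static final String FLUSH_MODE_KEY = "gora.datastore.kudu.flushMode";         // hypothetical key
  static final String FLUSH_INTERVAL_KEY = "gora.datastore.kudu.flushInterval"; // hypothetical key

  /** Returns the configured mode, or AUTO_FLUSH_SYNC when the key is absent or blank. */
  static FlushMode readFlushMode(Properties props) {
    String raw = props.getProperty(FLUSH_MODE_KEY);
    if (raw == null || raw.trim().isEmpty()) {
      return FlushMode.AUTO_FLUSH_SYNC;
    }
    // valueOf throws IllegalArgumentException for an unknown mode name.
    return FlushMode.valueOf(raw.trim().toUpperCase(Locale.ROOT));
  }

  /** Returns the flush interval in milliseconds, or -1 when it is not configured. */
  static int readFlushIntervalMillis(Properties props) {
    String raw = props.getProperty(FLUSH_INTERVAL_KEY);
    if (raw == null || raw.trim().isEmpty()) {
      return -1;
    }
    return Integer.parseInt(raw.trim());
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty(FLUSH_MODE_KEY, "manual_flush");
    props.setProperty(FLUSH_INTERVAL_KEY, "500");
    System.out.println(readFlushMode(props) + " / " + readFlushIntervalMillis(props) + " ms");
  }
}
```

In a real store the resolved values would be applied to the Kudu session (the links quoted above point at the FlushMode enum and at KuduSession's setFlushInterval(int)), while flush() would still force a flush, as in the HBase example.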
>>>
>>>
>>>>
>>>> *> Also, while reviewing the datastore interface I noticed this method
>>>> 'getPartitions(Query<K, T> query)'. What is the expected behavior of this
>>>> method? Should I use the partition definition in the xml mapping file for
>>>> this?*
>>>>
>>>> The method getPartitions(Query) is related to Hadoop. Apache Gora
>>>> integrates with Hadoop by implementing a custom Map and Reduce that allow
>>>> Entities to be read and written directly.
>>>> You can take a look at HBase's implementation [4], which relies on
>>>> o.a.h.hbase.mapreduce.TableInputFormatBase [5] to compute the splits
>>>> (start key---end key) together with the location of each split, and
>>>> creates a collection of partitions from them [6].
>>>>
>>>> So, if Kudu allows computation to be performed on local Kudu splits,
>>>> then this method does the preparation needed to "send the
>>>> computation to where the data is locally".
>>>>
>>>> In any case, you can see that:
>>>>
>>>>    - MongoDB store implementation does not implement splitting [7]
>>>>    - Cassandra store implementation does not implement splitting [8]
>>>>    - Aerospike store implementation does not implement splitting [9]
>>>>    - Accumulo store implementation *does* implement splitting [10]
>>>>
>>>> If Kudu has a method to get the different splits for a table and its
>>>> locations, then you will be able to implement the full feature.
>>>>
>>>> This is Hadoop related and it is not trivial. I haven't elaborated
>>>> much, so if you find you need more information let me know :)
>>>>
>>>>
>>>>
>>> I will check whether Kudu has these features in order to implement this
>>> method. If not I will use the default implementation found in other
>>> backends.
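A minimal sketch of the splitting idea described above: intersect the query's key range with each storage split and keep the location of every non-empty piece. All names here (`KeyRangePartitionSketch`, `KeyRange`, `partitionsFor`) are hypothetical and stand in for Gora's partition queries and for whatever split metadata Kudu exposes. Half-open ranges and `null` as "unbounded" are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; none of these types exist in Gora or Kudu. */
final class KeyRangePartitionSketch {
  private KeyRangePartitionSketch() {}

  /** Half-open key range [startKey, endKey) served by one host; null means unbounded. */
  static final class KeyRange {
    final String startKey;
    final String endKey;
    final String location;

    KeyRange(String startKey, String endKey, String location) {
      this.startKey = startKey;
      this.endKey = endKey;
      this.location = location;
    }
  }

  /** Intersects the query range with every split and keeps the non-empty pieces. */
  static List<KeyRange> partitionsFor(String queryStart, String queryEnd, List<KeyRange> splits) {
    List<KeyRange> partitions = new ArrayList<>();
    for (KeyRange split : splits) {
      String start = laterStart(queryStart, split.startKey);
      String end = earlierEnd(queryEnd, split.endKey);
      boolean empty = start != null && end != null && start.compareTo(end) >= 0;
      if (!empty) {
        partitions.add(new KeyRange(start, end, split.location));
      }
    }
    return partitions;
  }

  /** Larger of two lower bounds; a null bound means "no lower limit". */
  private static String laterStart(String a, String b) {
    if (a == null) return b;
    if (b == null) return a;
    return a.compareTo(b) >= 0 ? a : b;
  }

  /** Smaller of two upper bounds; a null bound means "no upper limit". */
  private static String earlierEnd(String a, String b) {
    if (a == null) return b;
    if (b == null) return a;
    return a.compareTo(b) <= 0 ? a : b;
  }

  public static void main(String[] args) {
    List<KeyRange> splits = Arrays.asList(
        new KeyRange(null, "g", "host1"),
        new KeyRange("g", "p", "host2"),
        new KeyRange("p", null, "host3"));
    for (KeyRange p : partitionsFor("c", "k", splits)) {
      System.out.println("[" + p.startKey + ", " + p.endKey + ") on " + p.location);
    }
  }
}
```

A store without split information can fall back to returning a single partition that covers the whole query, which is what the non-splitting backends listed above effectively do.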
>>>
>>>
>>>> About Queries, what I can tell you is that HBase only implements "Start
>>>> key" + "End key" because it has only 2 operations: "get" and "scan", and
>>>> the querying is for the "scan" operation, where you want an interval (or
>>>> all) of the rows. Does Kudu have more querying functionality?
>>>>
>>>>
>>> Yes, Kudu implements a Scanner for querying data, along with conditional
>>> predicates for filtering. I am using those classes.
>>>
>>>
>>>> On another topic, I am trying to install Kudu in standalone mode (all in 1
>>>> node). Do you use a Cloudera installation or do you have a standalone
>>>> installation? How do you do it? I found some instructions, but they talk
>>>> about compiling Kudu [11]. I was looking for something like HBase, where it
>>>> is just unzip + execute "hbase start".
>>>>
>>>>
>>> I am using an embedded mini-cluster which comes with compiled binaries
>>> and can be used with maven[1] for testing my code. Once I get it mature
>>> enough I think I will be testing the datastore with a docker container [2].
>>> I could not find a unzip+execute bundle either and I am kinda noob for
>>> compiling it myself.
>>>
>>> [1]
>>> https://kudu.apache.org/docs/developing.html#_jvm_based_integration_testing
>>> [2] https://hub.docker.com/r/usuresearch/apache-kudu/
>>>
>>>
>>>> Good job and thank you!! :)
>>>>
>>>> Regards,
>>>>
>>>> Alfonso Nishikawa
>>>>
>>>>
>>>> [1] - https://avro.apache.org/docs/1.8.0/spec.html#Logical+Types
>>>> [2] -
>>>> https://github.com/apache/gora/blob/apache-gora-0.9/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L175
>>>> [3] -
>>>> https://github.com/apache/gora/blob/apache-gora-0.9/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L458
>>>> [4] -
>>>> https://github.com/apache/gora/blob/apache-gora-0.9/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L472
>>>> [5] -
>>>> https://github.com/apache/gora/blob/apache-gora-0.9/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L479
>>>> [6] -
>>>> https://github.com/apache/gora/blob/apache-gora-0.9/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L517
>>>> [7] -
>>>> https://github.com/apache/gora/blob/apache-gora-0.9/gora-mongodb/src/main/java/org/apache/gora/mongodb/store/MongoStore.java#L533
>>>> [8] -
>>>> https://github.com/apache/gora/blob/apache-gora-0.9/gora-cassandra/src/main/java/org/apache/gora/cassandra/store/CassandraStore.java#L292
>>>> [9] -
>>>> https://github.com/apache/gora/blob/apache-gora-0.9/gora-aerospike/src/main/java/org/apache/gora/aerospike/store/AerospikeStore.java#L369
>>>> [10] -
>>>> https://github.com/apache/gora/blob/apache-gora-0.9/gora-accumulo/src/main/java/org/apache/gora/accumulo/store/AccumuloStore.java#L902
>>>> [11] - https://kudu.apache.org/docs/installation.html
>>>>
>>>>
>>>> On Mon, 8 Jul 2019 at 3:42, John Mora (<jhnmora000@gmail.com>)
>>>> wrote:
>>>>
>>>>> Hi all.
>>>>>
>>>>> As every week, I updated my report in the Wiki [1]. Also, I pushed my
>>>>> latest commits to my branch [2]. Please take a look if you have time.
>>>>>
>>>>> This week, I will continue working on the Queries implementation.
>>>>> Please reach out to me if you have any suggestions.
>>>>>
>>>>> Also, while reviewing the datastore interface I noticed this method
>>>>> 'getPartitions(Query<K, T> query)'. What is the expected behavior of this
>>>>> method? Should I use the partition definition in the xml mapping file for
>>>>> this?
>>>>>
>>>>> Cheers,
>>>>> John.
>>>>>
>>>>> [1]
>>>>> https://cwiki.apache.org/confluence/display/GORA/GORA-485+Apache+Kudu+datastore+for+Gora+Reports
>>>>> [2] https://github.com/jhnmora000/gora/tree/GORA-485
>>>>>
>>>>>
>>>>> On Sun, 30 Jun 2019 at 16:56, John Mora (<jhnmora000@gmail.com>)
>>>>> wrote:
>>>>>
>>>>>> Hi all.
>>>>>>
>>>>>> I received my first evaluation from the Google Summer of Code program
>>>>>> with a positive result. Thanks so much for your support and confidence in
>>>>>> the project and me.
>>>>>>
>>>>>> I updated my report of this week in the Wiki [1]. Also, I pushed my
>>>>>> latest commits to my branch [2].
>>>>>>
>>>>>> This week, I will be reviewing the serialization/deserialization
>>>>>> process in order to identify optimizations specific to Kudu, because I
>>>>>> used generic methods from other backends which could probably be better
>>>>>> tuned for Kudu. Also, I will start working on the Queries implementation.
>>>>>>
>>>>>> BTW, I added a question to the wiki about Date types. Please take
>>>>>> a look if you have time.
>>>>>>
>>>>>> [1]
>>>>>> https://cwiki.apache.org/confluence/display/GORA/GORA-485+Apache+Kudu+datastore+for+Gora+Reports
>>>>>> [2] https://github.com/jhnmora000/gora/tree/GORA-485
>>>>>>
>>>>>> Cheers,
>>>>>> John
>>>>>>
>>>>>> On Thu, 27 Jun 2019 at 21:02, John Mora (<jhnmora000@gmail.com>)
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Carlos.
>>>>>>>
>>>>>>> Thanks for the reminder. I submitted the form yesterday. :D
>>>>>>>
>>>>>>> Best,
>>>>>>> John.
>>>>>>>
>>>>>>> On Thu, 27 Jun 2019 at 17:34, carlos muñoz (<
>>>>>>> carlosrmng@gmail.com>) wrote:
>>>>>>>
>>>>>>>> Hi John
>>>>>>>>
>>>>>>>> The first Google Summer of Code evaluation is due on June 28th.
>>>>>>>> Please make sure you submit your Mentors' evaluation on time.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Carlos
>>>>>>>>
>>>>>>>> On Sun, 23 Jun 2019 at 18:29, John Mora (<jhnmora000@gmail.com>)
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi all.
>>>>>>>>>
>>>>>>>>> FYI, I updated my report of this week on the Wiki [1]. Also, I
>>>>>>>>> pushed my latest commits to my branch [2].
>>>>>>>>>
>>>>>>>>> As I mentioned in the reports, I would like to know how datastores
>>>>>>>>> deal with flush(). Should it always be executed manually?
>>>>>>>>>
>>>>>>>>> Finally, this week I will be implementing object
>>>>>>>>> serialization/deserialization in the methods put, get, delete, exists. Do
>>>>>>>>> you have any suggestions on how to proceed with this task?
>>>>>>>>>
>>>>>>>>> Footnote: Thanks for the feedback Carlos, I fixed the problem.
>>>>>>>>>
>>>>>>>>> [1]
>>>>>>>>> https://cwiki.apache.org/confluence/display/GORA/GORA-485+Apache+Kudu+datastore+for+Gora+Reports
>>>>>>>>> [2] https://github.com/jhnmora000/gora/tree/GORA-485
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> John
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, 17 Jun 2019 at 22:58, carlos muñoz (<
>>>>>>>>> carlosrmng@gmail.com>) wrote:
>>>>>>>>>
>>>>>>>>>> Hi John
>>>>>>>>>>
>>>>>>>>>> Your last changes look good to me. Keep it up. But I noticed
>>>>>>>>>> that you have created an Enumeration for datatypes, which is very similar
>>>>>>>>>> to the kudu-client's [2]. You should probably replace [1] with [2] in order
>>>>>>>>>> to avoid code duplication.
>>>>>>>>>>
>>>>>>>>>> [1]
>>>>>>>>>> https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/src/main/java/org/apache/gora/kudu/mapping/Column.java#L76
>>>>>>>>>> [2] https://kudu.apache.org/apidocs/org/apache/kudu/Type.html
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Carlos
>>>>>>>>>>
>>>>>>>>>> On Sat, 15 Jun 2019 at 12:01, John Mora (<
>>>>>>>>>> jhnmora000@gmail.com>) wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all.
>>>>>>>>>>>
>>>>>>>>>>> I updated my report of this week on the Wiki [1]. I noticed that
>>>>>>>>>>> my code is lacking some javadoc documentation. I think I will be working on
>>>>>>>>>>> that this week. I would also like to enable and check the schema management
>>>>>>>>>>> tests (createSchema, existsSchema, etc.).
>>>>>>>>>>>
>>>>>>>>>>> [1]
>>>>>>>>>>> https://cwiki.apache.org/confluence/display/GORA/GORA-485+Apache+Kudu+datastore+for+Gora+Reports
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> John.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, 11 Jun 2019 at 0:11, John Mora (<
>>>>>>>>>>> jhnmora000@gmail.com>) wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Alfonso.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks so much for your feedback. I am working on your comments.
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> John
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, 10 Jun 2019 at 16:11, Alfonso Nishikawa (<
>>>>>>>>>>>> alfonso.nishikawa@gmail.com>) wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi, John.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regarding your questions in the report [1]:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>    - How to represent partitioning configurations in the
>>>>>>>>>>>>>    mapping file.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This was discussed in other emails, wasn't it? :)
>>>>>>>>>>>>>
>>>>>>>>>>>>>    - KuduTestHarness requires the Maven plugin
>>>>>>>>>>>>>    os-maven-plugin, which needs Maven 3.1.1+. Is it a problem for Apache Gora?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I believe it is not a problem. My Ubuntu comes with 3.6.0, far
>>>>>>>>>>>>> ahead of 3.1.1, and I assume everyone uses a fairly recent version of
>>>>>>>>>>>>> Maven 3 :)
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1] -
>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/GORA/GORA-485+Apache+Kudu+datastore+for+Gora+Reports
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Alfonso Nishikawa
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, 10 Jun 2019 at 21:07, Alfonso Nishikawa (<
>>>>>>>>>>>>> alfonso.nishikawa@gmail.com>) wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi, John.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thank you!
>>>>>>>>>>>>>> Things I have seen:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - The version of a Maven dependency [1] should go in the
>>>>>>>>>>>>>> Dependency Management of the root pom [2]. The same applies to [3] and
>>>>>>>>>>>>>> the dependencies after it: the version should not be set there.
>>>>>>>>>>>>>> - Set the scope of test dependencies to test, at [4] and the ones after it.
>>>>>>>>>>>>>> - Set the indentation to 2 spaces for the pom [5].
>>>>>>>>>>>>>> - Missing "t" in "localhost" at [6].
>>>>>>>>>>>>>> - Port 13 for Kudu? That is the "Daytime Protocol" (RFC 867) and
>>>>>>>>>>>>>> you will need root permission to run it. The default port for Kudu is 7051,
>>>>>>>>>>>>>> isn't it?
>>>>>>>>>>>>>> - I would ask you to add the same functionality to load the
>>>>>>>>>>>>>> mapping from configuration as in HBase's store [7] to your KuduStore [8].
>>>>>>>>>>>>>> This will have implications for your readMapping at [9], so take a look at
>>>>>>>>>>>>>> the one for HBase at [10].
>>>>>>>>>>>>>> - I know it is done in other backends, but avoid RuntimeExceptions
>>>>>>>>>>>>>> (at least in Java, since we have the checked ones) like in [11]. You can
>>>>>>>>>>>>>> wrap them in GoraException. An example is [12].
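A minimal sketch of the last point in the list above: catching a runtime failure and rethrowing it as a checked exception that keeps the original cause. `StoreException` is a hypothetical stand-in for Gora's GoraException so that the example runs without Gora on the classpath. `MappingErrorSketch` and `parsePort` are likewise made up for illustration.

```java
/** Hypothetical sketch; StoreException stands in for Gora's GoraException. */
final class MappingErrorSketch {
  private MappingErrorSketch() {}

  /** Checked exception that carries the original failure as its cause. */
  static final class StoreException extends Exception {
    StoreException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  /** Parses a port from a mapping or properties value, reporting failures as checked. */
  static int parsePort(String raw) throws StoreException {
    try {
      return Integer.parseInt(raw.trim());
    } catch (RuntimeException e) {
      // Covers NumberFormatException (bad digits) and NullPointerException (missing value).
      throw new StoreException("Invalid port value: " + raw, e);
    }
  }

  public static void main(String[] args) {
    try {
      System.out.println(parsePort("7051"));
      parsePort("not-a-port");
    } catch (StoreException e) {
      System.out.println(e.getMessage() + " (cause: " + e.getCause().getClass().getSimpleName() + ")");
    }
  }
}
```

Callers are then forced by the compiler to handle or declare the failure, and the original stack trace is still reachable through getCause().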
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> And nothing more :)
>>>>>>>>>>>>>> Keep going, good job.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [1] -
>>>>>>>>>>>>>> https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/pom.xml#L98
>>>>>>>>>>>>>> [2] -
>>>>>>>>>>>>>> https://github.com/jhnmora000/gora/blob/GORA-485/pom.xml#L890
>>>>>>>>>>>>>> [3] -
>>>>>>>>>>>>>> https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/pom.xml#L121
>>>>>>>>>>>>>> [4] -
>>>>>>>>>>>>>> https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/pom.xml#L180
>>>>>>>>>>>>>> [5] -
>>>>>>>>>>>>>> https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/pom.xml
>>>>>>>>>>>>>> [6] -
>>>>>>>>>>>>>> https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/src/test/resources/gora.properties#L18
>>>>>>>>>>>>>> [7] -
>>>>>>>>>>>>>> https://github.com/jhnmora000/gora/blob/master/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L92
>>>>>>>>>>>>>> [8] -
>>>>>>>>>>>>>> https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/src/main/java/org/apache/gora/kudu/store/KuduStore.java#L53
>>>>>>>>>>>>>> [9] -
>>>>>>>>>>>>>> https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/src/main/java/org/apache/gora/kudu/mapping/KuduMappingBuilder.java#L81
>>>>>>>>>>>>>> [10] -
>>>>>>>>>>>>>> https://github.com/jhnmora000/gora/blob/master/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L822
>>>>>>>>>>>>>> [11] -
>>>>>>>>>>>>>> https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/src/main/java/org/apache/gora/kudu/mapping/KuduMappingBuilder.java#L141
>>>>>>>>>>>>>> [12] -
>>>>>>>>>>>>>> https://github.com/jhnmora000/gora/blob/master/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L268
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Alfonso Nishikawa
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sat, 8 Jun 2019 at 20:26, John Mora (<
>>>>>>>>>>>>>> jhnmora000@gmail.com>) wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi all.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have just updated my weekly reports on Cwiki [1]. This
>>>>>>>>>>>>>>> next week I think I should be focusing on the create schema operation and
>>>>>>>>>>>>>>> solving the issue of the partitioning configurations in the mapping file.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Please let me know if you have suggestions. My latest commits
>>>>>>>>>>>>>>> are available here [2].
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/GORA/GORA-485+Apache+Kudu+datastore+for+Gora+Reports
>>>>>>>>>>>>>>> [2] https://github.com/jhnmora000/gora/tree/GORA-485
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>> John
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
