gora-dev mailing list archives

From John Mora <jhnmora...@gmail.com>
Subject Re: Add datastore for Elasticsearch. Outreachy Week 7 Report
Date Mon, 18 Jan 2021 21:53:15 GMT
Hi Maria,

Thanks for the update.

Some comments:

https://github.com/podorvanova/gora/blob/gora-664/gora-elasticsearch/src/main/java/org/apache/gora/elasticsearch/store/ElasticsearchStore.java#L192

Please add the index mappings when you create the Elasticsearch index.

https://www.elastic.co/guide/en/elasticsearch/client/java-rest/master/java-rest-high-create-index.html#java-rest-high-create-index-request-mappings

You can use the field mappings parsed from the XML mapping file.

https://github.com/podorvanova/gora/blob/gora-664/gora-elasticsearch/src/main/java/org/apache/gora/elasticsearch/mapping/ElasticsearchMapping.java#L28
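A minimal sketch of turning the parsed field mappings into the "mappings" body of the create-index request. It assumes, hypothetically, that the parsed fields can be reduced to a Map from field name to Elasticsearch type (the actual ElasticsearchMapping API may look different); the class and method names below are illustrative only. The resulting JSON string could then be supplied as the mapping source of the create-index request, as shown in the high-level REST client guide linked above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MappingsJsonSketch {

    /**
     * Builds {"properties":{"field":{"type":"..."},...}} from a field-name -> type map.
     * The input map is a hypothetical stand-in for the fields parsed from the XML mapping file.
     */
    static String toMappingsJson(Map<String, String> fieldTypes) {
        StringBuilder json = new StringBuilder("{\"properties\":{");
        boolean first = true;
        for (Map.Entry<String, String> e : fieldTypes.entrySet()) {
            if (!first) {
                json.append(',');
            }
            first = false;
            json.append('"').append(escape(e.getKey())).append("\":{\"type\":\"")
                .append(escape(e.getValue())).append("\"}");
        }
        return json.append("}}").toString();
    }

    // Minimal JSON string escaping for backslashes and quotes (field names are expected to be simple).
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        // Illustrative Employee-like fields; the types chosen here are assumptions.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("name", "text");
        fields.put("dateOfBirth", "long");
        fields.put("salary", "integer");
        System.out.println(toMappingsJson(fields));
    }
}
```

Building the JSON by hand keeps the sketch dependency-free; in the store itself an XContentBuilder or a Map-based mapping source would avoid the manual escaping.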

Regarding your question, Elasticsearch supports complex data types such as object and nested fields:

https://www.elastic.co/guide/en/elasticsearch/reference/current/object.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html

You can use the RethinkDB datastore as an example and recursively store the
fields of the embedded objects.

https://github.com/apache/gora/blob/b45581a371d2d69c472c37793efa085436056c9b/gora-rethinkdb/src/main/java/org/apache/gora/rethinkdb/store/RethinkDBStore.java#L448
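The recursive approach can be sketched without any Gora or Avro types. Purely as an assumption for illustration, a record is modelled here as a Map<String, Object> whose values are primitives, lists, or nested maps; in the real store the walk would follow the Avro schema fields, as in the RethinkDB example linked above. Each nested map becomes a JSON object, which Elasticsearch indexes as an object field by default, or as nested if the mapping declares it so.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EmbeddedRecordSketch {

    /**
     * Recursively converts a value into something a JSON document source can hold:
     * nested records (modelled here as Maps) become nested Maps, lists are converted
     * element by element, and primitives (including null) are passed through unchanged.
     */
    static Object toDocumentValue(Object value) {
        if (value instanceof Map) {
            Map<String, Object> out = new LinkedHashMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                out.put(String.valueOf(e.getKey()), toDocumentValue(e.getValue()));
            }
            return out;
        }
        if (value instanceof List) {
            List<Object> out = new ArrayList<>();
            for (Object item : (List<?>) value) {
                out.add(toDocumentValue(item));
            }
            return out;
        }
        return value; // String, Number, Boolean or null
    }

    public static void main(String[] args) {
        Map<String, Object> boss = new LinkedHashMap<>();
        boss.put("name", "Alice");
        boss.put("boss", null); // a NULL union branch stays null

        Map<String, Object> employee = new LinkedHashMap<>();
        employee.put("name", "Bob");
        employee.put("boss", boss); // embedded RECORD stored as a nested object
        employee.put("skills", List.of("java", "elasticsearch"));

        System.out.println(toDocumentValue(employee));
    }
}
```

A Map built this way could then be used as the document source of an index request (the high-level client's IndexRequest accepts a Map via source(Map)).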

Give it a try first and let me know if you get stuck.

Alternatively, if the first option is not feasible, you can serialize the
embedded objects as a byte array, for example:

https://github.com/apache/gora/blob/master/gora-solr/src/main/java/org/apache/gora/solr/store/SolrStore.java#L735
https://www.elastic.co/guide/en/elasticsearch/reference/current/binary.html
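For the byte-array fallback, note that an Elasticsearch binary field takes its value as a Base64-encoded string and is not searchable by default. The sketch below only shows the encode/decode round trip with java.util.Base64; producing the bytes themselves (for example by serializing the embedded Avro record, as the Solr store does) is left out, and the placeholder payload and method names are illustrative assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class BinaryFieldSketch {

    /** Encodes serialized record bytes into the Base64 string a binary field expects. */
    static String toBinaryFieldValue(byte[] serializedRecord) {
        return Base64.getEncoder().encodeToString(serializedRecord);
    }

    /** Decodes the Base64 string read back from the document into the original bytes. */
    static byte[] fromBinaryFieldValue(String base64) {
        return Base64.getDecoder().decode(base64);
    }

    public static void main(String[] args) {
        // Placeholder bytes; in the store these would come from serializing the embedded record.
        byte[] payload = "embedded-record".getBytes(StandardCharsets.UTF_8);
        String encoded = toBinaryFieldValue(payload);
        System.out.println(encoded);
        System.out.println(Arrays.equals(payload, fromBinaryFieldValue(encoded)));
    }
}
```

The trade-off versus the recursive option is that the embedded fields can no longer be queried individually, since Elasticsearch only sees an opaque blob.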

Best regards,
John.

On Sat, 16 Jan 2021 at 8:02, Maria Podorvanova (<
podorvanova.maria@gmail.com>) wrote:

> Hi,
>
> Report #7
> Period: January 10 - January 16
> Activities:
> - Fixed authentication [1]:
>
>    1. Set up the password for the Elasticsearch container properly
>    2. Set the default Elasticsearch container server username in
>    gora.properties
>    3. Added exceptions for missing arguments in authentication
>
> - Added a parameter for the XSD validation [2]:
>
>    1. Defined a parameter for the XSD validation
>    2. Added a test case for the parameter
>    3. Made ElasticsearchStore read the mapping file from properties, not
>    from the configuration
>
> - Implemented some basic Input-Output operations for schema management [3]:
>
>    1. Implemented delete, get and put methods
>    2. Implemented newInstance and getUnionSchema utility methods
>    3. Implemented basic serialization/deserialization for primitive AVRO
>    types
>
>
> Here are links to the commits:
> [1]
> https://github.com/apache/gora/commit/679b6d8f0a27b7a7be99b6e8773327d482b9996b
> [2]
> https://github.com/apache/gora/commit/0f17849a383ef5f29e650eda22fb4d3022578f43
> [3]
> https://github.com/apache/gora/commit/474a3946ebfde25732fe16d6546aa479fc6509a0
>
> This week I started work on serialization/deserialization. While
> testing the get method, I found that a UNION field can be a combination of
> NULL, STRING or another RECORD for external table references (e.g. boss
> for Employee). What should I do in this case? I see two possible
> options: 1) deserialize recursively if the field value is a RECORD;
> 2) make another request in the STRING case, where I only have the key of
> the external object.
>
> Regards,
> Maria
>
