gora-dev mailing list archives

From "Alfonso Nishikawa (JIRA)" <j...@apache.org>
Subject [jira] [Created] (GORA-401) Serialization and deserialization of Persistent does not hold the entity dirty state
Date Fri, 21 Nov 2014 10:10:34 GMT
Alfonso Nishikawa created GORA-401:
--------------------------------------

             Summary: Serialization and deserialization of Persistent does not hold the entity
dirty state
                 Key: GORA-401
                 URL: https://issues.apache.org/jira/browse/GORA-401
             Project: Apache Gora
          Issue Type: Bug
          Components: gora-core
    Affects Versions: 0.5, 0.4
         Environment: Tested on gora-0.4, but seems logically to hold on gora-0.5
            Reporter: Alfonso Nishikawa
            Priority: Critical


Since the {{__g__dirty}} field was removed in GORA-326, the dirty state is no longer serialized. In GORA-321, {{[PersistentSerializer|https://github.com/apache/gora/blob/master/gora-core/src/main/java/org/apache/gora/mapreduce/PersistentSerializer.java]}}
went from using {{[PersistentDatumWriter|https://github.com/apache/gora/blob/apache-gora-0.3/gora-core/src/main/java/org/apache/gora/avro/PersistentDatumWriter.java](/Reader)}}
to Avro's {{SpecificDatumWriter}}, which delegated the serialization of the dirty field to Avro
(although having that field as a regular field in the entities is not really desirable).

The proposal is to reintroduce the {{PersistentDatumWriter/Reader}}, which will serialize the
internal fields of the entities.
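
A minimal sketch of the idea, not Gora's actual code: {{ToyEntity}}, {{DirtyStateCodec}}, {{writeWithState}} and {{readWithState}} are hypothetical names, and plain {{java.io}} data streams stand in for the Avro encoder and decoder. The writer emits the dirty bitmap ahead of the field values, and the reader restores that bitmap instead of marking every field dirty.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.BitSet;

// Hypothetical stand-in for a Persistent entity; not a Gora class.
class ToyEntity {
    String name;                         // field index 0
    long salary;                         // field index 1
    final BitSet dirty = new BitSet(2);  // one bit per field

    void setName(String v) { name = v; dirty.set(0); }
    void setSalary(long v) { salary = v; dirty.set(1); }
}

class DirtyStateCodec {
    // Writes the dirty bitmap first, then the field values.
    static byte[] writeWithState(ToyEntity e) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            byte[] bits = e.dirty.toByteArray();
            out.writeInt(bits.length);
            out.write(bits);
            // Simplification for this sketch: a null name is written as "".
            out.writeUTF(e.name == null ? "" : e.name);
            out.writeLong(e.salary);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    // Reads the bitmap back and restores it on the new entity.
    static ToyEntity readWithState(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            byte[] bits = new byte[in.readInt()];
            in.readFully(bits);
            ToyEntity e = new ToyEntity();
            // Assign fields directly so that reading does not mark them dirty.
            e.name = in.readUTF();
            e.salary = in.readLong();
            e.dirty.or(BitSet.valueOf(bits));
            return e;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```

With this layout, an entity whose {{salary}} alone was modified comes back with only that bit set.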

This bug affects, for example, Nutch, which loads only some fields in each of its phases and
serializes entities (from Map to Reduce). On deserialization, all fields are found to be "dirty",
regardless of which fields were modified in the Map, so all data in the datastore is overwritten
(deleting many things: downloaded content, parsed content, etc).

This effect can be seen in {{TestPersistentSerialization#testSerderEmployeeTwoFields}} when
debugging in {{TestIOUtils#testSerializeDeserialize}}. Proper breakpoints and inspections show
that entities are "equal" when their fields are equal. This is fine as a definition of "equal",
but another test must be added to check that serialization and deserialization keep the dirty
state.
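
The kind of check meant here could look roughly like the sketch below. It is plain Java with no Gora or JUnit dependency. {{Employee}}, {{DirtyStateCheck}} and {{lossyRoundTrip}} are hypothetical names, and the round trip is only simulated by rebuilding the entity through its setters, which is assumed to mimic what happens when the dirty state is not carried along.

```java
import java.util.BitSet;
import java.util.Objects;

// Hypothetical stand-in for a generated entity; not a Gora class.
class Employee {
    String name;                         // field index 0
    int salary;                          // field index 1
    final BitSet dirty = new BitSet(2);  // one bit per field

    void setName(String v) { name = v; dirty.set(0); }
    void setSalary(int v)  { salary = v; dirty.set(1); }

    // Equality compares field values only and ignores the dirty state.
    @Override public boolean equals(Object o) {
        if (!(o instanceof Employee)) return false;
        Employee other = (Employee) o;
        return Objects.equals(name, other.name) && salary == other.salary;
    }

    @Override public int hashCode() { return Objects.hash(name, salary); }
}

class DirtyStateCheck {
    // Simulates a round trip that carries field values but not the dirty state:
    // the copy is rebuilt through setters, so every field ends up dirty.
    static Employee lossyRoundTrip(Employee in) {
        Employee out = new Employee();
        out.setName(in.name);
        out.setSalary(in.salary);
        return out;
    }

    public static void main(String[] args) {
        Employee before = new Employee();
        before.name = "Ada";      // as if loaded from the store: not dirty
        before.setSalary(1000);   // as if modified in the Map: dirty

        Employee after = lossyRoundTrip(before);

        System.out.println("equals: " + before.equals(after));
        System.out.println("dirty state kept: " + before.dirty.equals(after.dirty));
    }
}
```

The value-based comparison passes while the dirty-state comparison fails, which is why an equality assertion alone cannot catch this bug.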



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
