gora-dev mailing list archives

From "Alfonso Nishikawa (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (GORA-401) Serialization and deserialization of Persistent does not hold the entity dirty state from Map to Reduce
Date Sun, 18 Sep 2016 22:26:20 GMT

    [ https://issues.apache.org/jira/browse/GORA-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15498598#comment-15498598
] 

Alfonso Nishikawa edited comment on GORA-401 at 9/18/16 10:25 PM:
------------------------------------------------------------------

I had completely forgotten about this issue, and I don't quite remember what I did in the patch.

I believe the patch uploaded some time ago is not appropriate. The main reason is that it has an
embedded class from Avro version XX.XX (I don't remember which).

I find what [~djkevincr] commented _very_ interesting, especially https://github.com/apache/gora/blob/master/gora-compiler/src/main/velocity/org/apache/gora/compiler/templates/record.vm#L357-L387
. We should research whether that approach would let us avoid the embedded class
from Avro (that was quick & dirty on my part).

But at the moment I am very busy with my degree project (although it is progressing at a good pace,
it takes up all my time), so I actually lack the time needed :( - or at least I cannot commit
to getting it done quickly :\

Can anyone take over this research?


> Serialization and deserialization of Persistent does not hold the entity dirty state
from Map to Reduce
> -------------------------------------------------------------------------------------------------------
>
>                 Key: GORA-401
>                 URL: https://issues.apache.org/jira/browse/GORA-401
>             Project: Apache Gora
>          Issue Type: Bug
>          Components: gora-core
>    Affects Versions: 0.4, 0.5
>         Environment: Tested on gora-0.4, but seems logically to hold on gora-0.5. HBase
backend.
>            Reporter: Alfonso Nishikawa
>            Assignee: Alfonso Nishikawa
>            Priority: Critical
>              Labels: serialization
>             Fix For: 0.8
>
>         Attachments: GORA-401-tests.patch, GORA-401v1.patch, GORA-401v2.patch, GORA-401v3.patch,
GORA-401v4.patch, GORA-401v5.patch
>
>   Original Estimate: 35h
>          Time Spent: 21h
>  Remaining Estimate: 14h
>
> After removing the __g__dirty field in GORA-326, the dirty field is not serialized. In GORA-321
{{[PersistentSerializer|https://github.com/apache/gora/blob/master/gora-core/src/main/java/org/apache/gora/mapreduce/PersistentSerializer.java]}}
went from using {{[PersistentDatumWriter|https://github.com/apache/gora/blob/apache-gora-0.3/gora-core/src/main/java/org/apache/gora/avro/PersistentDatumWriter.java](/Reader)}}
to Avro's {{SpecificDatumWriter}}, delegating the serialization of the dirty field to Avro
(although it is really not desirable to have that field as a main field in the entities).
> The proposal is to reintroduce the {{PersistentDatumWriter/Reader}}, which will serialize
the internal fields of the entities.
> This bug affects, for example, Nutch, which loads only some fields in its phases, serializes
entities (from Map to Reduce), and on deserialization finds all fields marked as "dirty", regardless
of which fields were modified in the Map, and so overwrites all data in the datastore (deleting many
things: downloaded content, parsed content, etc.).
> This effect can be seen in {{TestPersistentSerialization#testSerderEmployeeTwoFields}}
when debugging in {{TestIOUtils#testSerializeDeserialize}}. Proper breakpoints and inspections
show that entities are "equal" when their fields are equal. This is fine as a definition of "equal",
but another test must be added to check that serialization and deserialization keep the dirty
state.
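
The effect described above can be illustrated with a minimal, self-contained sketch. It does not use Gora or Avro: {{DirtyStateSketch}}, {{Employee}} and the four read/write methods are hypothetical names made up for illustration, and the assumption that deserialized values go through setters (and so mark every field dirty) is only a model of the reported behaviour, not the actual code path. The second pair of methods shows the general idea of the proposal, writing the internal dirty bits alongside the field values.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.BitSet;

/** Hypothetical stand-in for a persistent entity; this is not Gora's API. */
final class DirtyStateSketch {

  /** Two-field entity that tracks which fields were modified. */
  static final class Employee {
    String name = "";
    int salary;
    final BitSet dirty = new BitSet(2); // bit 0: name, bit 1: salary

    void setName(String v) { name = v; dirty.set(0); }
    void setSalary(int v) { salary = v; dirty.set(1); }
  }

  /** Writes the field values only; the dirty bits are not part of the output. */
  static byte[] writeFieldsOnly(Employee e) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      out.writeUTF(e.name);
      out.writeInt(e.salary);
      out.flush();
      return bytes.toByteArray();
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
  }

  /** Assumed behaviour: values are restored through setters, so all fields become dirty. */
  static Employee readFieldsOnly(byte[] data) {
    try {
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
      Employee e = new Employee();
      e.setName(in.readUTF());
      e.setSalary(in.readInt());
      return e;
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
  }

  /** Writes the dirty bits (length-prefixed) followed by the field values. */
  static byte[] writeWithDirtyState(Employee e) {
    try {
      byte[] bits = e.dirty.toByteArray();
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      out.writeByte(bits.length);
      out.write(bits);
      out.write(writeFieldsOnly(e));
      out.flush();
      return bytes.toByteArray();
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
  }

  /** Restores the field values directly, then restores the recorded dirty bits. */
  static Employee readWithDirtyState(byte[] data) {
    try {
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
      byte[] bits = new byte[in.readUnsignedByte()];
      in.readFully(bits);
      Employee e = new Employee();
      e.name = in.readUTF();
      e.salary = in.readInt();
      e.dirty.or(BitSet.valueOf(bits));
      return e;
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
  }

  public static void main(String[] args) {
    Employee e = new Employee();
    e.setName("Ada");
    e.setSalary(100);
    e.dirty.clear();   // pretend the entity was just loaded from the datastore
    e.setSalary(120);  // the Map phase modifies one field only

    Employee lossy = readFieldsOnly(writeFieldsOnly(e));
    Employee kept = readWithDirtyState(writeWithDirtyState(e));
    System.out.println("fields only: dirty=" + lossy.dirty);
    System.out.println("with state:  dirty=" + kept.dirty);
  }
}
```

In this sketch both round trips return entities whose fields are equal to the original, so an equality check based on fields alone passes in both cases. Only a comparison of the dirty bits tells them apart: the first round trip reports both fields as dirty, the second only the modified one. That is the kind of check the additional test would need to make.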



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
