gora-dev mailing list archives

From "Lewis John McGibbney (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (GORA-401) Serialization and deserialization of Persistent does not hold the entity dirty state
Date Wed, 07 Jan 2015 12:16:35 GMT

    [ https://issues.apache.org/jira/browse/GORA-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267565#comment-14267565 ]

Lewis John McGibbney edited comment on GORA-401 at 1/7/15 12:16 PM:
--------------------------------------------------------------------

Hi Alfonso et al. I've read through this and would like to spend some time today actually
absorbing the points made here. It should be mentioned at this stage that the work undertaken
for GORA-94 is as non-trivial now as it was then. Do we have regression issues? Yes. What
are they? Here are a few...
 * The GoraCompiler has changed entirely in terms of functionality
 * The GoraCompiler has changed entirely in terms of the way it is physically invoked. This
is something which we can actually remedy through some carefully crafted ports of functionality
from the old GoraCompiler to the new one. I intended to more thoroughly document some of these
in GORA-324 once I wake back up.
 * Alfonso has indicated how StateManager was effectively not only deprecated (as there was
no real way to do this without having a horribly convoluted codebase) but deleted entirely
from the code base. In all honesty I remember talking about this at length here within the
community and with [~ap.giannakidis] in Dublin at the NoSQL meetup when I gave a [presentation|http://prezi.com/b5_vabnmelmy/?utm_campaign=share&utm_medium=copy&rc=ex0share]
on what was proposed and ultimately changing.
 * There are more regressions I can think of here, folks; however, lingering on them without
addressing them directly is a fruitless effort. I would rather work towards a solution.

I would suggest one thing right now, which is that if you, [~alfonso.nishikawa], wish to reintroduce
the PersistentDatumWriter/Reader then by all means please go ahead. There is absolutely nothing
stopping you.

I also want to state that the logic, reasoning and justification (all possibly bundled into
one primary driving force) for a move towards upgrading Avro in Gora in the manner we did
was that Avro had changed SO much between 1.3.3 --> 1.7.X, with so many improvements, that
any issues we were having with regard to serialization were not really compatible/comparable
with what was being experienced within the Avro community. It is safe to say that when Gora
initially entered incubation at the ASF, Avro was in its infancy. The library has moved on
and I think we need to ensure that Gora does the same.

Finally, this is *exactly* why I was (and still am) *extremely* keen to get moving with
GoraCI under the RackSpace hosting we have available. I am of the opinion that we need to
be putting the Gora serialization code under much more scrutiny. This way we can hopefully
reach consensus on what we need to implement based on facts in addition to opinion (I apologize
if this sounds a bit paradoxical).

I'll make an effort to look into all of the issues you've raised [~alfonso.nishikawa], thank
you for voicing them.


> Serialization and deserialization of Persistent does not hold the entity dirty state
> ------------------------------------------------------------------------------------
>
>                 Key: GORA-401
>                 URL: https://issues.apache.org/jira/browse/GORA-401
>             Project: Apache Gora
>          Issue Type: Bug
>          Components: gora-core
>    Affects Versions: 0.4, 0.5
>         Environment: Tested on gora-0.4, but seems logically to hold on gora-0.5
>            Reporter: Alfonso Nishikawa
>            Priority: Critical
>              Labels: serialization
>   Original Estimate: 35h
>  Remaining Estimate: 35h
>
> After removing the __g__dirty field in GORA-326, the dirty field is not serialized. In GORA-321
{{[PersistentSerializer|https://github.com/apache/gora/blob/master/gora-core/src/main/java/org/apache/gora/mapreduce/PersistentSerializer.java]}}
went from using {{[PersistentDatumWriter|https://github.com/apache/gora/blob/apache-gora-0.3/gora-core/src/main/java/org/apache/gora/avro/PersistentDatumWriter.java](/Reader)}}
to Avro's {{SpecificDatumWriter}}, delegating the serialization of the dirty field to Avro
(although it is really not desirable to have that field as a main field in the entities).
> The proposal is to reintroduce the {{PersistentDatumWriter/Reader}}, which will serialize
the internal fields of the entities.
> This bug affects, for example, Nutch, which loads only some fields in its phases, serializes
entities (from Map to Reduce), and on deserialization finds all fields marked as "dirty", independently
of which fields were modified in the Map, and so overwrites all data in the datastore (deleting many
things: downloaded content, parsed content, etc).
> This effect can be seen in {{TestPersistentSerialization#testSerderEmployeeTwoFields}}
when debugging in {{TestIOUtils#testSerializeDeserialize}}. Proper breakpoints and inspections
show that entities are "equal" when their fields are equal. This is fine as a definition of "equal",
but another test must be added to check that serialization and deserialization keep the dirty
state.
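
For illustration only, the effect described in the issue can be reproduced with a toy model.
This is a sketch under stated assumptions, not Gora or Avro code: the class DirtyStateSketch,
the Employee entity, its two fields and the keepDirty switch are all hypothetical names made
up for this example, and plain java.io streams stand in for the real datum writers. It shows
that a round trip which writes only the field values comes back with every field flagged as
dirty, while a round trip which also writes the flags keeps the original dirty state.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

/** Toy model only: none of these names are Gora or Avro APIs. */
public class DirtyStateSketch {

    /** Hypothetical entity: two fields plus one dirty flag per field. */
    static final class Employee {
        String name = "";
        int salary;
        final boolean[] dirty = new boolean[2]; // index 0 = name, 1 = salary

        void setName(String v) { name = v; dirty[0] = true; }
        void setSalary(int v) { salary = v; dirty[1] = true; }
    }

    /** Writes the field values; the dirty flags are written only if keepDirty is true. */
    static byte[] serialize(Employee e, boolean keepDirty) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(e.name);
            out.writeInt(e.salary);
            if (keepDirty) {
                for (boolean d : e.dirty) out.writeBoolean(d);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    /** Reads an entity back; restores the dirty flags only if they were written. */
    static Employee deserialize(byte[] data, boolean keepDirty) {
        Employee e = new Employee();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            // Assumption: fields are populated through the setters, so every field
            // ends up flagged dirty. This mimics the reported symptom; the actual
            // mechanism inside Gora may differ.
            e.setName(in.readUTF());
            e.setSalary(in.readInt());
            if (keepDirty) {
                for (int i = 0; i < e.dirty.length; i++) e.dirty[i] = in.readBoolean();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return e;
    }

    static Employee roundTrip(Employee e, boolean keepDirty) {
        return deserialize(serialize(e, keepDirty), keepDirty);
    }

    public static void main(String[] args) {
        Employee e = new Employee();
        e.setSalary(1000); // only the salary field is modified
        System.out.println("fields only: " + Arrays.toString(roundTrip(e, false).dirty));
        System.out.println("with flags:  " + Arrays.toString(roundTrip(e, true).dirty));
    }
}
```

A test along the lines the issue asks for would compare the dirty flags before and after the
round trip, in addition to comparing the field values.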



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
