From: "Alfonso Nishikawa (JIRA)"
To: dev@gora.apache.org
Date: Sun, 1 Feb 2015 23:52:35 +0000 (UTC)
Subject: [jira] [Commented] (GORA-401) Serialization and deserialization of Persistent does not hold the entity dirty state from Map to Reduce

[ https://issues.apache.org/jira/browse/GORA-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300792#comment-14300792 ]

Alfonso Nishikawa commented on GORA-401:
----------------------------------------

Hi, [~lewismc].

Thank you very much for pointing out the logging and JUnit issues. I did it quickly and made those mistakes :) I uploaded GORA-401v4.patch with the changes. It seems to pass all tests.

About adding it to the 0.6 release: I think the issue is important enough to release the fix in 0.6.
On the other hand, the fix is really ugly :P, specifically because it involves carrying over some Avro classes (the OSGi configuration is exporting the whole package). So I vote to release it, but if anyone thinks the solution should be more elegant and better engineered (an Avro parser, etc.), then I am OK with putting it on hold until 0.7.

[~drazzib], [~renato2099], [~alparslan.avci]: any thoughts?

Thanks!

> Serialization and deserialization of Persistent does not hold the entity dirty state from Map to Reduce
> -------------------------------------------------------------------------------------------------------
>
>                 Key: GORA-401
>                 URL: https://issues.apache.org/jira/browse/GORA-401
>             Project: Apache Gora
>          Issue Type: Bug
>          Components: gora-core
>    Affects Versions: 0.4, 0.5
>         Environment: Tested on gora-0.4, but it should logically also hold on gora-0.5. HBase backend.
>            Reporter: Alfonso Nishikawa
>            Priority: Critical
>              Labels: serialization
>             Fix For: 0.6
>
>         Attachments: GORA-401-tests.patch, GORA-401v1.patch, GORA-401v2.patch, GORA-401v3.patch, GORA-401v4.patch
>
>   Original Estimate: 35h
>          Time Spent: 21h
>  Remaining Estimate: 14h
>
> After the __g__dirty field was removed in GORA-326, the dirty field is no longer serialized. In GORA-321 {{[PersistentSerializer|https://github.com/apache/gora/blob/master/gora-core/src/main/java/org/apache/gora/mapreduce/PersistentSerializer.java]}} went from using {{[PersistentDatumWriter|https://github.com/apache/gora/blob/apache-gora-0.3/gora-core/src/main/java/org/apache/gora/avro/PersistentDatumWriter.java](/Reader)}} to Avro's {{SpecificDatumWriter}}, delegating the serialization of the dirty field to Avro (but it is really not desirable to have that field as a main field in the entities).
> The proposal is to reintroduce {{PersistentDatumWriter/Reader}}, which will serialize the internal fields of the entities.
> This bug affects, for example, Nutch, which loads only some fields in its phases and serializes entities (from Map to Reduce). When it deserializes them, it finds all fields marked as "dirty", regardless of which fields were modified in the Map, and overwrites all data in the datastore (deleting a lot: downloaded content, parsed content, etc.).
> This effect can be seen in {{TestPersistentSerialization#testSerderEmployeeTwoFields}} when debugging into {{TestIOUtils#testSerializeDeserialize}}. Proper breakpoints and inspection show that entities are considered "equal" when their fields are equal. This is fine as a definition of "equal", but another test must be added to check that serialization and deserialization keep the dirty state.
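The proposal to reintroduce a datum writer/reader that also serializes the entity's internal dirty state can be illustrated with a minimal sketch in plain Java. It does not use Avro or Gora's actual PersistentDatumWriter/Reader API: the `Employee` class, its two fields, and the `BitSet` dirty bitmap are hypothetical stand-ins for a generated Persistent entity, and `DataOutputStream` stands in for Avro's encoder. The round trip in `main` is the kind of check the description asks for: it verifies that only the field modified in the Map phase is still dirty after deserialization, instead of only comparing field values.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.BitSet;

public class DirtyStateRoundTrip {

    // Hypothetical two-field entity standing in for a generated Gora Persistent class.
    static final class Employee {
        static final int NAME = 0;
        static final int SALARY = 1;
        String name;
        long salary;
        final BitSet dirty = new BitSet(2);

        // Setters mark the field dirty, as a modification in the Map phase would.
        void setName(String name) { this.name = name; dirty.set(NAME); }
        void setSalary(long salary) { this.salary = salary; dirty.set(SALARY); }
        boolean isDirty(int field) { return dirty.get(field); }
    }

    // Writes the dirty bitmap first, then a presence flag for the nullable name, then the values.
    static byte[] serialize(Employee e) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            byte[] dirtyBytes = e.dirty.toByteArray();
            out.writeInt(dirtyBytes.length);
            out.write(dirtyBytes);
            out.writeBoolean(e.name != null);
            if (e.name != null) {
                out.writeUTF(e.name);
            }
            out.writeLong(e.salary);
        }
        return bytes.toByteArray();
    }

    // Reads the values back and restores the dirty bitmap exactly as it was written.
    static Employee deserialize(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            byte[] dirtyBytes = new byte[in.readInt()];
            in.readFully(dirtyBytes);
            Employee e = new Employee();
            // Assign fields directly (not via setters) so reading does not mark them dirty.
            if (in.readBoolean()) {
                e.name = in.readUTF();
            }
            e.salary = in.readLong();
            e.dirty.clear();
            e.dirty.or(BitSet.valueOf(dirtyBytes));
            return e;
        }
    }

    public static void main(String[] args) throws IOException {
        Employee loaded = new Employee();
        loaded.name = "Alice";   // loaded from the datastore: not dirty
        loaded.setSalary(1000L); // modified in the Map phase: dirty
        Employee copy = deserialize(serialize(loaded));
        System.out.println("name dirty: " + copy.isDirty(Employee.NAME));
        System.out.println("salary dirty: " + copy.isDirty(Employee.SALARY));
    }
}
```

Running `main` prints `name dirty: false` followed by `salary dirty: true`. Writing the bitmap as a prefix leaves the field encoding itself untouched; a real implementation would hook into Avro's datum writer/reader and Hadoop's serialization framework rather than raw `DataOutputStream`.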