gora-dev mailing list archives

From "Sergey Weiss (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Deleted] (GORA-392) Move PersistentSerialization to the top of serializations list
Date Tue, 28 Oct 2014 17:12:33 GMT

     [ https://issues.apache.org/jira/browse/GORA-392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Weiss updated GORA-392:
    Comment: was deleted

(was: --- gora-core/src/main/java/org/apache/gora/mapreduce/GoraMapReduceUtils.java
+++ gora-core/src/main/java/org/apache/gora/mapreduce/GoraMapReduceUtils.java
@@ -57,14 +57,14 @@ public class GoraMapReduceUtils {
    * @param reuseObjects boolean parameter to reuse objects
   public static void setIOSerializations(Configuration conf, boolean reuseObjects) {
-    String serializationClass =
-      PersistentSerialization.class.getCanonicalName();
     String[] serializations = StringUtils.joinStringArrays(
-        conf.getStrings("io.serializations"), 
+        conf.getStrings("io.serializations"),
-        StringSerialization.class.getCanonicalName(),
-        serializationClass); 
-    conf.setStrings("io.serializations", serializations);
+        StringSerialization.class.getCanonicalName());
+    String[] extendedSerializations = new String[serializations.length + 1];
+    extendedSerializations[0] = PersistentSerialization.class.getCanonicalName();
+    System.arraycopy(serializations, 0, extendedSerializations, 1, serializations.length);
+    conf.setStrings("io.serializations", extendedSerializations);
   public static List<InputSplit> getSplits(Configuration conf, String inputPath) 

> Move PersistentSerialization to the top of serializations list
> --------------------------------------------------------------
>                 Key: GORA-392
>                 URL: https://issues.apache.org/jira/browse/GORA-392
>             Project: Apache Gora
>          Issue Type: Improvement
>          Components: gora-core
>    Affects Versions: 0.5
>            Reporter: Sergey Weiss
> While getting Nutch2 to run on Hadoop 2.3.0 + HBase 0.98.1, we encountered java.io.EOFExceptions
> like the ones described in this mail thread: http://www.mail-archive.com/user%40nutch.apache.org/msg12644.html
> We applied the patch mentioned there and got our setup running, but it was very unstable:
> it would fail with an ArrayIndexOutOfBoundsException whenever we tried to generate a batch
> of some 50 or more pages to fetch.
> We investigated the problem and discovered that in the working setup of Nutch2 + Hadoop 1.2.0
> + HBase 0.94.14, PersistentDeserializer is used for deserialization during the reduce phase,
> not AvroSerialization.AvroDeserializer. The reason for this swap of deserializers lies
> in the GoraMapReduceUtils#setIOSerializations method. It uses StringUtils.joinStringArrays, and
> this method uses a HashSet under the hood, so the order of the resulting list is not guaranteed.
> Two more serializations were added to the io.serializations property in Hadoop 2.3.0 compared
> to Hadoop 1.2.0, and this results in AvroSpecificSerialization being placed at the top of the
> serializations list.
> After we patched GoraMapReduceUtils#setIOSerializations to explicitly put PersistentSerialization
> at the top of the list, the instability went away. Moreover, we don't even need to patch Avro
> now: just one simple change in Gora and everything works like a charm!
> So we propose moving PersistentSerialization to the top of the serializations list.
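
The ordering problem described in the issue can be sketched in plain Java without any Hadoop or Gora dependency. This is only an illustration under assumptions: joinViaHashSet is a hypothetical stand-in for a set-based join (not the actual StringUtils.joinStringArrays source), prepend mirrors the array copy from the deleted patch above, and the serialization names are short placeholder strings rather than real canonical class names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class SerializationOrderSketch {

  // Hypothetical stand-in for a set-based join: duplicates are dropped,
  // but HashSet gives no guarantee about the order of the result.
  static String[] joinViaHashSet(String[] existing, String... extra) {
    Set<String> set = new HashSet<>();
    if (existing != null) {
      set.addAll(Arrays.asList(existing));
    }
    set.addAll(Arrays.asList(extra));
    return set.toArray(new String[0]);
  }

  // Same idea as the patch: allocate one extra slot, put the preferred
  // entry at index 0, and copy the rest after it unchanged.
  static String[] prepend(String first, String[] rest) {
    String[] out = new String[rest.length + 1];
    out[0] = first;
    System.arraycopy(rest, 0, out, 1, rest.length);
    return out;
  }

  public static void main(String[] args) {
    // Placeholder names (assumption), standing in for an io.serializations value.
    String[] configured = {
      "WritableSerialization", "AvroSpecificSerialization", "AvroReflectSerialization"
    };
    String[] joined = joinViaHashSet(configured, "StringSerialization");
    String[] fixed = prepend("PersistentSerialization", joined);

    // The order of `joined` depends on hashing; `fixed` always starts
    // with the prepended entry regardless of that order.
    System.out.println("set-based join: " + Arrays.toString(joined));
    System.out.println("after prepend:  " + Arrays.toString(fixed));
  }
}
```

Whatever order the set-based join produces, the prepended entry ends up at index 0, which is the property the proposal relies on. Like the patch, this sketch does not check whether the entry is already present in the joined list.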

This message was sent by Atlassian JIRA
