gora-dev mailing list archives

From Renato Javier Marroquín Mogrovejo (JIRA) <j...@apache.org>
Subject [jira] [Commented] (GORA-270) IOUtils static SerializationFactory field
Date Sun, 05 Oct 2014 10:16:34 GMT

    [ https://issues.apache.org/jira/browse/GORA-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14159497#comment-14159497 ]

Renato Javier Marroquín Mogrovejo commented on GORA-270:

Hi guys,

I am sorry to say that this broke the Giraph-Gora integration :(
To use Gora in an MR job, io.serializations needs to be set to org.apache.hadoop.io.serializer.WritableSerialization,
org.apache.hadoop.io.serializer.JavaSerialization, and this should be done only once (either
as part of the cluster configuration or as part of the job). This means that for a Hadoop
job, all Gora-related data will be serialized in one specific manner.
With this change, we now need to pass this Hadoop configuration with every single
query object, which doesn't make sense because it is a Hadoop configuration, not a Gora configuration.
This is what broke the integration with Giraph: right now a query object can't be generic;
it has to carry the configuration even though this configuration has already been set for the
whole job.
The configuration object does not contain anything related to Gora, and I think that was the
reason why it was static. I think we should revert this, write a test showing that we
need this change, and put it back if needed.
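To make the point about job-level settings concrete, here is a minimal stdlib-only sketch. It assumes a `java.util.Properties` object as a stand-in for Hadoop's `Configuration`; `JobConfigSketch`, `configureJob` and `serializations` are hypothetical names. The serializer list is written once when the job is set up and read back by any component of the job, so no individual query object has to carry it. With a real Hadoop `Configuration`, the analogous calls would be `setStrings("io.serializations", ...)` and `getStrings("io.serializations")`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

/**
 * Hypothetical sketch: java.util.Properties stands in for Hadoop's Configuration.
 * The serializer list is a job-wide setting, written once before the job runs.
 */
public class JobConfigSketch {
    static final String IO_SERIALIZATIONS = "io.serializations";

    // Set the serializer list once, at job setup time (not per query object).
    static Properties configureJob() {
        Properties job = new Properties();
        job.setProperty(IO_SERIALIZATIONS, String.join(",",
                "org.apache.hadoop.io.serializer.WritableSerialization",
                "org.apache.hadoop.io.serializer.JavaSerialization"));
        return job;
    }

    // Any component of the job reads the same shared list back.
    static List<String> serializations(Properties job) {
        return Arrays.asList(job.getProperty(IO_SERIALIZATIONS, "").split(","));
    }

    public static void main(String[] args) {
        Properties job = configureJob();
        System.out.println(serializations(job));
    }
}
```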

> IOUtils static SerializationFactory field
> -----------------------------------------
>                 Key: GORA-270
>                 URL: https://issues.apache.org/jira/browse/GORA-270
>             Project: Apache Gora
>          Issue Type: Bug
>          Components: gora-core
>            Reporter: Damien Raude-Morvan
>            Assignee: Damien Raude-Morvan
>              Labels: mapreduce
>             Fix For: 0.4
>         Attachments: 0001-GORA-270-remove-static-reference-to-SerializationFac.patch
> (From http://mail-archives.apache.org/mod_mbox/gora-dev/201308.mbox/%3CCAG50ZE_poN4C%2B%2B8t2xLZ3MoJVDMRo6nfW_Wygd%3D%3DeteF3jyLrw%40mail.gmail.com%3E)
> Right now, IOUtils keeps a *static* reference to a SerializationFactory
> which is initialized on the first call to writeObject() with a Configuration
> instance. The given Configuration is also stored in a static field of the same
> class for later use.
> But in fact each call to IOUtils.writeObject() can have a different
> Configuration instance than the previous one. In my personal use case, I have
> multiple M/R jobs which use the Gora M/R feature to process Persistent objects,
> but each job can work with a different datastore configuration (for
> instance, the name of the table/collection/column family).
> If we keep a static reference to SerializationFactory (and so to its Configuration
> reference), QueryBase#readFields will then create a DataStore with the wrong
> Configuration (i.e. using the first DataStore/Configuration instead of the new one).
> I've started working on this issue and came up with a possible fix:
> https://github.com/drazzib/gora/compare/apache-gora-0.2.1...ioutils_static_conf
> - remove the static SerializationFactory from IOUtils (it will be recreated every
> time)
> - in PartitionQueryImpl and QueryBase, now send the *current* configuration to
> deserialize
> One linked fix is that Gora "drivers" need to be updated to define the
> Configuration instance in PartitionQueryImpl (like this
> https://github.com/drazzib/gora/commit/395f2e2ad50d524f42ecc563104c165fa0fa6f39
> ).
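The stale-configuration problem described in the issue can be reproduced with a minimal stdlib-only sketch. `Conf` and `Factory` are hypothetical stand-ins for Hadoop's `Configuration` and `SerializationFactory`, `tableForNewDataStore()` stands in for `QueryBase#readFields` creating a DataStore, and the table names are made up; this is not the code from the attached patch. A factory cached in a static field on the first call keeps answering with the first job's settings, while one rebuilt from the caller's current configuration does not.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the GORA-270 problem. "Conf" stands in for Hadoop's
 * Configuration and "Factory" for SerializationFactory; neither is the real class.
 */
public class StaticConfSketch {
    static final class Conf {
        final Map<String, String> props = new HashMap<>();
        Conf set(String key, String value) { props.put(key, value); return this; }
        String get(String key) { return props.get(key); }
    }

    static final class Factory {
        final Conf conf;
        Factory(Conf conf) { this.conf = conf; }
        // Stands in for deserialization creating a DataStore from the factory's Configuration.
        String tableForNewDataStore() { return conf.get("table"); }
    }

    // Before the patch: the factory is created on the first call, then reused forever.
    static Factory cached;

    static String readWithStaticFactory(Conf current) {
        if (cached == null) {
            cached = new Factory(current); // the first Configuration wins
        }
        return cached.tableForNewDataStore(); // 'current' is ignored after the first call
    }

    // After the patch: a new factory is built from the caller's current Configuration.
    static String readWithFreshFactory(Conf current) {
        return new Factory(current).tableForNewDataStore();
    }

    public static void main(String[] args) {
        Conf jobA = new Conf().set("table", "webpage_a");
        Conf jobB = new Conf().set("table", "webpage_b");
        System.out.println(readWithStaticFactory(jobA)); // webpage_a
        System.out.println(readWithStaticFactory(jobB)); // webpage_a (stale)
        System.out.println(readWithFreshFactory(jobB));  // webpage_b
    }
}
```

Rebuilding the factory on every call trades a small allocation cost for correctness; keeping a job-wide static, as argued in the comment above, is only safe if every job running in the same JVM shares one configuration.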

This message was sent by Atlassian JIRA
