giraph-dev mailing list archives

From "Hassan Eslami (JIRA)" <j...@apache.org>
Subject [jira] [Created] (GIRAPH-1073) Decouple out-of-core persistence infrastructure from out-of-core computation
Date Wed, 15 Jun 2016 18:27:09 GMT
Hassan Eslami created GIRAPH-1073:
-------------------------------------

             Summary: Decouple out-of-core persistence infrastructure from out-of-core computation
                 Key: GIRAPH-1073
                 URL: https://issues.apache.org/jira/browse/GIRAPH-1073
             Project: Giraph
          Issue Type: Improvement
            Reporter: Hassan Eslami
            Assignee: Hassan Eslami


In the current out-of-core infrastructure, the persistence layer is heavily intertwined with
the out-of-core scheduler and engine. This makes it difficult to experiment with new features
in the persistence layer. The following changes are needed:
 * The persistence layer should be decoupled from the rest of the out-of-core infrastructure.
This way one can implement and plug in different data accessors for various persistence
resources, e.g. a local file system data accessor, an HDFS data accessor, a serialized
in-memory data accessor, etc.
 * We should be able to address out-of-core data in a more efficient and flexible way. Currently,
data are accessed/addressed through string literals scattered across various locations in the
code. These should be replaced with a unified, more flexible data indexing mechanism.
 * With different data accessor implementations, there may be more emphasis on running more
IO threads, so it is important that these IO threads are load-balanced. Currently, partitions
are assigned to IO threads using a hash function, and hash functions tend not to balance load
well when the number of data points (partitions, in this case) is small.
 * Currently, out-of-core uses `BufferedInputStream` and `BufferedOutputStream` along with
the default (de)serialization mechanism, and the IO bandwidth achieved by this implementation
is low. Two simple improvements are: 1) use the Unsafe (de)serialization mechanism to make
better use of memory bandwidth during (de)serialization, and 2) use RandomAccessFile's read
and write interface for lower-level access to the local file system, avoiding overheads when
reading from and writing to local files.
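The first item, decoupling persistence behind pluggable data accessors, can be illustrated with a minimal sketch. It assumes a hypothetical `OutOfCoreDataAccessor` interface and a hypothetical `InMemoryDataAccessor` implementation (neither is Giraph's actual API): the out-of-core engine would only see the interface, and a local-file or HDFS accessor would implement the same two methods.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical plug-in point: the out-of-core engine only depends on this. */
interface OutOfCoreDataAccessor {
  /** Opens a stream that (over)writes the data stored under key. */
  OutputStream openForWrite(String key) throws IOException;

  /** Opens a stream over data previously written under key. */
  InputStream openForRead(String key) throws IOException;
}

/** Hypothetical accessor that keeps serialized bytes in memory. */
class InMemoryDataAccessor implements OutOfCoreDataAccessor {
  private final Map<String, byte[]> store = new HashMap<>();

  @Override
  public OutputStream openForWrite(String key) {
    return new ByteArrayOutputStream() {
      @Override
      public void close() throws IOException {
        super.close();
        store.put(key, toByteArray()); // publish the bytes once the writer closes
      }
    };
  }

  @Override
  public InputStream openForRead(String key) throws IOException {
    byte[] bytes = store.get(key);
    if (bytes == null) {
      throw new IOException("No data stored for key " + key);
    }
    return new ByteArrayInputStream(bytes);
  }
}

class AccessorDemo {
  public static void main(String[] args) throws IOException {
    OutOfCoreDataAccessor accessor = new InMemoryDataAccessor();
    // The caller serializes through the interface; it never sees the backing store.
    try (DataOutputStream out = new DataOutputStream(accessor.openForWrite("p0"))) {
      out.writeLong(42L);
    }
    try (DataInputStream in = new DataInputStream(accessor.openForRead("p0"))) {
      System.out.println(in.readLong()); // prints 42
    }
  }
}
```

Swapping persistence back ends then amounts to constructing a different accessor; the engine code that calls `openForWrite`/`openForRead` stays unchanged.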
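For the second item, replacing scattered string literals with a unified indexing mechanism, one possible shape is a small immutable key type. The sketch below assumes hypothetical `DataKind` and `DataIndex` types (not taken from Giraph): callers build structured keys, and the mapping to a storage name lives in exactly one method.

```java
import java.util.Locale;
import java.util.Objects;

/** Hypothetical kinds of data that can be moved out of core. */
enum DataKind { PARTITION_VERTICES, PARTITION_EDGES, INCOMING_MESSAGES }

/** Hypothetical structured key replacing ad hoc string literals. */
final class DataIndex {
  private final DataKind kind;
  private final int partitionId;
  private final long superstep;

  DataIndex(DataKind kind, int partitionId, long superstep) {
    this.kind = Objects.requireNonNull(kind, "kind");
    this.partitionId = partitionId;
    this.superstep = superstep;
  }

  /** The single place where an index is turned into a storage name. */
  String toStorageName() {
    return kind.name().toLowerCase(Locale.ROOT) + "-p" + partitionId + "-s" + superstep;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof DataIndex)) {
      return false;
    }
    DataIndex other = (DataIndex) o;
    return kind == other.kind && partitionId == other.partitionId
        && superstep == other.superstep;
  }

  @Override
  public int hashCode() {
    return Objects.hash(kind, partitionId, superstep);
  }
}

class DataIndexDemo {
  public static void main(String[] args) {
    DataIndex idx = new DataIndex(DataKind.INCOMING_MESSAGES, 3, 5);
    System.out.println(idx.toStorageName()); // prints incoming_messages-p3-s5
    System.out.println(idx.equals(new DataIndex(DataKind.INCOMING_MESSAGES, 3, 5))); // prints true
  }
}
```

Because `DataIndex` implements `equals`/`hashCode`, it can also serve directly as a map key in an in-memory accessor, so no component needs to agree on a string format.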
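The third item notes that hash-based assignment of partitions to IO threads balances poorly when there are few partitions. As one illustrative alternative (a swapped-in technique, not something the issue prescribes), the sketch below uses greedy "largest partition first, to the least-loaded thread" assignment. The class `IoThreadAssigner` and the per-partition size estimates are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical balancer: largest partitions first, each to the least-loaded thread. */
final class IoThreadAssigner {
  /** Returns owner[p] = index of the IO thread assigned to partition p. */
  static int[] assign(long[] sizes, int numThreads) {
    Integer[] order = new Integer[sizes.length];
    for (int i = 0; i < sizes.length; i++) {
      order[i] = i;
    }
    // Stable sort, descending by size; equal sizes keep partition order.
    Arrays.sort(order, (a, b) -> Long.compare(sizes[b], sizes[a]));
    long[] load = new long[numThreads];
    int[] owner = new int[sizes.length];
    for (int p : order) {
      int best = 0; // least-loaded thread, ties go to the lower index
      for (int t = 1; t < numThreads; t++) {
        if (load[t] < load[best]) {
          best = t;
        }
      }
      owner[p] = best;
      load[best] += sizes[p];
    }
    return owner;
  }

  /** Sums the assigned partition sizes per thread. */
  static long[] loads(long[] sizes, int[] owner, int numThreads) {
    long[] load = new long[numThreads];
    for (int p = 0; p < sizes.length; p++) {
      load[owner[p]] += sizes[p];
    }
    return load;
  }
}

class AssignerDemo {
  public static void main(String[] args) {
    long[] sizes = {5, 5, 4, 3, 3, 2};
    int[] owner = IoThreadAssigner.assign(sizes, 2);
    System.out.println(Arrays.toString(owner)); // prints [0, 1, 0, 1, 1, 0]
    System.out.println(Arrays.toString(IoThreadAssigner.loads(sizes, owner, 2))); // prints [11, 11]
  }
}
```

When all partitions are assumed to cost the same, this degenerates to round-robin, so per-thread partition counts differ by at most one regardless of how few partitions there are.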
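The RandomAccessFile part of the last item can be sketched as follows: serialize into one in-memory buffer, then move it to or from the file with a single `write(byte[], int, int)` or `readFully(byte[])` call instead of going through buffered streams. The hypothetical `RafSpill` class uses the JDK's `DataOutputStream` over a `ByteArrayOutputStream` as a portable stand-in; the Unsafe-based serialization mentioned in the issue is not shown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Arrays;

/** Hypothetical spill helper: serialize in memory, then one bulk file operation. */
final class RafSpill {
  /** Writes a length-prefixed array of longs to file, replacing its contents. */
  static void spill(File file, long[] values) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream(4 + 8 * values.length);
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(values.length);
      for (long v : values) {
        out.writeLong(v);
      }
    }
    byte[] buf = bytes.toByteArray();
    try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
      raf.setLength(0); // drop any previous contents
      raf.write(buf, 0, buf.length); // one bulk write
    }
  }

  /** Reads the whole file with readFully, then deserializes from memory. */
  static long[] load(File file) throws IOException {
    byte[] buf;
    try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
      long len = raf.length();
      if (len > Integer.MAX_VALUE) {
        throw new IOException("File too large for a single buffer: " + file);
      }
      buf = new byte[(int) len];
      raf.readFully(buf); // one bulk read
    }
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf))) {
      long[] values = new long[in.readInt()];
      for (int i = 0; i < values.length; i++) {
        values[i] = in.readLong();
      }
      return values;
    }
  }
}

class RafSpillDemo {
  public static void main(String[] args) throws IOException {
    File f = File.createTempFile("spill", ".bin");
    f.deleteOnExit();
    RafSpill.spill(f, new long[] {1L, -2L, 3L});
    System.out.println(Arrays.toString(RafSpill.load(f))); // prints [1, -2, 3]
  }
}
```

Whether this actually raises IO bandwidth depends on buffer sizes and the workload; the sketch only shows the access pattern.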



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
