giraph-dev mailing list archives

From "Alessandro Presta (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (GIRAPH-249) Move part of the graph out-of-core when memory is low
Date Thu, 12 Jul 2012 13:44:34 GMT

     [ https://issues.apache.org/jira/browse/GIRAPH-249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Alessandro Presta updated GIRAPH-249:
-------------------------------------

    Attachment: GIRAPH-249.patch

This is a first stab.
I replaced the HashMap that stores partitions in a worker with a WorkerPartitionMap.
A WorkerPartitionMap wraps a normal in-memory map and can additionally spill entire partitions
to the local FS when memory is low.

To provide the usual views of the map contents, we operate lazily, loading the out-of-core
partitions as we iterate.
The requested partition is always added to the in-memory map (moving another one to disk to
make room) so that it can be modified.
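To illustrate the idea (this is a hypothetical sketch, not the actual patch; the class name
SpillingPartitionStore and the use of Java serialization to temp files are my inventions):
keep a bounded number of partitions in memory, spill the least-recently-used one to a local
file when the bound is exceeded, and reload transparently on access so the caller always gets
an in-memory, mutable partition.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

// Hypothetical sketch of the spill-to-disk idea: at most maxInMemory
// partitions live in RAM; the least-recently-used one is evicted to a
// local temp file and reloaded lazily on the next access.
public class SpillingPartitionStore {
    private final int maxInMemory;
    // Access-ordered map, so the first entry is always the LRU partition.
    private final LinkedHashMap<Integer, ArrayList<String>> inMemory =
            new LinkedHashMap<>(16, 0.75f, true);
    private final Map<Integer, Path> onDisk = new HashMap<>();

    public SpillingPartitionStore(int maxInMemory) {
        this.maxInMemory = maxInMemory;
    }

    // The requested partition always ends up in memory so callers can
    // mutate it; another partition may be spilled to make room.
    @SuppressWarnings("unchecked")
    public ArrayList<String> getPartition(int id) {
        ArrayList<String> partition = inMemory.get(id);
        if (partition != null) {
            return partition;
        }
        try {
            Path file = onDisk.remove(id);
            if (file == null) {
                partition = new ArrayList<>();  // brand-new partition
            } else {
                try (ObjectInputStream in =
                         new ObjectInputStream(Files.newInputStream(file))) {
                    partition = (ArrayList<String>) in.readObject();
                }
                Files.delete(file);
            }
            inMemory.put(id, partition);
            spillIfNeeded();
            return partition;
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }

    private void spillIfNeeded() throws IOException {
        while (inMemory.size() > maxInMemory) {
            Iterator<Map.Entry<Integer, ArrayList<String>>> it =
                    inMemory.entrySet().iterator();
            Map.Entry<Integer, ArrayList<String>> lru = it.next();
            Path file = Files.createTempFile("partition-" + lru.getKey(), ".bin");
            try (ObjectOutputStream out =
                     new ObjectOutputStream(Files.newOutputStream(file))) {
                out.writeObject(lru.getValue());
            }
            onDisk.put(lru.getKey(), file);
            it.remove();
        }
    }

    public int inMemoryCount() { return inMemory.size(); }
    public int onDiskCount()   { return onDisk.size(); }
}
```

In the real implementation the trigger would be a memory threshold rather than a fixed
partition count, and partitions would be serialized with their own writable format rather
than Java serialization.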

The option "giraph.outOfCoreGraph" controls whether we use WorkerPartitionMap or a normal
HashMap as before.
"giraph.minFreeMemoryRatio" controls the fraction of the program's maximum available memory
that we want to keep free.
If out-of-core is enabled and the memory limit is exceeded, we start spilling partitions to
disk.
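A minimal sketch of what such a threshold check could look like (assumed, not taken from the
patch; the class and method names are hypothetical), counting as "free" both the unused heap
and the memory the JVM has not yet claimed from the OS:

```java
// Hypothetical illustration of a "giraph.minFreeMemoryRatio"-style check:
// treat memory not yet claimed by the JVM (max - total) plus the free
// part of the current heap as available, and spill when the fraction of
// available memory drops below the configured ratio.
public class MemoryCheck {
    public static double freeMemoryFraction() {
        Runtime rt = Runtime.getRuntime();
        long available = rt.maxMemory() - rt.totalMemory() + rt.freeMemory();
        return (double) available / rt.maxMemory();
    }

    public static boolean shouldSpill(double minFreeMemoryRatio) {
        return freeMemoryFraction() < minFreeMemoryRatio;
    }
}
```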

A few remarks:
- we may want to change other logic (e.g. mutations and message assignment) to minimize
the number of times we iterate over the partitions. For example, we might group operations
by partition and interleave message assignment with computation. This will become irrelevant
once we also have out-of-core messages (they will most likely be stored outside of the vertices).
- The code to determine if we're low on memory is kind of spaghetti. I'm not sure whether
I should check maxMemory or totalMemory, and whether/when it's a good idea to run the GC.
This logic will get more complex when we add out-of-core messages, since the two will somehow
compete for the available memory, and we want to make sure we make the best use of it.
- We might also have to change the input splitting phase. I think currently we send partitions
over as soon as they reach the max number of vertices. It looks like we keep only one partition
per owner, so this may not present a problem (as long as we have several more partitions than
workers).
                
> Move part of the graph out-of-core when memory is low
> -----------------------------------------------------
>
>                 Key: GIRAPH-249
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-249
>             Project: Giraph
>          Issue Type: Improvement
>            Reporter: Alessandro Presta
>            Assignee: Alessandro Presta
>         Attachments: GIRAPH-249.patch
>
>
> There has been some talk about Giraph's scaling limitations due to keeping the whole
graph and messages in RAM.
> We need to investigate methods to fall back to disk when running out of memory, while
gracefully degrading performance.
> This issue is for graph storage. Messages should probably be a separate issue, although
the interplay between the two is crucial.
> We should also discuss what our primary goals are here: completing a job (albeit slowly)
instead of failing when the graph is too big, while still encouraging memory optimizations
and high-memory clusters; or restructuring Giraph to be as efficient as possible in disk mode,
making it almost a standard way of operating.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
