giraph-dev mailing list archives

From "Jianlong Zhong (JIRA)" <>
Subject [jira] [Created] (GIRAPH-1161) implement random sampling for input splits
Date Thu, 28 Sep 2017 17:49:00 GMT
Jianlong Zhong created GIRAPH-1161:

             Summary: implement random sampling for input splits
                 Key: GIRAPH-1161
             Project: Giraph
          Issue Type: Improvement
            Reporter: Jianlong Zhong
            Priority: Minor

Currently, if we are reading vertex/edge data from multiple tables and only want to read
a fraction of the data (with the giraph.inputSplitSamplePercent conf option), we always get
the first inputSplitSamplePercent percent of the input splits. We should instead use a random
sample of input splits, so that testing on a sample of the data looks closer to an actual
run on the full data.
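
A minimal sketch of the proposed behavior, for illustration only. It is not the Giraph implementation: the SplitSampler class, the sample method, and the seed parameter are hypothetical names, and the sample-size formula (percent of the split count, truncated to an integer) is an assumption. The idea is to shuffle a copy of the split list and keep a prefix of the shuffled copy, rather than taking a prefix of the original list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper; not part of Giraph. */
final class SplitSampler {
  private SplitSampler() { }

  /**
   * Returns a random subset of the given splits.
   *
   * @param splits        all input splits (left unmodified)
   * @param samplePercent percentage of splits to keep, in [0, 100]
   * @param seed          seed for the shuffle, so a run can be reproduced
   */
  static <T> List<T> sample(List<T> splits, float samplePercent, long seed) {
    if (samplePercent < 0f || samplePercent > 100f) {
      throw new IllegalArgumentException(
          "samplePercent must be in [0, 100], got " + samplePercent);
    }
    // Assumed sizing rule: truncate percent * size / 100 to an int.
    int sampleSize = (int) (samplePercent * splits.size() / 100f);

    // Shuffle a copy so the caller's list keeps its original order.
    List<T> shuffled = new ArrayList<>(splits);
    Collections.shuffle(shuffled, new Random(seed));

    // A prefix of a uniformly shuffled list is a uniform random sample.
    return new ArrayList<>(shuffled.subList(0, sampleSize));
  }
}
```

Passing a fixed seed keeps the sample the same across runs, which may matter if several workers need to agree on which splits were chosen. Whether that is needed depends on where the sampling happens, which this issue does not specify.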

This message was sent by Atlassian JIRA
