Thank you, I will try that. However, if I set bsp.local.tasks.maximum to 1,
why doesn't it distribute one task to each machine?
On Dec 5, 2012, at 11:58 AM, Thomas Jungblut wrote:
> So it will spawn 12 tasks. If that doesn't saturate your machines, try
> using smaller block sizes.
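>
> For example, you could re-upload the input with a smaller block size (a
> sketch, assuming the Hadoop 1.x property name dfs.block.size; the 2 MB
> value is only illustrative):
>
>     hadoop fs -D dfs.block.size=2097152 -put input /user/hama/input
>
> More blocks means more input splits, and therefore more tasks to spread
> across the grooms.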
>
> 2012/12/5 Benedikt Elser <elser@disi.unitn.it>
>
>> Hi,
>>
>> thanks for your reply!
>>
>> Total size: 49078776 B
>> Total dirs: 1
>> Total files: 12
>> Total blocks (validated): 12 (avg. block size 4089898 B)
>>
>> Benedikt
>>
>> On Dec 5, 2012, at 11:47 AM, Thomas Jungblut wrote:
>>
>>> So how many blocks does your data have in HDFS?
>>>
>>> 2012/12/5 Benedikt Elser <elser@disi.unitn.it>
>>>
>>>> Hi List,
>>>>
>>>> I am using the hama-0.6.0 release to run graph jobs on various input
>>>> graphs in an EC2-based cluster of size 12. However, as I can see in the
>>>> logs, not every node in the cluster contributes to the job (they have no
>>>> tasklog/job<ID> dir and are idle). Theoretically, a distribution of one
>>>> million nodes across 12 buckets should hit every node at least once, so
>>>> I think it's a configuration problem. So far I have experimented with
>>>> these settings:
>>>>
>>>> <name>bsp.max.tasks.per.job</name>
>>>> <name>bsp.local.tasks.maximum</name>
>>>> <name>bsp.tasks.maximum</name>
>>>> <name>bsp.child.java.opts</name>
>>>>
>>>> Setting bsp.local.tasks.maximum to 1 and bsp.max.tasks.per.job to 12
>>>> did not have the desired effect. I also split the input into 12 files
>>>> (because of something in 0.5 that was fixed in 0.6).
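>>>>
>>>> For reference, the relevant hama-site.xml entries (a sketch of what I
>>>> tried; values as described above):
>>>>
>>>>     <property>
>>>>       <name>bsp.local.tasks.maximum</name>
>>>>       <value>1</value>
>>>>     </property>
>>>>     <property>
>>>>       <name>bsp.max.tasks.per.job</name>
>>>>       <value>12</value>
>>>>     </property>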
>>>>
>>>> Could you recommend some settings, or walk me through the system's
>>>> partitioning decision? My understanding is:
>>>>
>>>> Input -> input splits based on the input and the max* conf values ->
>>>> number of tasks; HashPartitioner.class distributes vertex IDs across
>>>> that number of tasks, as sketched below.
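>>>>
>>>> i.e., if my reading of HashPartitioner is right, essentially this
>>>> (names from memory, so take it as an illustrative sketch):
>>>>
>>>>     // each vertex ID is hashed onto one of the N tasks
>>>>     int partition = Math.abs(vertexID.hashCode() % numberOfTasks);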
>>>>
>>>> Thanks,
>>>>
>>>> Benedikt
>>
>>