spark-user mailing list archives

From Antony Mayi <antonym...@yahoo.com.INVALID>
Subject pyspark and page allocation failures due to memory fragmentation
Date Fri, 30 Jan 2015 19:46:14 GMT
Hi,
When running a big mapreduce operation with pyspark (in this particular case using a lot of sets
and set operations in the map tasks, so likely allocating and freeing loads of pages)
I eventually get the kernel error 'python: page allocation failure: order:10, mode:0x2000d0' plus
a very verbose dump, which I can reduce to the following snippet:
Node 1 Normal: 3601*4kB (UEM) 3159*8kB (UEM) 1669*16kB (UEM) 763*32kB (UEM) 1451*64kB (UEM)
15*128kB (UM) 1*256kB (U) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 185836kB
...SLAB: Unable to allocate memory on node 1 (gfp=0xd0)
cache: size-4194304, object size: 4194304, order: 10
So simply the memory got fragmented and there are no higher-order pages left. The interesting thing
is that there is no error thrown by spark itself - the processing just gets stuck without
any error or anything (only the kernel dmesg explains what happened in the background).
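
For illustration, the map tasks mentioned above have roughly this shape (the names and input
here are made up just to show the set churn, it is not the actual job):

from pyspark import SparkContext

sc = SparkContext(appName="set-heavy-map")

def score(record):
    # builds and discards many temporary sets per record,
    # which is what I suspect churns pages on the workers
    seen = set()
    for token in record.split():
        seen |= set(token)   # lots of small, short-lived set allocations
    return len(seen)

counts = sc.parallelize(["some input line"] * 1000000).map(score).collect()
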
Any kernel experts out there with advice on how to avoid this? I have tried a few vm options but
still no joy.
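
For anyone wanting to check the fragmentation state on a node, the per-order free block counts
can be read from /proc/buddyinfo, e.g. with something like:

# print per-order free block counts per zone from /proc/buddyinfo
# (column i holds the number of free blocks of order i, i.e. 2^i contiguous pages)
with open('/proc/buddyinfo') as f:
    for line in f:
        fields = line.split()
        node, zone, counts = fields[1].rstrip(','), fields[3], fields[4:]
        print('node %s zone %-8s %s' % (
            node, zone,
            ' '.join('o%d=%s' % (i, c) for i, c in enumerate(counts))))
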
Running Spark 1.2.0 (CDH 5.3.0) on kernel 3.8.13.
Thanks,
Antony