Hi Harpreet,

Try to give more resources to the mappers, or increase the number of mappers. I don't think there is a direct relation between the sum of all the mappers' JVM sizes and the input size.
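For example, a minimal sketch assuming the job is launched from the command line (the 4 GB container and the matching ~80% heap below are illustrations, not recommendations; tune them to your data):

    sqoop import \
        -Dmapreduce.map.memory.mb=4096 \
        -Dmapreduce.map.java.opts="-Djava.net.preferIPv4Stack=true -Xmx3276m" \
        ... rest of your import arguments ...

For what it's worth, your current -Xmx1717986918 is about 1.6 GiB, roughly 80% of the 2 GB container, so the kill happens when heap plus off-heap and native overhead together cross the 2 GB physical limit, not because the heap itself runs out. Raising mapreduce.map.memory.mb (with a proportional -Xmx) gives that headroom; alternatively, a higher --num-mappers spreads the same input across smaller splits per container.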

Regards,

Douglas

On Thu, Aug 3, 2017 at 4:26 AM, Harpreet Singh <hs.kundhal@gmail.com> wrote:
Thanks Douglas, 
The details you asked for:
yarn.scheduler.minimum-allocation-mb = 2 GB
yarn.scheduler.maximum-allocation-mb = 128 GB
Increment = 512 MB

Please help with design considerations for how many mappers should be used for Sqoop. I believe each mapper's memory is capped, so does that mean the data that can be fetched by 6 mappers with 2 GB each is capped at around 12 GB? The cluster is precisely honoring the number of mappers specified and not exceeding that task count.

Regards
Harpreet Singh

On Aug 2, 2017 7:19 PM, "Douglas Spadotto" <dougspadotto@gmail.com> wrote:
Hello Harpreet,

It seems that your job is going beyond the memory limits established for its containers.

What are the values of yarn.scheduler.minimum-allocation-mb and yarn.scheduler.maximum-allocation-mb on your cluster?

Some background on the meaning of these configurations can be found here: https://discuss.pivotal.io/hc/en-us/articles/201462036-MapReduce-YARN-Memory-Parameters
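If you have shell access to a cluster node, one quick way to check them (assuming the CDH default client configuration path; adjust if yours differs):

    grep -A 1 'allocation-mb' /etc/hadoop/conf/yarn-site.xml

This prints each matching <name> line together with the <value> line that follows it.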

Regards,

Douglas

On Wed, Aug 2, 2017 at 8:00 AM, Harpreet Singh <hs.kundhal@gmail.com> wrote:
Hi All,
I have a Sqoop job which runs in production and fails sometimes; restarting the job makes it complete successfully.
The logs show that the failure happens with the error: "Container is running beyond physical memory limits. Current usage: 2.3 GB of 2 GB physical memory used; 4.0 GB of 4.2 GB virtual memory used. Killing container."
The environment is:
CDH 5.8.3
Sqoop 1 client
mapreduce.map.java.opts = -Djava.net.preferIPv4Stack=true -Xmx1717986918
mapreduce.map.memory.mb = 2 GB

Sqoop job details: the job pulls data from Netezza using 6 mappers and writes Parquet files to HDFS. The data processed is about 14 GB, and the splits seem to be even.
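For reference, the command is along these lines (a sketch only: the host, database, credentials, table, and target directory are placeholders, not the real values):

    sqoop import \
        --connect jdbc:netezza://nz-host:5480/proddb \
        --username produser -P \
        --table SOURCE_TABLE \
        --num-mappers 6 \
        --as-parquetfile \
        --target-dir /data/source_table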
Please provide your insights. 

Regards
Harpreet Singh