Yes, it seems that CMS is better. I tried G1 as the Databricks blog recommended, but it was too slow.

The following are the parameters:

bq. It happens during the Reduce majority.

Does the above refer to the reduce operation?

Can you share your G1GC parameters (and the heap size for the workers)?


My Spark application failed because it spent too much time in GC. Looking at the logs, I found the following:
1. Young GC pauses take too long, and I did not find any Full GC happening during this time;
2. Too much time is spent in the object copy phase;
3. It happens more easily when there are not enough resources;
4.It happens during the Reduce majority.

Has anyone run into the same problem?

