I am seeing a huge variation in Spark Task Deserialization Time for my collect and reduce operations. While most tasks complete within 100 ms, a few take more than a couple of seconds, which slows the entire program down. I have attached a screenshot of the web UI where you can see the variation.
As you can see, Task Deserialization Time has a max of 7 seconds and a 75th percentile of 0.3 seconds.
Does anyone know what might cause numbers like these? Any help would be greatly appreciated.
Pulasthi S. Wickramasinghe
Graduate Student | Research Assistant
School of Informatics and Computing | Digital Science Center
Indiana University, Bloomington