Hi,

I wanted to understand what kind of memory overhead, if any, is expected when using the Java API. My application seems to have a lot of live Tuple2 instances and I am hitting a lot of GC, so I am wondering if I am doing something fundamentally wrong. Here is what the top of my heap looks like. I create reifier.tuple.Tuple objects, pass them to map methods, and mostly return Tuple2<Tuple,Tuple>. The heap seems to have far too many Tuple2 and $colon$colon instances.


num     #instances         #bytes  class name
----------------------------------------------
   1:      85414872     2049956928  scala.collection.immutable.$colon$colon
   2:      85414852     2049956448  scala.Tuple2
   3:        304221       14765832  [C
   4:        302923        7270152  java.lang.String
   5:         44111        2624624  [Ljava.lang.Object;
   6:          1210        1495256  [B
   7:         39839         956136  java.util.ArrayList
   8:            29         950736  [Lscala.concurrent.forkjoin.ForkJoinTask;
   9:          8129         827792  java.lang.Class
  10:         33839         812136  java.lang.Long
  11:         33400         801600  reifier.tuple.Tuple
  12:          6116         538208  java.lang.reflect.Method
  13:         12767         408544  java.util.concurrent.ConcurrentHashMap$Node
  14:          5994         383616  org.apache.spark.scheduler.ResultTask
  15:         10298         329536  java.util.HashMap$Node
  16:         11988         287712  org.apache.spark.rdd.NarrowCoGroupSplitDep
  17:          5708         228320  reifier.block.Canopy
  18:             9         215784  [Lscala.collection.Seq;
  19:         12078         193248  java.lang.Integer
  20:         12019         192304  java.lang.Object
  21:          5708         182656  reifier.block.Tree
  22:          2776         173152  [I
  23:          6013         144312  scala.collection.mutable.ArrayBuffer
  24:          5994         143856  [Lorg.apache.spark.rdd.CoGroupSplitDep;
  25:          5994         143856  org.apache.spark.rdd.CoGroupPartition
  26:          5994         143856  org.apache.spark.rdd.ShuffledRDDPartition
  27:          4486         143552  java.util.Hashtable$Entry
  28:          6284         132800  [Ljava.lang.Class;
  29:          1819         130968  java.lang.reflect.Field
  30:           605         101208  [Ljava.util.HashMap$Node;
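
As a rough sanity check on the two top rows, here is a small sketch that divides the byte totals by the instance counts. The class and method names are made up for illustration; only the numbers come from the histogram above. Both rows work out to 24 bytes per instance and about 4.1 GB combined. The two counts are almost identical, which might mean that nearly every Tuple2 is held by its own list cell, but that is only a guess on my part.

```java
// Hypothetical helper, not part of my application: it only redoes the
// arithmetic on the two top rows of the histogram above.
public class HeapHistogramCheck {

    // Average shallow size per instance, in bytes (integer division).
    static long bytesPerInstance(long instances, long bytes) {
        return bytes / instances;
    }

    public static void main(String[] args) {
        long consInstances = 85414872L;   // scala.collection.immutable.$colon$colon
        long consBytes = 2049956928L;
        long tupleInstances = 85414852L;  // scala.Tuple2
        long tupleBytes = 2049956448L;

        System.out.println("$colon$colon bytes/instance: "
                + bytesPerInstance(consInstances, consBytes));
        System.out.println("Tuple2 bytes/instance: "
                + bytesPerInstance(tupleInstances, tupleBytes));
        System.out.println("combined bytes: " + (consBytes + tupleBytes));
        System.out.println("instance count difference: "
                + (consInstances - tupleInstances));
    }
}
```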



Best Regards,
Sonal
Nube Technologies