spark-user mailing list archives

From Selvam Raman
Subject Spark Mlib - java.lang.OutOfMemoryError: Java heap space
Date Mon, 24 Apr 2017 10:22:28 GMT

I have 1 master and 4 slave nodes. The input data size is 14 GB.
Slave node config: 32 GB RAM, 16 cores

I am trying to train a word embedding model using Spark, and it is running
out of memory. How much memory do I need to train on 14 GB of data?
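
A rough sizing sketch for the question above, not a definitive answer: as far
as I understand MLlib's Word2Vec, the trained weights live in two float arrays
of vocabSize x vectorSize each, so the weight footprint scales with the
vocabulary size and vector size rather than directly with the 14 GB of input.
The vocabulary size used below is purely hypothetical, and 100 is assumed to
be the default vector size.

```python
# Sketch: approximate size of the Word2Vec weight arrays.
# Assumption: two float32 arrays of vocab_size * vector_size each.

def word2vec_weights_gib(vocab_size, vector_size=100,
                         bytes_per_float=4, num_arrays=2):
    """Return the approximate weight footprint in GiB."""
    total_bytes = num_arrays * vocab_size * vector_size * bytes_per_float
    return total_bytes / 1024 ** 3

# Hypothetical vocabulary size, chosen only for illustration.
hypothetical_vocab = 5_000_000
weights_gib = word2vec_weights_gib(hypothetical_vocab)
print(f"{weights_gib:.2f} GiB per copy of the weights")
```

Each copy of the weights (on the driver, and in every task that receives the
broadcast) costs roughly this much, so a large vocabulary can exhaust the heap
even when the input data itself fits comfortably.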

I have given 20 GB per executor, but the log below shows it is using 11.8 GB out of 20:
BlockManagerInfo: Added broadcast_1_piece0 in memory on
(size: 4.6 KB, free: 11.8 GB)
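
One possible reading of the log line above: the "free" figure is the remaining
capacity of Spark's unified memory pool rather than the amount in use. The
sketch below assumes the Spark 2.x defaults (300 MB reserved and
spark.memory.fraction = 0.6) and treats the usable heap as exactly 20 GB; the
JVM usually reports slightly less, so the result is approximate.

```python
# Sketch: size of the unified memory pool for a given executor heap.
# Assumptions: Spark 2.x defaults, heap equal to --executor-memory.

RESERVED_MB = 300        # fixed reserve held back by the memory manager
MEMORY_FRACTION = 0.6    # assumed default value of spark.memory.fraction

def unified_pool_gb(heap_gb, fraction=MEMORY_FRACTION, reserved_mb=RESERVED_MB):
    """Return the approximate unified (storage + execution) pool in GB."""
    heap_mb = heap_gb * 1024
    return (heap_mb - reserved_mb) * fraction / 1024

pool_gb = unified_pool_gb(20)
print(f"{pool_gb:.1f} GB")
```

Under these assumptions a 20 GB heap gives a pool of about 11.8 GB, which
matches the logged value; the rest of the heap is left for user data
structures and Spark internals.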

This is the code
from pyspark import SparkContext
from pyspark.mllib.feature import Word2Vec

if __name__ == "__main__":
    sc = SparkContext(appName="Word2VecExample")  # SparkContext

    # $example on$
    inp = sc.textFile("s3://word2vec/data/word2vec_word_data.txt/").map(lambda row: row.split(" "))

    word2vec = Word2Vec()
    model = word2vec.fit(inp)
    model.save(sc, "s3://pysparkml/word2vecresult2/")

Spark-submit Command:
spark-submit --master yarn --conf
-XX:HeapDumpPath=/mnt/tmp -XX:+UseG1GC -XX:+UseG1GC -XX:+PrintFlagsFinal
-XX:+PrintReferenceGC -verbose:gc -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps -XX:+PrintAdaptiveSizePolicy
-XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark' --num-executors 4
--executor-cores 2 --executor-memory 20g

Selvam Raman
"Shun bribery, hold your head high"
