Like this:

import org.apache.spark.storage.StorageLevel
val rdd = sc.parallelize(1 to 1000000).persist(StorageLevel.MEMORY_AND_DISK_SER)
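
Since you're on PySpark, the equivalent would be roughly this (a sketch, assuming a Spark 1.x pyspark where StorageLevel exposes MEMORY_AND_DISK_SER and sc is your existing SparkContext):

from pyspark import StorageLevel

# persist the RDD serialized in memory, spilling to disk when it doesn't fit
rdd = sc.parallelize(range(1000000)).persist(StorageLevel.MEMORY_AND_DISK_SER)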

Thanks
Best Regards

On Mon, Oct 13, 2014 at 12:50 PM, Chengi Liu <chengi.liu.86@gmail.com> wrote:
Cool, thanks. And one last question:
conf = SparkConf().set(....).set(...)
matrix = get_data(..)
rdd = sc.parallelize(matrix)  # heap error here...
How and where do I set the storage level? It seems like conf is the wrong place to set it, since I get this error:
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.IllegalArgumentException: For input string: "StorageLevel.MEMORY_AND_DISK_SER"
?
Thanks for all the help

On Mon, Oct 13, 2014 at 12:15 AM, Akhil Das <akhil@sigmoidanalytics.com> wrote:
You can set it like this:

sparkConf.set("spark.executor.extraJavaOptions", " -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+AggressiveOpts -XX:FreqInlineSize=300 -XX:MaxInlineSize=300 ")


Thanks
Best Regards

On Mon, Oct 13, 2014 at 12:36 PM, Chengi Liu <chengi.liu.86@gmail.com> wrote:
Hi Akhil,
  Thanks for the response..
Another query... do you know how to use the "spark.executor.extraJavaOptions" option?
SparkConf.set("spark.executor.extraJavaOptions","what value should go in here")?
I am trying to find an example but cannot seem to find one.


On Mon, Oct 13, 2014 at 12:03 AM, Akhil Das <akhil@sigmoidanalytics.com> wrote:
A few things to keep in mind (a rough PySpark sketch of these settings follows the list):
- I believe driver memory should not exceed executor memory.
- Tune spark.storage.memoryFraction (the default is 0.6).
- Enable spark.rdd.compress (the default is false).
- Always specify the level of parallelism when doing a groupBy, reduceByKey, join, sortBy, etc.
- If you don't have enough memory and the data is huge, set the storage level to MEMORY_AND_DISK_SER.
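
Put together in PySpark, those settings would look roughly like this (only a sketch; the values are illustrative, not tuned for your data size):

from pyspark import SparkConf, SparkContext, StorageLevel

conf = (SparkConf()
        .set("spark.storage.memoryFraction", "0.6")  # fraction of the heap reserved for cached RDDs
        .set("spark.rdd.compress", "true"))          # compress serialized RDD partitions
sc = SparkContext(conf=conf)

# explicit level of parallelism (200 partitions is just an example value),
# persisted serialized with spill to disk
rdd = sc.parallelize(range(1000000), 200).persist(StorageLevel.MEMORY_AND_DISK_SER)
counts = rdd.map(lambda x: (x % 10, 1)).reduceByKey(lambda a, b: a + b, 200)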

You can read more over here.

Thanks
Best Regards

On Sun, Oct 12, 2014 at 10:28 PM, Chengi Liu <chengi.liu.86@gmail.com> wrote:
Hi,
  I am trying to use Spark but I am having a hard time configuring SparkConf...
My current conf is:
conf = SparkConf().set("spark.executor.memory","10g").set("spark.akka.frameSize", "100000000").set("spark.driver.memory","16g")

but I still see the Java heap space error:
14/10/12 09:54:50 ERROR Executor: Exception in task 3.0 in stage 0.0 (TID 3)
java.lang.OutOfMemoryError: Java heap space
at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:296)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:35)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:18)
at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:332)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
at com.twitter.chill.WrappedArraySerializer.read(WrappedArraySerializer.scala:34)
at com.twitter.chill.WrappedArraySerializer.read(WrappedArraySerializer.scala:21)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
at org.apache.spark.serializer.KryoDeserializationStream.readO


What's the right way to turn these knobs, and what other knobs can I play with?
Thanks