I didn't mention anything, so by default it should be MEMORY_AND_DISK, right?

My doubt was: between two different experiments, do the RDDs cached in memory need to be unpersisted?
Or does it not matter?
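
For reference, a plain cache() in the Scala API is shorthand for persist(StorageLevel.MEMORY_ONLY), not MEMORY_AND_DISK, and an RDD can report which level is actually in effect. A minimal sketch (the master, app name, and input path are placeholders):

  import org.apache.spark.SparkContext
  import org.apache.spark.storage.StorageLevel

  val sc = new SparkContext("local", "storage-level-check")  // placeholder master/app name
  val lines = sc.textFile("input.txt")                       // placeholder input

  println(lines.getStorageLevel)               // StorageLevel.NONE before any cache/persist
  lines.persist(StorageLevel.MEMORY_AND_DISK)  // request the level asked about above
  println(lines.getStorageLevel)               // now reports MEMORY_AND_DISK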


On Fri, Mar 28, 2014 at 1:43 AM, Syed A. Hashmi <shashmi@cloudera.com> wrote:
Which storage scheme are you using? I am guessing it is MEMORY_ONLY. For large datasets, MEMORY_AND_DISK or MEMORY_AND_DISK_SER work better.

You can call unpersist on an RDD to remove it from the cache, though.
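
For example, a minimal sketch against the Scala API (an existing SparkContext sc is assumed, and the input path is a placeholder):

  import org.apache.spark.storage.StorageLevel

  val data = sc.textFile("big-input.txt")         // placeholder input, assumes sc exists
  data.persist(StorageLevel.MEMORY_AND_DISK_SER)  // keep serialized blocks, spilling to disk when memory is full
  // ... run the jobs that reuse `data` ...
  data.unpersist()                                // drop the cached blocks when done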


On Thu, Mar 27, 2014 at 11:57 AM, Sai Prasanna <ansaiprasanna@gmail.com> wrote:
No, I am running on 0.8.1.
Yes, I am caching a lot. I am benchmarking some simple code in Spark: 512 MB, 1 GB, and 2 GB text files are taken, some basic intermediate operations are done, and the intermediate results that will be used in subsequent operations are cached.

I thought that we need not manually unpersist: if I need to cache something and the cache is full, space will automatically be made by evicting earlier entries. Do I need to unpersist?

Moreover, if I run several times, will the previously cached RDDs still remain in the cache? If so, can I flush them out manually before the next run? [Something like a complete cache flush.]
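
One way to do such a complete flush, assuming the running Spark version exposes sc.getPersistentRDDs in the Scala API, would be:

  // Unpersist every RDD this SparkContext still has cached.
  sc.getPersistentRDDs.values.foreach(_.unpersist())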


On Thu, Mar 27, 2014 at 11:16 PM, Andrew Or <andrew@databricks.com> wrote:
Are you caching a lot of RDDs? If so, maybe you should unpersist() the ones that you're not using. Also, if you're on 0.9, make sure spark.shuffle.spill is enabled (which it is by default). This allows your application to spill in-memory content to disk if necessary.

How much memory are you giving to your executors? The default for spark.executor.memory is 512m, which is quite low. Consider raising it. Checking the web UI is a good way to figure out your runtime memory usage.
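
For example, on 0.9 these settings can go through SparkConf when the context is created (a sketch; the master, app name, and the 2g value are placeholders):

  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf()
    .setMaster("local")                  // placeholder master
    .setAppName("tuned-app")             // placeholder app name
    .set("spark.executor.memory", "2g")  // raise from the 512m default
    .set("spark.shuffle.spill", "true")  // the 0.9 default, shown explicitly
  val sc = new SparkContext(conf)

On 0.8.x, which predates SparkConf, the equivalent is System.setProperty("spark.executor.memory", "2g") before constructing the SparkContext.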


On Thu, Mar 27, 2014 at 9:22 AM, Ognen Duzlevski <ognen@plainvanillagames.com> wrote:
Look at the tuning guide on Spark's webpage for strategies to cope with this.
I have run into quite a few memory issues like these; some are resolved by changing the StorageLevel strategy and employing things like Kryo serialization, and some are solved by specifying the number of tasks to break a given operation into, etc.
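
For instance, both of those knobs look roughly like this in the Scala API (a sketch; the partition count is a placeholder and pairs is assumed to be some existing key-value RDD):

  // Switch to Kryo serialization (set before the SparkContext is created).
  System.setProperty("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

  // Break a shuffle into more, smaller tasks by passing an explicit
  // partition count to the operation.
  val counts = pairs.reduceByKey(_ + _, 64)  // pairs: an existing (K, V) RDD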

Ognen


On 3/27/14, 10:21 AM, Sai Prasanna wrote:
"java.lang.OutOfMemoryError: GC overhead limit exceeded"

What is the problem? When I run the same code, one time it runs in 8 seconds, and the next time it takes a really long time, say 300-500 seconds...
In the logs I see a lot of "GC overhead limit exceeded" errors. What should be done?

Can someone please throw some light on this?



--
Sai Prasanna. AN
II M.Tech (CS), SSSIHL

Entire water in the ocean can never sink a ship, Unless it gets inside.
All the pressures of life can never hurt you, Unless you let them in.