spark-dev mailing list archives

From Cheng Lian <lian.cs....@gmail.com>
Subject Re: Get size of rdd in memory
Date Mon, 02 Feb 2015 21:18:00 GMT
Actually, SchemaRDD.cache() behaves exactly the same as cacheTable since
Spark 1.2.0. The reason your web UI didn't show the cached table is that
both cacheTable and sql("SELECT ...") are lazy :-) Simply add a
.collect() after the sql(...) call.
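Putting that together, a minimal sketch of the fix (assuming a Spark 1.2-era shell with an existing SparkContext `sc`, a SQLContext named `sqc`, and the KV case class from the quoted message):

```scala
// Sketch against the Spark 1.2 API; `sc`, `sqc`, and `KV` are assumed
// to match the names used in the thread below.
case class KV(key: Int, value: String)

import sqc.createSchemaRDD  // enables the implicit RDD -> SchemaRDD conversion

sc.parallelize(1 to 1024)
  .map(i => KV(i, i.toString))
  .registerTempTable("test")

sqc.cacheTable("test")                         // lazy: nothing is cached yet
sqc.sql("SELECT COUNT(*) FROM test").collect() // forces evaluation; the cached
                                               // table should now appear under
                                               // the web UI's Storage tab
```

The `.collect()` is what triggers the job; without an action, neither `cacheTable` nor the SQL query does any work, which is why the Storage tab stays empty.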

Cheng

On 2/2/15 12:23 PM, ankits wrote:

> Thanks for your response. So AFAICT, calling
>
> parallelize(1 to 1024).map(i => KV(i, i.toString)).toSchemaRDD.cache().count()
>
> will allow me to see the size of the SchemaRDD in memory, and
>
> parallelize(1 to 1024).map(i => KV(i, i.toString)).cache().count()
>
> will show me the size of a regular RDD.
>
> But this will not show us the size when using cacheTable(), right? Like if I
> call
>
> parallelize(1 to 1024).map(i => KV(i, i.toString)).toSchemaRDD.registerTempTable("test")
> sqc.cacheTable("test")
> sqc.sql("SELECT COUNT(*) FROM test")
>
> the web UI does not show us the size of the cached table.
