spark-user mailing list archives

From "" <>
Subject Don't understand the numbers on the Storage UI(/storage/rdd/?id=4)
Date Sun, 07 Jun 2015 03:22:18 GMT
I'm running a word count application on a 600 MB text file, and I set the RDD's StorageLevel to StorageLevel.MEMORY_AND_DISK_2.

There are two things I can't explain:
1. The StorageLevel shown on the UI is "Disk Serialized 2x Replicated", but I am using StorageLevel.MEMORY_AND_DISK_2.
Where is the memory information?
Storage Level: Disk Serialized 2x Replicated 
Cached Partitions: 20 
Total Partitions: 20 
Memory Size: 107.6 MB 
Disk Size: 277.1 MB             

2. My text file is 600 MB, but the memory and disk sizes shown above total only about 385 MB (107.6 MB
+ 277.1 MB). Since I am using 2x replication, I would expect something closer to 600 MB * 2. Is some
compression happening behind the scenes, or is something else going on?
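Spelling out the arithmetic behind the second question, using the sizes reported on the Storage page. This assumes, as the question does, that each of the two replicas would take up roughly the size of the 600 MB input file:

```latex
\begin{aligned}
\text{expected} &\approx 2 \times 600\ \text{MB} = 1200\ \text{MB} \\
\text{observed} &= 107.6\ \text{MB} + 277.1\ \text{MB} = 384.7\ \text{MB} \\
\frac{\text{observed}}{\text{expected}} &\approx \frac{384.7}{1200} \approx 0.32
\end{aligned}
```

So the reported total is roughly a third of what that assumption predicts.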
