spark-user mailing list archives

From Rohit Kumar Prusty <Rohit_Pru...@infosys.com>
Subject After calling persist, why the size in sparkui is not matching with the actual file size
Date Mon, 29 Aug 2016 13:52:30 GMT
Hi Team,
I am new to Spark and have a basic question. After calling persist(), why does the size shown
in the Spark UI not match the actual file size?

Actual file size of "/user/rohit_prusty/application2.log": 39 KB

Code snippet:
===========
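# Assumes `sc` is an existing SparkContext (e.g. the one created by the pyspark shell)
logData = sc.textFile("/user/rohit_prusty/application2.log")
logData.persist()   # mark the RDD for caching at the default storage level
logData.count()     # action: reads the file and populates the cache
errors = logData.filter(lambda line: "ERROR" in line)
errors.persist()    # cache the filtered RDD separately
errors.count()      # action: computes and caches the filtered lines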

Output in Spark UI
==================
logData RDD takes 2.1 KB
errors RDD takes 1.3 KB

Regards
Rohit Kumar Prusty
+91-9884070075