spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jeff Zhang <zjf...@gmail.com>
Subject DataFrame#show cost 2 Spark Jobs ?
Date Mon, 24 Aug 2015 10:19:48 GMT
It's weird to me that the simple show function will cost 2 spark jobs.
DataFrame#explain shows it is a very simple operation, not sure why need 2
jobs.

== Parsed Logical Plan ==
Relation[age#0L,name#1]
JSONRelation[file:/Users/hadoop/github/spark/examples/src/main/resources/people.json]

== Analyzed Logical Plan ==
age: bigint, name: string
Relation[age#0L,name#1]
JSONRelation[file:/Users/hadoop/github/spark/examples/src/main/resources/people.json]

== Optimized Logical Plan ==
Relation[age#0L,name#1]
JSONRelation[file:/Users/hadoop/github/spark/examples/src/main/resources/people.json]

== Physical Plan ==
Scan
JSONRelation[file:/Users/hadoop/github/spark/examples/src/main/resources/people.json][age#0L,name#1]



-- 
Best Regards

Jeff Zhang

Mime
View raw message