spark-user mailing list archives

From kanchan tewary <>
Subject Re: toDebugString - RDD Logical Plan
Date Tue, 23 Apr 2019 15:48:51 GMT
Hello Dylan,

Thank you for your help. The result does look formatted after making the change.
However, from the following code, I was expecting RDD types like MappedRDD
and FilteredRDD to be present in the lineage. Instead, I can only see
PythonRDD and ParallelCollectionRDD in the lineage [I am running in local

`sc.parallelize([1,2,3,3]).map(lambda x:x**2).filter(lambda x:x>5).count()`

Note: I also tried setting the logLineage property to true, but it did not
yield any additional details in the log.
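
For reference, the two points discussed in this thread (decoding the `toDebugString` output and reading the partition count) can be sketched as below. This is a minimal sketch, not an official recipe: the helper names `lineage_text` and `describe_rdd` are hypothetical, and the only Spark calls assumed are the RDD methods `toDebugString()` and `getNumPartitions()`.

```python
def lineage_text(debug_string):
    """Return an RDD debug string as printable text.

    In PySpark, RDD.toDebugString() may return bytes; decoding it lets
    print() render the embedded line breaks instead of escape sequences.
    """
    if isinstance(debug_string, bytes):
        return debug_string.decode("utf-8")
    return debug_string


def describe_rdd(rdd):
    """Build a short report: partition count followed by the lineage."""
    # `rdd` is assumed to be a PySpark RDD (or anything exposing the
    # same two methods); no SparkContext is created here.
    return "partitions: {}\n{}".format(
        rdd.getNumPartitions(), lineage_text(rdd.toDebugString())
    )
```

With a live SparkContext `sc`, this could be used as `print(describe_rdd(sc.parallelize([1,2,3,3]).map(lambda x:x**2).filter(lambda x:x>5)))`. The helper has to be given the RDD itself, before `count()`, because `count()` returns a number rather than an RDD. As for the lineage contents: PySpark pipelines chained Python functions such as `map` and `filter` into a single Python stage, which is the likely reason one PythonRDD appears instead of separate per-transformation RDDs.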


On Sun, Apr 21, 2019 at 12:11 AM Dylan Guedes <> wrote:

> Kanchan,
> the `toDebugString` output looks unformatted because in some scenarios you need
> to decode it first (can't remember the reason, though). I suggest you
> print the RDD lineage using
> `print(rdd.toDebugString().decode("utf-8"))` instead (note: this only
> occurs in PySpark).
> About the other question, you may use `getNumPartitions`.
> On Sat, Apr 20, 2019 at 2:40 PM kanchan tewary <>
> wrote:
>> Dear All,
>> Greetings!
>> I am new to Apache Spark and am working on RDDs using PySpark. I am trying
>> to understand the logical plan provided by the toDebugString function, but I
>> find two issues: a) the output is not formatted when I print the result, and
>> b) I do not see the number of partitions shown.
>> Can anyone direct me to any reference documentation to understand the
>> logical plan better? Or would you suggest using the DAG from the Spark UI instead?
>> Thanks & Best Regards,
>> Kanchan
>> Data Engineer, IBM

Thanks & Best Regards,
