Hi,

When renaming a DataFrame column, it looks like Spark is forgetting the partition information:

    // Assumes a SparkSession named `spark` is in scope (e.g. spark-shell);
    // the implicits import provides toDF and the $"..." column syntax.
    import spark.implicits._

    Seq((1, 2))
      .toDF("a", "b")
      .repartition($"b")
      .withColumnRenamed("b", "c")
      .repartition($"c")
      .explain()


This gives the following physical plan:

    == Physical Plan ==
    Exchange hashpartitioning(c#40, 10)
    +- *(1) Project [a#36, b#37 AS c#40]
       +- Exchange hashpartitioning(b#37, 10)
          +- LocalTableScan [a#36, b#37]
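
In case it helps, when I inspect the partitioning the plan reports directly (my assumption being that outputPartitioning is the relevant property to look at), the Project above the first exchange still seems to expose the partitioning in terms of the old attribute b rather than c, which would explain why a second exchange gets added:

    // Same pipeline without the second repartition, just to look at the reported partitioning
    val df = Seq((1, 2))
      .toDF("a", "b")
      .repartition($"b")
      .withColumnRenamed("b", "c")

    // Seems to report hashpartitioning on the old attribute (b), not on c
    println(df.queryExecution.executedPlan.outputPartitioning)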


As you can see, two shuffles are performed, but the second one is unnecessary: the rename does not move any data, so the output of the first exchange is already hash-partitioned on the renamed column.
Is there a reason for this behavior that I'm missing? Is there a way to work around it (other than not renaming my columns)?
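
For what it's worth, I would expect the same thing to happen if the rename is expressed with select and an alias instead of withColumnRenamed, since as far as I understand both end up as the same kind of Project with an alias on top of the exchange:

    // Presumably equivalent variant of the rename (select + alias)
    Seq((1, 2))
      .toDF("a", "b")
      .repartition($"b")
      .select($"a", $"b".as("c"))
      .repartition($"c")
      .explain()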

I'm using Spark 2.4.3.


Thanks for your help,

Antoine