spark-issues mailing list archives

From "Max Moroz (JIRA)" <>
Subject [jira] [Commented] (SPARK-16207) order guarantees for DataFrames
Date Fri, 01 Jul 2016 08:02:11 GMT


Max Moroz commented on SPARK-16207:

Would it not be easier (if only for maintenance reasons) to write this paragraph just once,
and simply include a reference to it (with a hyperlink or otherwise) from those methods that
depend on order?

Also, it would be important to say which methods preserve order and which don't. I did my
best to describe it (by referring to methods that "don't involve grouping or sorting"), but
that's rather vague.

Maybe I'm exaggerating the importance of this point, but for someone like me, who's new to
Spark, it's very hard to figure out whether a DF kept or lost the ordering created by a previously
executed orderBy. For instance, I never found anything in the docs to say that groupBy may
not preserve ordering, so I'm glad you mentioned it.

> order guarantees for DataFrames
> -------------------------------
>                 Key: SPARK-16207
>                 URL:
>             Project: Spark
>          Issue Type: Documentation
>          Components: Spark Core
>    Affects Versions: 1.6.1
>            Reporter: Max Moroz
>            Priority: Minor
> There's no clear explanation in the documentation about what guarantees are available
> for the preservation of order in DataFrames. Different blogs, SO answers, and posts on course
> websites suggest different things. It would be good to provide clarity on this.
> Examples of questions on which I could not find clarification:
> 1) Does groupby() preserve order?
> 2) Does take() preserve order?
> 3) Is DataFrame guaranteed to have the same order of lines as the text file it was read
> from? (Or as the json file, etc.)
