spark-issues mailing list archives

From "Sean Owen (JIRA)" <>
Subject [jira] [Commented] (SPARK-16207) order guarantees for DataFrames
Date Fri, 01 Jul 2016 08:21:11 GMT


Sean Owen commented on SPARK-16207:

The problem, I think, is that just about every method either doesn't necessarily preserve
order or isn't intended to guarantee it, even if it might in many cases. It may not be useful
to repeat that a hundred times, but the point can be made more clearly: don't assume ordering.

Maybe it's just me, but I wouldn't have thought groupBy preserves order. It doesn't in RDBMSes.

Refactoring out a statement on this and linking to it seems reasonable.
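
A toy model may help show why a grouped result need not follow input order. The sketch below is plain Python, not Spark code: `toy_group_count` and its modulo "partitioner" are hypothetical names invented for illustration, and it makes no claim about how Spark actually implements groupBy. It only assumes that rows are routed to partitions by key and that results come back partition by partition.

```python
def toy_group_count(rows, num_partitions):
    """Count rows per key after a toy hash shuffle (illustration only, not Spark)."""
    # "Shuffle": route each row to a partition chosen by its integer key.
    partitions = [dict() for _ in range(num_partitions)]
    for key, _value in rows:
        bucket = partitions[key % num_partitions]
        bucket[key] = bucket.get(key, 0) + 1
    # Results are emitted partition by partition, so the output order
    # reflects the partition layout rather than the input order.
    return [(key, count) for bucket in partitions for key, count in bucket.items()]


rows = [(3, "a"), (1, "b"), (2, "c"), (3, "d"), (1, "e")]
grouped = toy_group_count(rows, num_partitions=2)

# Order in which each key first appears in the input.
first_seen = list(dict.fromkeys(key for key, _ in rows))

print(first_seen)                    # key order in the input
print([key for key, _ in grouped])   # key order after the toy shuffle
print(sorted(grouped))               # an explicit sort makes the order well defined
```

In this toy run the keys come back in a different order than they first appeared, and only the explicit sort gives an order worth relying on. The analogous move on a DataFrame would be an explicit sort such as `orderBy` before depending on row order.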

> order guarantees for DataFrames
> -------------------------------
>                 Key: SPARK-16207
>                 URL:
>             Project: Spark
>          Issue Type: Documentation
>          Components: Spark Core
>    Affects Versions: 1.6.1
>            Reporter: Max Moroz
>            Priority: Minor
> There's no clear explanation in the documentation about what guarantees are available
> for the preservation of order in DataFrames. Different blogs, SO answers, and posts on course
> websites suggest different things. It would be good to provide clarity on this.
> Examples of questions on which I could not find clarification:
> 1) Does groupby() preserve order?
> 2) Does take() preserve order?
> 3) Is DataFrame guaranteed to have the same order of lines as the text file it was read
> from? (Or as the json file, etc.)
