spark-issues mailing list archives

From "Irakli Machabeli (JIRA)" <>
Subject [jira] [Created] (SPARK-12467) Get rid of sorting in Row's constructor in pyspark
Date Mon, 21 Dec 2015 19:53:46 GMT
Irakli Machabeli created SPARK-12467:

             Summary: Get rid of sorting in Row's constructor in pyspark
                 Key: SPARK-12467
             Project: Spark
          Issue Type: Bug
          Components: PySpark, SQL
    Affects Versions: 1.5.2
            Reporter: Irakli Machabeli
            Priority: Minor

The current implementation of Row's __new__ sorts columns by name.
First, there is no obvious reason to sort. Second, if one converts a DataFrame to an RDD
and then back to a DataFrame, the order of the columns changes. While this is not a bug, it
nevertheless makes looking at the data really inconvenient.

    def __new__(self, *args, **kwargs):
        if args and kwargs:
            raise ValueError("Can not use both args "
                             "and kwargs to create Row")
        if args:
            # create row class or objects
            return tuple.__new__(self, args)

        elif kwargs:
            # create row objects
            names = sorted(kwargs.keys()) # just get rid of sorting here!!!
            row = tuple.__new__(self, [kwargs[n] for n in names])
            row.__fields__ = names
            return row

        else:
            raise ValueError("No args or kwargs")
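
To make the reordering concrete without a Spark installation, here is a minimal sketch. SortedRow and UnsortedRow are hypothetical stand-ins written for this illustration, not pyspark classes; they copy only the kwargs branch quoted above. The unsorted variant assumes Python 3.6 or later, where keyword arguments keep the order in which they were passed. On older interpreters that order is arbitrary, which may be one reason the sort exists.

```python
class SortedRow(tuple):
    """Hypothetical stand-in that mirrors the kwargs branch quoted above."""

    def __new__(cls, **kwargs):
        names = sorted(kwargs.keys())  # fields end up in alphabetical order
        row = tuple.__new__(cls, [kwargs[n] for n in names])
        row.__fields__ = names
        return row


class UnsortedRow(tuple):
    """Hypothetical variant with the sort removed (assumes Python 3.6+)."""

    def __new__(cls, **kwargs):
        names = list(kwargs.keys())  # keeps the order used at the call site
        row = tuple.__new__(cls, [kwargs[n] for n in names])
        row.__fields__ = names
        return row


sorted_row = SortedRow(name="Alice", age=30)
unsorted_row = UnsortedRow(name="Alice", age=30)

print(sorted_row.__fields__, tuple(sorted_row))
print(unsorted_row.__fields__, tuple(unsorted_row))
```

With the sort in place, the fields come back as age, name even though the caller passed name first; the variant without the sort keeps name, age.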
