spark-dev mailing list archives

From Davies Liu <dav...@databricks.com>
Subject Re: [PySpark DataFrame] When a Row is not a Row
Date Wed, 13 May 2015 02:19:28 GMT
The class (also called Row) for rows returned by Spark SQL is created on the fly, and is different from pyspark.sql.Row (which is the public API for users to create a Row).

The reason we did it this way is to get better performance when accessing the columns. Basically, the rows are just named tuples (called `Row`).
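As an illustration (not Spark's actual implementation), the identity mismatch Nicholas saw can be reproduced with plain `collections.namedtuple`: two classes built on the fly can share the name `Row` yet remain distinct types, so `type(...) ==` and `isinstance` checks against the "other" Row class fail even though attribute access works the same.

```python
from collections import namedtuple

# A "static" Row class, standing in for pyspark.sql.Row (hypothetical analogy).
Row = namedtuple("Row", ["name"])

# A second Row class created on the fly, much as Spark creates one per schema.
DynamicRow = namedtuple("Row", ["name"])

a = DynamicRow("Nick")

print(type(a).__name__)    # "Row" -- both classes carry the same name
print(type(a) == Row)      # False -- distinct classes despite the name
print(isinstance(a, Row))  # False -- DynamicRow is not a subclass of Row
print(a.name)              # "Nick" -- column access still works either way
```

This is why comparing `type(a)` against `pyspark.sql.types.Row` returns False: the two `Row` classes merely share a name.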

--  
Davies Liu
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Tuesday, May 12, 2015 at 4:49 AM, Nicholas Chammas wrote:

> This is really strange.
>  
> # Spark 1.3.1
> >>> print type(results)
> <class 'pyspark.sql.dataframe.DataFrame'>
>
> >>> a = results.take(1)[0]
> >>> print type(a)
> <class 'pyspark.sql.types.Row'>
>
> >>> print pyspark.sql.types.Row
> <class 'pyspark.sql.types.Row'>
>
> >>> print type(a) == pyspark.sql.types.Row
> False
> >>> print isinstance(a, pyspark.sql.types.Row)
> False
>  
> If I set a as follows, then the type checks pass fine.
>  
> a = pyspark.sql.types.Row('name')('Nick')
>  
> Is this a bug? What can I do to narrow down the source?
>  
> results is a massive DataFrame of spark-perf results.
>  
> Nick


