spark-user mailing list archives

From Raymond Xie <xie3208...@gmail.com>
Subject What is missing here to use sql in spark?
Date Mon, 02 Jan 2017 04:22:49 GMT
Happy new year!

Below is my script:

pyspark --packages com.databricks:spark-csv_2.10:1.4.0
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
df = sqlContext.read.format('com.databricks.spark.csv').options(header='true', inferschema='true').load('file:///root/Downloads/data/flight201601short2.csv')
df.show(5)
df.registerTempTable("flight201601")
sqlContext.sql("select distinct CARRIER from flight201601")

The output of df.show(5) is below:

+----+-------+-----+------------+-----------+----------+--------------+----------+-------+--------+------+
|YEAR|QUARTER|MONTH|DAY_OF_MONTH|DAY_OF_WEEK|   FL_DATE|UNIQUE_CARRIER|AIRLINE_ID|CARRIER|TAIL_NUM|FL_NUM|
+----+-------+-----+------------+-----------+----------+--------------+----------+-------+--------+------+
|2016|      1|    1|           6|          3|2016-01-06|            AA|     19805|     AA|  N4YBAA|    43|
|2016|      1|    1|           7|          4|2016-01-07|            AA|     19805|     AA|  N434AA|    43|
|2016|      1|    1|           8|          5|2016-01-08|            AA|     19805|     AA|  N541AA|    43|
|2016|      1|    1|           9|          6|2016-01-09|            AA|     19805|     AA|  N489AA|    43|
|2016|      1|    1|          10|          7|2016-01-10|            AA|     19805|     AA|  N439AA|    43|
+----+-------+-----+------------+-----------+----------+--------------+----------+-------+--------+------+

The final result is NOT what I am expecting. It currently shows the
following:

>>> sqlContext.sql("select distinct CARRIER from flight201601")
DataFrame[CARRIER: string]

I am expecting the distinct CARRIER values to be listed:

AA
BB
CC
...

flight201601short2.csv is attached here for your reference.


Thank you very much.



*------------------------------------------------*
*Sincerely yours,*


*Raymond*
