spark-user mailing list archives

From Michal Šenkýř <mike.sen...@gmail.com>
Subject Re: What is missing here to use sql in spark?
Date Mon, 02 Jan 2017 07:23:16 GMT
Happy new year, Raymond!

Not sure whether I understand your problem correctly, but it seems to me
that you are just not processing your result.
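sqlContext.sql(...) returns a DataFrame, which is evaluated lazily: nothing is computed until you call an action (such as show() or collect()) on it. The shell only echoes its schema.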

Therefore, to get the result you are expecting, you just have to call:
sqlContext.sql(...).show()

You can also assign it to a variable or register it as a new table 
(view) to work with it further:
df2 = sqlContext.sql(...)
or:
sqlContext.sql(...).createOrReplaceTempView("flight201601_carriers")

Regards,

Michal Šenkýř


On 2.1.2017 05:22, Raymond Xie wrote:
> Happy new year!
>
> Below is my script:
>
> pyspark --packages com.databricks:spark-csv_2.10:1.4.0
> from pyspark.sql import SQLContext
> sqlContext = SQLContext(sc)
> df = sqlContext.read.format('com.databricks.spark.csv') \
>     .options(header='true', inferschema='true') \
>     .load('file:///root/Downloads/data/flight201601short2.csv')
> df.show(5)
> df.registerTempTable("flight201601")
> sqlContext.sql("select distinct CARRIER from flight201601")
>
> df.show(5) is below:
>
> +----+-------+-----+------------+-----------+----------+--------------+----------+-------+--------+------+
> |YEAR|QUARTER|MONTH|DAY_OF_MONTH|DAY_OF_WEEK|   FL_DATE|UNIQUE_CARRIER|AIRLINE_ID|CARRIER|TAIL_NUM|FL_NUM|
> +----+-------+-----+------------+-----------+----------+--------------+----------+-------+--------+------+
> |2016|      1|    1|           6|          3|2016-01-06|            AA|     19805|     AA|  N4YBAA|    43|
> |2016|      1|    1|           7|          4|2016-01-07|            AA|     19805|     AA|  N434AA|    43|
> |2016|      1|    1|           8|          5|2016-01-08|            AA|     19805|     AA|  N541AA|    43|
> |2016|      1|    1|           9|          6|2016-01-09|            AA|     19805|     AA|  N489AA|    43|
> |2016|      1|    1|          10|          7|2016-01-10|            AA|     19805|     AA|  N439AA|    43|
> +----+-------+-----+------------+-----------+----------+--------------+----------+-------+--------+------+
>
> The final result is NOT what I am expecting; it currently shows the following:
>
> >>> sqlContext.sql("select distinct CARRIER from flight201601")
> DataFrame[CARRIER: string]
>
> I am expecting the distinct CARRIER values to be listed:
>
> AA
> BB
> CC
> ...
>
> flight201601short2.csv is attached here for your reference.
>
>
> Thank you very much.
>
>
>
> ------------------------------------------------
> Sincerely yours,
>
> Raymond

