spark-user mailing list archives

From Daniel Stojanov <m...@danielstojanov.com>
Subject Re: Confusion about Spark's to_date function
Date Thu, 05 Nov 2020 10:54:43 GMT

On 5/11/20 2:48 pm, 杨仲鲍 wrote:
>
> Code
>
> ```scala
> object Suit {
>   case class Data(node: String, root: String)
>
>   def apply[A](xs: A*): List[A] = xs.toList
>
>   def main(args: Array[String]): Unit = {
>     val spark = SparkSession.builder()
>       .master("local")
>       .appName("MoneyBackTest")
>       .getOrCreate()
>     import spark.implicits._
>
>     spark.sql("select to_date('2020-01-01 20:00:00','yyyy-MM-dd HH:mm:ss')").show(false)
>   }
> }
> ```
>
> result
>
> ```output
>
> +-----------------------------------------------------+
> |to_date('2020-01-01 20:00:00', 'yyyy-MM-dd HH:mm:ss')|
> +-----------------------------------------------------+
> |2020-01-01                                           |
> +-----------------------------------------------------+
> ```
>
> Why does it not show 2020-01-01 20:00:00?
>
> Spark version: 2.4.4
> Device: MacBook

You want to_timestamp instead of to_date. to_date parses the string but returns a date type, so the time-of-day component is dropped; to_timestamp returns a timestamp and keeps it.

The following is in Python, but it should translate directly to Scala.

>>> import pyspark.sql as psq
>>> import pyspark.sql.functions as F

>>> row = psq.Row(as_string="2020-01-01 12:01:02")
>>> df = spark.sparkContext.parallelize([row]).toDF()

>>> df.withColumn("date_converted",
...               F.to_date(F.column("as_string"), "yyyy-MM-dd HH:mm:ss")).show()
+-------------------+--------------+
|          as_string|date_converted|
+-------------------+--------------+
|2020-01-01 12:01:02|    2020-01-01|
+-------------------+--------------+

>>> df.withColumn("date_converted",
...               F.to_timestamp(F.column("as_string"), "yyyy-MM-dd HH:mm:ss")).show()
+-------------------+-------------------+
|          as_string|     date_converted|
+-------------------+-------------------+
|2020-01-01 12:01:02|2020-01-01 12:01:02|
+-------------------+-------------------+

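Applied to your Scala snippet, the change is just the function name. Here is a rough, untested sketch against Spark 2.4; the session setup is copied from your code, everything else is as in your question:

```scala
import org.apache.spark.sql.SparkSession

// Untested sketch: the original query with to_date swapped for to_timestamp.
object Suit {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local")
      .appName("MoneyBackTest")
      .getOrCreate()

    // to_timestamp keeps the time-of-day that to_date drops.
    spark.sql("select to_timestamp('2020-01-01 20:00:00','yyyy-MM-dd HH:mm:ss')").show(false)
  }
}
```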
