spark-user mailing list archives

From Enrico Minack <m...@Enrico.Minack.dev>
Subject Re: Compute the Hash of each row in new column
Date Mon, 02 Mar 2020 10:21:14 GMT
Well, then apply md5 on all columns:

import org.apache.spark.sql.functions.{col, md5}

ds.select(ds.columns.map(col) ++
  ds.columns.map(column => md5(col(column)).as(s"$column hash")): _*).show(false)
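
If the goal is instead a single hash per row, one possible approach is to concatenate all column values into one string and hash that. The following is only a sketch under assumptions that are not from this thread: the object name RowHashSketch, the helper withRowHash, the "|" separator, the "<null>" placeholder and the extra test column "twice" are all made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{coalesce, col, concat_ws, lit, md5}
import org.apache.spark.sql.types.StringType

object RowHashSketch {
  // Hypothetical helper: appends one md5 hash computed over all columns of a row.
  def withRowHash(df: DataFrame, name: String = "row hash"): DataFrame = {
    // Cast each column to string; concat_ws skips nulls, so replace them
    // with a placeholder (an assumption) to keep null visible in the hash input.
    val parts = df.columns.map(c => coalesce(col(c).cast(StringType), lit("<null>")))
    df.withColumn(name, md5(concat_ws("|", parts: _*)))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("row-hash-sketch")
      .getOrCreate()
    import spark.implicits._

    // Same test Dataset as in the thread, plus a second column for illustration.
    val ds = spark.range(10).select(
      $"id".cast(StringType),
      ($"id" * 2).cast(StringType).as("twice"))

    withRowHash(ds).show(false)
    spark.stop()
  }
}
```

Note that a plain separator such as "|" can produce the same input string for different rows if the values themselves contain that character, so pick a separator that cannot occur in your data. Column names containing dots would also need backtick quoting before being passed to col.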

Enrico

On 02.03.20 at 11:10, Chetan Khatri wrote:
> Thanks Enrico
> I want to compute the hash of all the column values in the row.
>
> On Fri, Feb 28, 2020 at 7:28 PM Enrico Minack <mail@enrico.minack.dev> wrote:
>
>     This computes the md5 hash of a given column, id, of the Dataset ds:
>
>     ds.withColumn("id hash", md5($"id")).show(false)
>
>     Test with this Dataset ds:
>
>     import org.apache.spark.sql.types._
>     val ds = spark.range(10).select($"id".cast(StringType))
>
>     Available are md5, sha, sha1, sha2 and hash:
>     https://spark.apache.org/docs/2.4.5/api/sql/index.html
>
>     Enrico
>
>
>     On 28.02.20 at 13:56, Chetan Khatri wrote:
>     > Hi Spark Users,
>     > How can I compute the hash of each row and store it in a new
>     > column of the DataFrame? Could someone help me?
>     >
>     > Thanks
>
>

