spark-user mailing list archives

From Akhil Das <ak...@hacked.work>
Subject Re: How is data desensitization (example: select bank_no from users)?
Date Thu, 24 Aug 2017 10:19:46 GMT
Analysts usually do not have access to data stored in the PCI zone. You
could instead write the data out to a separate table for them, with the
sensitive information masked.

For example:


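> import org.apache.spark.sql.functions.udf  // needed when not running in spark-shell
> // patch(0, "*" * 12, 7): replace the first 7 characters with 12 asterisks
> val mask_udf = udf((info: String) => info.patch(0, "*" * 12, 7))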
> val df = sc.parallelize(Seq(("user1", "400-000-444"))).toDF("user", "sensitive_info")
> df.show

+-----+--------------+
| user|sensitive_info|
+-----+--------------+
|user1|   400-000-444|
+-----+--------------+

> df.withColumn("sensitive_info", mask_udf($"sensitive_info")).show

+-----+----------------+
| user|  sensitive_info|
+-----+----------------+
|user1|************-444|
+-----+----------------+
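
The masking above hard-codes both the number of characters replaced (7) and
the number of asterisks (12), so it only fits values shaped like the sample.
Here is a minimal sketch of a more general variant in plain Scala, assuming you
want to keep the last few characters visible and mask the rest one-for-one.
The helper name maskAllButLast and its parameters are hypothetical, not part
of Spark or of the snippet above, and the null check is an assumption about
how you would want missing values handled.

```scala
// What the original call does: replace 7 characters starting at index 0
// with a fixed run of 12 asterisks.
val original = "400-000-444".patch(0, "*" * 12, 7)

// Hypothetical helper: mask every character except the last `keep` ones,
// using one mask character per hidden character so the length is preserved.
def maskAllButLast(info: String, keep: Int = 4, maskChar: Char = '*'): String =
  if (info == null) null // pass nulls through instead of throwing
  else {
    // Clamp `keep` to the range [0, info.length].
    val visible = math.min(math.max(keep, 0), info.length)
    (maskChar.toString * (info.length - visible)) + info.takeRight(visible)
  }

val masked = maskAllButLast("400-000-444", keep = 4)
```

To use it on a DataFrame column you could wrap it the same way as before,
e.g. udf((s: String) => maskAllButLast(s)). Preserving the length is a design
choice: it reveals how long the value was, which may or may not be acceptable
for your data.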


On Sat, Aug 19, 2017 at 10:42 PM, 李斌松 <libinsong1204@gmail.com> wrote:

> For example, the user's bank card number cannot be viewed by an analyst
> and replaced by an asterisk. How do you do that in spark?
>



-- 
Cheers!
