spark-user mailing list archives

From Felix Kizhakkel Jose <>
Subject How to modify a field in a nested struct using pyspark
Date Fri, 29 Jan 2021 16:30:53 GMT
Hello All,

I am using PySpark Structured Streaming, and my timestamp fields arrive as
plain longs (epoch milliseconds), so I have to convert these fields to a
timestamp type.

A sample JSON object:

      "value": "f40b2e22-4003-4d90-afd3-557bc013b05e",
      "type": "UUID",
      "system": "Test"
  "status": "Active",
  "timingPeriod": {
    "startDateTime": 1611859271516,
    "endDateTime": null
  "eventDateTime": 1611859272122,
  "isPrimary": true,

Here I want to convert "eventDateTime", "startDateTime", and
"endDateTime" to timestamp types.

So I have done the following:

from pyspark.sql import functions as f

def transform_date_col(date_col):
    # Epoch milliseconds -> seconds; null values stay null.
    return f.when(f.col(date_col).isNotNull(), f.col(date_col) / 1000)


But the timingPeriod fields are no longer a struct; instead they become two
separate fields named "timingPeriod.start" and "timingPeriod.end".

How can I get them as a struct as before?
Is there a generic way to modify one or more properties of a nested struct?
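
One possible way to keep timingPeriod as a struct is to rebuild the whole struct and assign it back under the parent column name, instead of writing to a dotted column name. The sketch below only builds the Spark SQL expression text for that. The helper names ms_to_timestamp_sql and rebuild_struct_sql are hypothetical, and it assumes simple field names plus Spark's default (non-ANSI) cast, where a numeric value is read as seconds since the epoch.

```python
def ms_to_timestamp_sql(path):
    """Spark SQL text converting an epoch-millisecond long at `path` to a timestamp."""
    # A null long stays null through the division and the cast.
    return f"CAST({path} / 1000 AS TIMESTAMP)"


def rebuild_struct_sql(struct_name, field_names):
    """Spark SQL text that rebuilds `struct_name` with every listed field converted."""
    parts = []
    for name in field_names:
        converted = ms_to_timestamp_sql(f"{struct_name}.{name}")
        # named_struct takes alternating field-name literals and value expressions.
        parts.append(f"'{name}', {converted}")
    return "named_struct(" + ", ".join(parts) + ")"


timing_period_expr = rebuild_struct_sql(
    "timingPeriod", ["startDateTime", "endDateTime"]
)
print(timing_period_expr)
```

With a DataFrame df, the result could then be applied as df.withColumn("timingPeriod", f.expr(timing_period_expr)), which replaces the existing struct column rather than adding new top-level columns. Fields not listed in field_names would be dropped from the rebuilt struct, so every field that should survive has to be included.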

I have hundreds of entities where longs need to be converted to timestamps,
so a generic implementation would help my data ingestion pipeline a lot.
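
A generic version could walk the schema recursively and rebuild every struct on the way, converting only the paths that are known to hold epoch milliseconds. The following is a minimal sketch under assumptions, not a definitive implementation: build_select_exprs is a hypothetical helper, the schema is taken in the dictionary form that df.schema.jsonValue() returns, field names are assumed to be simple identifiers, and arrays or maps are passed through untouched.

```python
def build_select_exprs(schema_json, ts_paths):
    """Return one Spark SQL expression per top-level column.

    schema_json: dict shaped like the output of StructType.jsonValue().
    ts_paths: set of dotted paths whose long values are epoch milliseconds.
    """
    def convert(field, path):
        dtype = field["type"]
        if isinstance(dtype, dict) and dtype.get("type") == "struct":
            # Rebuild the struct field by field so it stays a struct.
            parts = []
            for child in dtype["fields"]:
                child_path = f"{path}.{child['name']}"
                parts.append(f"'{child['name']}', {convert(child, child_path)}")
            return "named_struct(" + ", ".join(parts) + ")"
        if path in ts_paths:
            # Milliseconds -> seconds, then cast; nulls stay null.
            return f"CAST({path} / 1000 AS TIMESTAMP)"
        return path  # every other field is passed through unchanged

    return [
        f"{convert(fld, fld['name'])} AS {fld['name']}"
        for fld in schema_json["fields"]
    ]


# Hand-written schema covering part of the sample object above.
sample_schema = {
    "type": "struct",
    "fields": [
        {"name": "status", "type": "string", "nullable": True, "metadata": {}},
        {"name": "timingPeriod", "nullable": True, "metadata": {}, "type": {
            "type": "struct",
            "fields": [
                {"name": "startDateTime", "type": "long", "nullable": True, "metadata": {}},
                {"name": "endDateTime", "type": "long", "nullable": True, "metadata": {}},
            ],
        }},
        {"name": "eventDateTime", "type": "long", "nullable": True, "metadata": {}},
        {"name": "isPrimary", "type": "boolean", "nullable": True, "metadata": {}},
    ],
}
ts_paths = {"timingPeriod.startDateTime", "timingPeriod.endDateTime", "eventDateTime"}
exprs = build_select_exprs(sample_schema, ts_paths)
for e in exprs:
    print(e)
```

With a DataFrame df, this could be used as df.selectExpr(*build_select_exprs(df.schema.jsonValue(), ts_paths)). One caveat of rebuilding with named_struct is that a struct that was null as a whole comes back as a non-null struct whose fields are all null.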

Felix K Jose
