spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Aakash Basu <aakash.spark....@gmail.com>
Subject Regarding column partitioning IDs and names as per hierarchical level SparkSQL
Date Tue, 31 Oct 2017 15:47:40 GMT
Hi all,

I have to generate a table with Spark-SQL with the following columns -


Level One Id: VARCHAR(20) NULL
Level One Name: VARCHAR( 50) NOT NULL
Level Two Id: VARCHAR( 20) NULL
Level Two Name: VARCHAR(50) NULL
Level Thr ee Id: VARCHAR(20) NULL
Level Thr ee Name: VARCHAR(50) NULL
Level Four Id: VARCHAR(20) NULL
Level Four Name: VARCHAR( 50) NULL
Level Five Id: VARCHAR(20) NULL
Level Five Name: VARCHAR(50) NULL
Level Six Id: VARCHAR(20) NULL
Level Six Name: VARCHAR(50) NULL
Level Seven Id: VARCHAR( 20) NULL
Level Seven Name: VARCHAR(50) NULL
Level Eight Id: VARCHAR( 20) NULL
Level Eight Name: VARCHAR(50) NULL
Level Nine Id: VARCHAR(20) NULL
Level Nine Name: VARCHAR( 50) NULL
Level Ten Id: VARCHAR(20) NULL
Level Ten Name: VARCHAR(50) NULL

My input source has these columns -


ID Description ParentID
10 Great-Grandfather
1010 Grandfather 10
101010 1. Father A 1010
101011 2. Father B 1010
101012 4. Father C 1010
101013 5. Father D 1010
101015 3. Father E 1010
101018 Father F 1010
101019 6. Father G 1010
101020 Father H 1010
101021 Father I 1010
101022 2A. Father J 1010
10101010 2. Father K 101010
Like the above, I have ID till 20 digits, which means, I have 10 levels.

I want to populate the ID and name itself along with all the parents till
the root for any particular level, which I am unable to create a concrete
logic for.

Am using this way to fetch respecting levels and populate them in the
respective columns but not their parents -

Present Logic ->

FinalJoin_DF = spark.sql("select "
                          + "case when length(a.id)/2 = '1' then a.id else
' ' end as level_one_id, "
  + "case when length(a.id)/2 = '1' then a.desc else ' ' end as
level_one_name, "
  + "case when length(a.id)/2 = '2' then a.id else ' ' end as level_two_id,
"
  + "case when length(a.id)/2 = '2' then a.desc else ' ' end as
level_two_name, "
                          + "case when length(a.id)/2 = '3' then a.id else
' ' end as level_three_id, "
                          + "case when length(a.id)/2 = '3' then a.desc
else ' ' end as level_three_name, "
                          + "case when length(a.id)/2 = '4' then a.id else
' ' end as level_four_id, "
                          + "case when length(a.id)/2 = '4' then a.desc
else ' ' end as level_four_name, "
                          + "case when length(a.id)/2 = '5' then a.id else
' ' end as level_five_id, "
                          + "case when length(a.id)/2 = '5' then a.desc
else ' ' end as level_five_name, "
                          + "case when length(a.id)/2 = '6' then a.id else
' ' end as level_six_id, "
  + "case when length(a.id)/2 = '6' then a.desc else ' ' end as
level_six_name, "
  + "case when length(a.id)/2 = '7' then a.id else ' ' end as
level_seven_id, "
                          + "case when length(a.id)/2 = '7' then a.desc
else ' ' end as level_seven_name, "
                          + "case when length(a.id)/2 = '8' then a.id else
' ' end as level_eight_id, "
  + "case when length(a.id)/2 = '8' then a.desc else ' ' end as
level_eight_name, "
                          + "case when length(a.id)/2 = '9' then a.id else
' ' end as level_nine_id, "
  + "case when length(a.id)/2 = '9' then a.desc else ' ' end as
level_nine_name, "
  + "case when length(a.id)/2 = '10' then a.id else ' ' end as
level_ten_id, "
                          + "case when length(a.id)/2 = '10' then a.desc
else ' ' end as level_ten_name "
  + "from CategoryTempTable a")


Can someone help me in also populating all the parents levels in the
respective level ID and level name, please?


Thanks,
Aakash.

Mime
View raw message