spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jean Georges Perrin <jper...@lumeris.com>
Subject Re: Regarding column partitioning IDs and names as per hierarchical level SparkSQL
Date Fri, 03 Nov 2017 10:48:55 GMT
Write a UDF?

> On Oct 31, 2017, at 11:48, Aakash Basu <aakash.spark.raj@gmail.com <mailto:aakash.spark.raj@gmail.com>>
wrote:
> 
> Hey all,
> 
> Any help in the below please?
> 
> Thanks,
> Aakash.
> 
> 
> ---------- Forwarded message ----------
> From: Aakash Basu <aakash.spark.raj@gmail.com <mailto:aakash.spark.raj@gmail.com>>
> Date: Tue, Oct 31, 2017 at 9:17 PM
> Subject: Regarding column partitioning IDs and names as per hierarchical level SparkSQL
> To: user <user@spark.apache.org <mailto:user@spark.apache.org>>
> 
> 
> Hi all,
> 
> I have to generate a table with Spark-SQL with the following columns -
> 
> 
> Level One Id: VARCHAR(20) NULL
> Level One Name: VARCHAR( 50) NOT NULL
> Level Two Id: VARCHAR( 20) NULL
> Level Two Name: VARCHAR(50) NULL
> Level Thr ee Id: VARCHAR(20) NULL
> Level Thr ee Name: VARCHAR(50) NULL
> Level Four Id: VARCHAR(20) NULL
> Level Four Name: VARCHAR( 50) NULL
> Level Five Id: VARCHAR(20) NULL
> Level Five Name: VARCHAR(50) NULL
> Level Six Id: VARCHAR(20) NULL
> Level Six Name: VARCHAR(50) NULL
> Level Seven Id: VARCHAR( 20) NULL
> Level Seven Name: VARCHAR(50) NULL
> Level Eight Id: VARCHAR( 20) NULL
> Level Eight Name: VARCHAR(50) NULL
> Level Nine Id: VARCHAR(20) NULL
> Level Nine Name: VARCHAR( 50) NULL
> Level Ten Id: VARCHAR(20) NULL
> Level Ten Name: VARCHAR(50) NULL
> 
> My input source has these columns -
> 
> 
> ID	Description	ParentID
> 10	Great-Grandfather	 
> 1010	Grandfather	10
> 101010	1. Father A	1010
> 101011	2. Father B	1010
> 101012	4. Father C	1010
> 101013	5. Father D	1010
> 101015	3. Father E	1010
> 101018	Father F	1010
> 101019	6. Father G	1010
> 101020	Father H	1010
> 101021	Father I	1010
> 101022	2A. Father J	1010
> 10101010	2. Father K	101010
> 
> Like the above, I have ID till 20 digits, which means, I have 10 levels.
> 
> I want to populate the ID and name itself along with all the parents till the root for
any particular level, which I am unable to create a concrete logic for.
> 
> Am using this way to fetch respecting levels and populate them in the respective columns
but not their parents -
> 
> Present Logic ->
> 
> FinalJoin_DF = spark.sql("select "
>                           + "case when length(a.id <http://a.id/>)/2 = '1' then
a.id <http://a.id/> else ' ' end as level_one_id, "
> 			  + "case when length(a.id <http://a.id/>)/2 = '1' then a.desc else ' ' end
as level_one_name, "
> 			  + "case when length(a.id <http://a.id/>)/2 = '2' then a.id <http://a.id/>
else ' ' end as level_two_id, "
> 			  + "case when length(a.id <http://a.id/>)/2 = '2' then a.desc else ' ' end
as level_two_name, "
>                           + "case when length(a.id <http://a.id/>)/2 = '3' then
a.id <http://a.id/> else ' ' end as level_three_id, "
>                           + "case when length(a.id <http://a.id/>)/2 = '3' then
a.desc else ' ' end as level_three_name, "
>                           + "case when length(a.id <http://a.id/>)/2 = '4' then
a.id <http://a.id/> else ' ' end as level_four_id, "
>                           + "case when length(a.id <http://a.id/>)/2 = '4' then
a.desc else ' ' end as level_four_name, "
>                           + "case when length(a.id <http://a.id/>)/2 = '5' then
a.id <http://a.id/> else ' ' end as level_five_id, "
>                           + "case when length(a.id <http://a.id/>)/2 = '5' then
a.desc else ' ' end as level_five_name, "
>                           + "case when length(a.id <http://a.id/>)/2 = '6' then
a.id <http://a.id/> else ' ' end as level_six_id, "
> 			  + "case when length(a.id <http://a.id/>)/2 = '6' then a.desc else ' ' end
as level_six_name, "
> 			  + "case when length(a.id <http://a.id/>)/2 = '7' then a.id <http://a.id/>
else ' ' end as level_seven_id, "
>                           + "case when length(a.id <http://a.id/>)/2 = '7' then
a.desc else ' ' end as level_seven_name, "
>                           + "case when length(a.id <http://a.id/>)/2 = '8' then
a.id <http://a.id/> else ' ' end as level_eight_id, "
> 			  + "case when length(a.id <http://a.id/>)/2 = '8' then a.desc else ' ' end
as level_eight_name, "
>                           + "case when length(a.id <http://a.id/>)/2 = '9' then
a.id <http://a.id/> else ' ' end as level_nine_id, "
> 			  + "case when length(a.id <http://a.id/>)/2 = '9' then a.desc else ' ' end
as level_nine_name, "
> 			  + "case when length(a.id <http://a.id/>)/2 = '10' then a.id <http://a.id/>
else ' ' end as level_ten_id, "
>                           + "case when length(a.id <http://a.id/>)/2 = '10' then
a.desc else ' ' end as level_ten_name "
> 			  + "from CategoryTempTable a")
> 
> 
> Can someone help me in also populating all the parents levels in the respective level
ID and level name, please?
> 
> 
> Thanks,
> Aakash.
> 


Mime
View raw message