spark-user mailing list archives

From Liquan Pei <liquan...@gmail.com>
Subject Re: SparkSQL LEFT JOIN problem
Date Fri, 10 Oct 2014 16:49:04 GMT
Hi,

Can you try
select birthday from customer left join profile on customer.account_id = profile.account_id
to see if the problem remains on your entire data?
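
For comparison, the result this narrowed query should return can be checked on a stand-in engine. The sketch below is only an illustration under stated assumptions: it uses SQLite through Python's sqlite3 module rather than Spark SQL, loads only the sampled rows quoted further down, and treats empty fields as NULL. The function name run_narrowed_query is hypothetical. It shows what a correct LEFT JOIN yields on that sample, not how Spark 1.1.0 behaves.

```python
import sqlite3

# Sampled rows copied from the quoted message; empty fields are stored as None (NULL).
CUSTOMER_ROWS = [
    ("50660", "1975-06-05 00:00:00.000", "13", "12", "male", "ningboshi",
     "2006-12-14 00:00:00.000", None),
    ("50666", "1984-02-23 00:00:00.000", "72", "5", "Female", "beijingshi",
     "2006-12-14 00:00:00.000", "100086"),
    ("50680", "1976-11-25 00:00:00.000", "59", "5", "Female", "beijingshi",
     "2006-12-14 00:00:00.000", "100022"),
    ("85", "1971-03-27 00:00:00.000", "2", "2", "Female", "shanghaishi",
     "2005-09-20 00:00:00.000", "200336"),
]
PROFILE_ROWS = [
    ("1144681", "3", "2010-02-18 00:00:00.000", "2013-02-28 00:00:00.000"),
    ("50666", "2", "2010-10-31 00:00:00.000", None),
    ("3930657", "1", None, None),
    ("1056365", "2", "2009-12-29 00:00:00.000", None),
]


def run_narrowed_query():
    """Run the single-column LEFT JOIN on an in-memory SQLite database (not Spark)."""
    conn = sqlite3.connect(":memory:")
    # Column names and nullability follow the schemas printed in the quoted message.
    conn.execute(
        "CREATE TABLE customer (account_id TEXT NOT NULL, birthday TEXT, "
        "preferstore TEXT, registstore TEXT, gender TEXT, city_name_en TEXT, "
        "register_date TEXT, zip TEXT)"
    )
    conn.execute(
        "CREATE TABLE profile (account_id TEXT NOT NULL, card_type TEXT, "
        "card_upgrade_time_black TEXT, card_upgrade_time_gold TEXT)"
    )
    conn.executemany("INSERT INTO customer VALUES (?,?,?,?,?,?,?,?)", CUSTOMER_ROWS)
    conn.executemany("INSERT INTO profile VALUES (?,?,?,?)", PROFILE_ROWS)
    rows = conn.execute(
        "select birthday from customer left join profile "
        "on customer.account_id = profile.account_id"
    ).fetchall()
    conn.close()
    # Sort in Python so the output order does not depend on the engine.
    return sorted(rows)


if __name__ == "__main__":
    for row in run_narrowed_query():
        print(row)
```

A LEFT JOIN keeps every customer row whether or not a profile matches, so the sample should yield one birthday per customer. If Spark returns the same rows for this query but still fails on select *, that points at the star expansion rather than at the join itself.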

Thanks,
Liquan

On Fri, Oct 10, 2014 at 8:20 AM, invkrh <invkrh@gmail.com> wrote:

> Hi,
>
> I am exploring SparkSQL 1.1.0 and have a problem with LEFT JOIN.
>
> Here is the query:
>
> select * from customer left join profile on customer.account_id =
> profile.account_id
>
> The schemas of the two tables are as follows:
>
> // Table: customer
> root
>  |-- account_id: string (nullable = false)
>  |-- birthday: string (nullable = true)
>  |-- preferstore: string (nullable = true)
>  |-- registstore: string (nullable = true)
>  |-- gender: string (nullable = true)
>  |-- city_name_en: string (nullable = true)
>  |-- register_date: string (nullable = true)
>  |-- zip: string (nullable = true)
>
> // Table: profile
> root
>  |-- account_id: string (nullable = false)
>  |-- card_type: string (nullable = true)
>  |-- card_upgrade_time_black: string (nullable = true)
>  |-- card_upgrade_time_gold: string (nullable = true)
>
> However, I always get an exception:
>
> Exception in thread "main" org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved attributes: *, tree:
> Project [*]
>  Join LeftOuter, Some(('customer.account_id = 'profile.account_id))
>   Subquery customer
>    SparkLogicalPlan (ExistingRdd [account_id#0,birthday#1,preferstore#2,registstore#3,gender#4,city_name_en#5,register_date#6,zip#7], MappedRDD[5] at map at SQLFetcher.scala:43)
>   Subquery profile
>    SparkLogicalPlan (ExistingRdd [account_id#8,card_type#9,card_upgrade_time_black#10,card_upgrade_time_gold#11], MappedRDD[12] at map at SQLFetcher.scala:43)
>
> I was not sure where the problem was, so I created two simple tables to
> isolate it.
>
> // table 1
> a       b       c
> 4       8       9
> 1       3       4
> 3       4       5
>
> // table 2
> a       b       c
> 1       2       3
> 4       5       6
>
> This time, it works.
>
> So the problem might be in the data. I then sampled some lines from the
> input tables to create new ones, and the join works on those as well.
>
> I am confused. The problem seems to be in the data, but the error message
> is not enough to locate it (unless I am missing something).
>
> Here are some lines from the sampled tables:
>
> // Table: customer
>
> [50660,1975-06-05 00:00:00.000,13,12,male,ningboshi,2006-12-14 00:00:00.000,]
> [50666,1984-02-23 00:00:00.000,72,5,Female,beijingshi,2006-12-14 00:00:00.000,100086]
> [50680,1976-11-25 00:00:00.000,59,5,Female,beijingshi,2006-12-14 00:00:00.000,100022]
> [85,1971-03-27 00:00:00.000,2,2,Female,shanghaishi,2005-09-20 00:00:00.000,200336]
>
>
> // Table: profile
>
> [1144681,3,2010-02-18 00:00:00.000,2013-02-28 00:00:00.000]
> [50666,2,2010-10-31 00:00:00.000,]
> [3930657,1,,]
> [1056365,2,2009-12-29 00:00:00.000,]
>
> Any help? =)
>
> Hao


-- 
Liquan Pei
Department of Physics
University of Massachusetts Amherst
