spark-user mailing list archives

From invkrh <inv...@gmail.com>
Subject SparkSQL LEFT JOIN problem
Date Fri, 10 Oct 2014 15:20:20 GMT
Hi,

I am exploring SparkSQL 1.1.0 and have a problem with LEFT JOIN.

Here is the query:

select * from customer left join profile on customer.account_id = profile.account_id

The schemas of the two tables are as follows:

// Table: customer
root
 |-- account_id: string (nullable = false)
 |-- birthday: string (nullable = true)
 |-- preferstore: string (nullable = true)
 |-- registstore: string (nullable = true)
 |-- gender: string (nullable = true)
 |-- city_name_en: string (nullable = true)
 |-- register_date: string (nullable = true)
 |-- zip: string (nullable = true)

// Table: profile
root
 |-- account_id: string (nullable = false)
 |-- card_type: string (nullable = true)
 |-- card_upgrade_time_black: string (nullable = true)
 |-- card_upgrade_time_gold: string (nullable = true)

However, I always get an exception:

Exception in thread "main"
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved
attributes: *, tree:
Project [*]
 Join LeftOuter, Some(('customer.account_id = 'profile.account_id))
  Subquery customer
   SparkLogicalPlan (ExistingRdd
[account_id#0,birthday#1,preferstore#2,registstore#3,gender#4,city_name_en#5,register_date#6,zip#7],
MappedRDD[5] at map at SQLFetcher.scala:43)
  Subquery profile
   SparkLogicalPlan (ExistingRdd
[account_id#8,card_type#9,card_upgrade_time_black#10,card_upgrade_time_gold#11],
MappedRDD[12] at map at SQLFetcher.scala:43)

I was not sure where the problem was, so I created two simple tables to
isolate it.

// table 1
a	b	c
4	8	9
1	3	4
3	4	5

// table 2
a	b	c
1	2	3
4	5	6

This time, it works.
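
For reference, the result a LEFT JOIN should give on these two toy tables can be checked outside Spark. The sketch below uses Python's built-in sqlite3 module, not SparkSQL, so it only shows the expected rows and does not reproduce the exception. It assumes the join key is column a (the message does not say which column was used); the table names t1 and t2 and the function name are made up for illustration.

```python
import sqlite3


def left_join_toy_tables():
    """Run a LEFT JOIN on the two toy tables in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table names; the message only calls them "table 1" and "table 2".
    conn.execute("CREATE TABLE t1 (a INTEGER, b INTEGER, c INTEGER)")
    conn.execute("CREATE TABLE t2 (a INTEGER, b INTEGER, c INTEGER)")
    conn.executemany("INSERT INTO t1 VALUES (?, ?, ?)",
                     [(4, 8, 9), (1, 3, 4), (3, 4, 5)])
    conn.executemany("INSERT INTO t2 VALUES (?, ?, ?)",
                     [(1, 2, 3), (4, 5, 6)])
    # Assumption: the join key is column a. ORDER BY makes the row order stable.
    rows = conn.execute(
        "SELECT * FROM t1 LEFT JOIN t2 ON t1.a = t2.a ORDER BY t1.a"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in left_join_toy_tables():
        print(row)
```

Every row of table 1 is kept; the row with a = 3 has no match in table 2, so its right-hand columns come back as NULL (None in Python).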

So the problem might be in the data. I then sampled a few lines from the
input tables to create new ones, and this also works.

I am so confused. The problem seems to be in the data, but the error message
is not enough to find it (unless I am missing something).

Here are some lines from the sampled tables:

// Table: customer

[50660,1975-06-05 00:00:00.000,13,12,male,ningboshi,2006-12-14 00:00:00.000,]
[50666,1984-02-23 00:00:00.000,72,5,Female,beijingshi,2006-12-14 00:00:00.000,100086]
[50680,1976-11-25 00:00:00.000,59,5,Female,beijingshi,2006-12-14 00:00:00.000,100022]
[85,1971-03-27 00:00:00.000,2,2,Female,shanghaishi,2005-09-20 00:00:00.000,200336]


// Table: profile

[1144681,3,2010-02-18 00:00:00.000,2013-02-28 00:00:00.000]
[50666,2,2010-10-31 00:00:00.000,]
[3930657,1,,]
[1056365,2,2009-12-29 00:00:00.000,]
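
One way to sanity-check rows like these outside Spark is sketched below. It is only an illustration using Python's sqlite3, not SparkSQL, and it does not diagnose the exception. It parses the bracketed lines above, verifies that each row has as many fields as the schema has columns, and runs the same query. Treating an empty field as NULL and storing every column as TEXT are assumptions; the helper names parse_row, load_table and join_sampled_rows are hypothetical.

```python
import sqlite3

CUSTOMER_COLS = ["account_id", "birthday", "preferstore", "registstore",
                 "gender", "city_name_en", "register_date", "zip"]
PROFILE_COLS = ["account_id", "card_type", "card_upgrade_time_black",
                "card_upgrade_time_gold"]

CUSTOMER_LINES = [
    "[50660,1975-06-05 00:00:00.000,13,12,male,ningboshi,2006-12-14 00:00:00.000,]",
    "[50666,1984-02-23 00:00:00.000,72,5,Female,beijingshi,2006-12-14 00:00:00.000,100086]",
    "[50680,1976-11-25 00:00:00.000,59,5,Female,beijingshi,2006-12-14 00:00:00.000,100022]",
    "[85,1971-03-27 00:00:00.000,2,2,Female,shanghaishi,2005-09-20 00:00:00.000,200336]",
]
PROFILE_LINES = [
    "[1144681,3,2010-02-18 00:00:00.000,2013-02-28 00:00:00.000]",
    "[50666,2,2010-10-31 00:00:00.000,]",
    "[3930657,1,,]",
    "[1056365,2,2009-12-29 00:00:00.000,]",
]


def parse_row(line, n_cols):
    """Split a '[a,b,c]' line into a tuple, failing loudly on a bad field count."""
    fields = line.strip()[1:-1].split(",")
    if len(fields) != n_cols:
        raise ValueError(f"expected {n_cols} fields, got {len(fields)}: {line!r}")
    # Assumption: an empty field means NULL.
    row = tuple(f if f != "" else None for f in fields)
    if row[0] is None:
        raise ValueError(f"account_id is not nullable but is empty: {line!r}")
    return row


def load_table(conn, name, cols, lines):
    """Create a table with TEXT columns and insert the parsed rows."""
    conn.execute(f"CREATE TABLE {name} ({', '.join(c + ' TEXT' for c in cols)})")
    placeholders = ", ".join("?" for _ in cols)
    conn.executemany(f"INSERT INTO {name} VALUES ({placeholders})",
                     [parse_row(line, len(cols)) for line in lines])


def join_sampled_rows():
    """Run the query from this message on the sampled rows."""
    conn = sqlite3.connect(":memory:")
    load_table(conn, "customer", CUSTOMER_COLS, CUSTOMER_LINES)
    load_table(conn, "profile", PROFILE_COLS, PROFILE_LINES)
    rows = conn.execute(
        "select * from customer left join profile "
        "on customer.account_id = profile.account_id"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in join_sampled_rows():
        print(row)
```

On these samples only account_id 50666 appears in both tables, so the other three customer rows should come back with NULL in all four profile columns.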

Any help? =)

Hao




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SparkSQL-LEFT-JOIN-problem-tp16152.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
