spark-user mailing list archives

From "Jain, Nishit" <>
Subject Re: How do I convert a data frame to broadcast variable?
Date Thu, 03 Nov 2016 16:32:48 GMT
Thanks Denny! That does help. I will give that a shot.

Question: If I go this route, how can I read only a few columns of a table
(not the whole table) from JDBC as a data frame?
This function from DataFrameReader does not give an option to read only certain columns:
def jdbc(url: String, table: String, predicates: Array[String], connectionProperties: Properties):

On the other hand, if I want to create a JdbcRDD I can specify a select query (instead of the full table):
new JdbcRDD(sc: SparkContext, getConnection: () ⇒ Connection, sql: String, lowerBound: Long,
upperBound: Long, numPartitions: Int, mapRow: (ResultSet) ⇒ T = JdbcRDD.resultSetToObjectArray)(implicit
arg0: ClassTag[T])

Maybe if I do a select(col1, col2) on a data frame created from a table, will Spark be smart
enough to fetch only the two columns and not the entire table?
Is there any way to test this?
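
One way to explore both parts of this question is sketched below. It assumes Spark 2.x with a SparkSession, a HANA JDBC driver on the classpath, and hypothetical names throughout (the URL, the credentials, the LOOKUP table, the COL1/COL2 columns, and the columnSubquery helper are all made up for illustration). The `table` argument of DataFrameReader.jdbc accepts anything valid in a FROM clause, so a parenthesized subquery can stand in for the table name. Alternatively, select() on the full-table data frame followed by explain() shows whether the JDBC scan lists only the requested columns.

```scala
import java.util.Properties
import org.apache.spark.sql.{DataFrame, SparkSession}

object JdbcColumnSubset {
  // Hypothetical helper: wraps a column list in a derived-table expression
  // that can be passed as the `table` argument of DataFrameReader.jdbc.
  def columnSubquery(table: String, columns: Seq[String], alias: String = "t"): String =
    s"(SELECT ${columns.mkString(", ")} FROM $table) $alias"

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("jdbc-column-subset").getOrCreate()

    // Assumed connection details; replace with real values.
    val url = "jdbc:sap://hana-host:30015"
    val props = new Properties()
    props.setProperty("user", "USER")
    props.setProperty("password", "PASSWORD")

    // Option 1: push the column list to the database via a subquery.
    val viaSubquery: DataFrame =
      spark.read.jdbc(url, columnSubquery("LOOKUP", Seq("COL1", "COL2")), props)
    viaSubquery.printSchema()

    // Option 2: read the table, select two columns, and inspect the plan.
    // The JDBC scan node in the physical plan should list only COL1 and COL2
    // if column pruning is applied.
    val viaSelect: DataFrame =
      spark.read.jdbc(url, "LOOKUP", props).select("COL1", "COL2")
    viaSelect.explain(true)

    spark.stop()
  }
}
```

Checking the statements the database actually receives (for example through its own query log) would be a second, independent way to confirm what gets fetched.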

From: Denny Lee <<>>
Date: Thursday, November 3, 2016 at 10:59 AM
To: "Jain, Nishit" <<>>, "<>"
Subject: Re: How do I convert a data frame to broadcast variable?

If you're able to read the data in as a DataFrame, perhaps you can use a BroadcastHashJoin,
so that you can join to that table, presuming it's small enough to distribute?  Here's
a handy guide on a BroadcastHashJoin:,%20DataFrames%20%26%20Datasets/05%20BroadcastHashJoin%20-%20scala.html
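
As a rough illustration of the idea (not taken from the guide), the sketch below uses the broadcast() hint from org.apache.spark.sql.functions to ask Spark to ship the small table to every executor. The data, the column names, and the joinWithLookup helper are hypothetical, and a local master is assumed so the example can run on its own.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.broadcast

object BroadcastJoinSketch {
  // Hypothetical helper: joins a large data frame to a small lookup data
  // frame on "id", hinting that the lookup side should be broadcast.
  def joinWithLookup(facts: DataFrame, lookup: DataFrame): DataFrame =
    facts.join(broadcast(lookup), "id")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("broadcast-join-sketch")
      .master("local[*]") // assumption: local run for illustration
      .getOrCreate()
    import spark.implicits._

    // Stand-ins for the lookup table and the larger data set.
    val lookup = Seq((1, "one"), (2, "two")).toDF("id", "name")
    val facts = Seq((1, 10.0), (2, 20.0), (1, 30.0)).toDF("id", "amount")

    val joined = joinWithLookup(facts, lookup)
    joined.explain() // the physical plan should mention BroadcastHashJoin
    joined.show()

    spark.stop()
  }
}
```

Spark may also choose a broadcast join by itself when the small side is below spark.sql.autoBroadcastJoinThreshold; the explicit hint just makes the intent visible.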


On Thu, Nov 3, 2016 at 8:53 AM Jain, Nishit <<>> wrote:
I have a lookup table in a HANA database. I want to create a Spark broadcast variable for it.
What would be the suggested approach? Should I read it as a data frame and convert the data frame
into a broadcast variable?
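
For reference, the literal approach asked about here could look roughly like the sketch below: collect the small data frame to the driver, turn it into a Map, and broadcast that. The two-column (Int, String) layout, the sample data, and the toLookupMap helper are assumptions for illustration only; it works only if the table fits comfortably in driver memory.

```scala
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.sql.{DataFrame, SparkSession}

object BroadcastVariableSketch {
  // Hypothetical helper: collects a two-column (Int, String) data frame
  // to the driver and turns it into an in-memory lookup map.
  def toLookupMap(df: DataFrame): Map[Int, String] =
    df.collect().map(row => row.getInt(0) -> row.getString(1)).toMap

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("broadcast-variable-sketch")
      .master("local[*]") // assumption: local run for illustration
      .getOrCreate()
    import spark.implicits._

    // Stand-in for the lookup table that would be read over JDBC.
    val lookupDf = Seq((1, "one"), (2, "two")).toDF("id", "name")

    // Ship the map to the executors once instead of with every task.
    val lookup: Broadcast[Map[Int, String]] =
      spark.sparkContext.broadcast(toLookupMap(lookupDf))

    // Use the broadcast value inside a transformation.
    val ids = spark.sparkContext.parallelize(Seq(1, 2, 3))
    val names = ids.map(id => lookup.value.getOrElse(id, "unknown")).collect()
    names.foreach(println)

    spark.stop()
  }
}
```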
