spark-user mailing list archives

From satish chandra j <jsatishchan...@gmail.com>
Subject JdbcRDD Constructor
Date Wed, 23 Sep 2015 05:30:34 GMT
Hi All,

The JdbcRDD constructor has the following parameters:

JdbcRDD(SparkContext sc,
        scala.Function0<java.sql.Connection> getConnection,
        String sql,
        *long lowerBound, long upperBound, int numPartitions*,
        scala.Function1<java.sql.ResultSet,T> mapRow,
        scala.reflect.ClassTag<T> evidence$1)

(https://spark.apache.org/docs/1.2.0/api/java/org/apache/spark/rdd/JdbcRDD.html)

where *lowerBound* is the lower boundary of the entire data, *upperBound*
is the upper boundary of the entire data, and *numPartitions* is the
number of partitions.
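
To make the role of these three parameters concrete, here is a minimal
sketch of how an inclusive range [lowerBound, upperBound] could be split
into numPartitions sub-ranges. It assumes the even, inclusive split that
JdbcRDD.getPartitions uses in Spark 1.x; the class JdbcBoundsSketch and the
method partitionBounds are hypothetical names for illustration, not part of
the Spark API.

```java
import java.util.ArrayList;
import java.util.List;

public class JdbcBoundsSketch {

    // Hypothetical helper (not Spark API): returns one {start, end} pair per
    // partition, assuming both bounds are inclusive and the range is split
    // as evenly as integer division allows.
    public static List<long[]> partitionBounds(long lowerBound, long upperBound,
                                               int numPartitions) {
        long length = 1 + upperBound - lowerBound; // inclusive range size
        List<long[]> bounds = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            long start = lowerBound + (i * length) / numPartitions;
            long end = lowerBound + ((i + 1) * length) / numPartitions - 1;
            bounds.add(new long[] {start, end});
        }
        return bounds;
    }

    public static void main(String[] args) {
        // Print the sub-ranges for 1, 2 and 3 partitions over [0, 100].
        for (int n = 1; n <= 3; n++) {
            StringBuilder line = new StringBuilder("numPartitions=" + n + ":");
            for (long[] b : partitionBounds(0, 100, n)) {
                line.append(String.format(" [%d, %d]", b[0], b[1]));
            }
            System.out.println(line);
        }
    }
}
```

Under this assumption, lowerBound = 0 and upperBound = 100 give [0, 100]
for one partition, [0, 49] [50, 100] for two, and [0, 32] [33, 66] [67, 100]
for three. The JdbcRDD documentation says the SQL must contain two ?
placeholders, which receive each partition's start and end. A query that
does not use both as an inclusive filter on the same key may therefore
return a different total as numPartitions changes.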

The source table in the Oracle DB from which JdbcRDD fetches data has more
than 500 records, but the results are confusing: I ran the job several
times, changing the "numPartitions" parameter, and got different counts.

lowerBound, upperBound, numPartitions : Output Count
0,          100,        1             : 100
0,          100,        2             : 151
0,          100,        3             : 201


Please help me understand why the output count is 151 when numPartitions
is 2 and 201 when numPartitions is 3.

Regards,

Satish Chandra
