Hi Mich,
   A lot of people say that Spark does not have the proven track record for migrating data out of Oracle that Sqoop has, at least in production.

Please correct me if I am wrong, and could you also suggest how to deal with shuffling when using groupBy?
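
To make the question concrete, here is a tiny sketch of the kind of aggregation I mean (the DataFrame and the customer_id/amount columns are made up for illustration); the groupBy below forces a full shuffle of the data by key:

import org.apache.spark.sql.functions.sum

// Toy data standing in for the table pulled from Oracle.
val df = spark.createDataFrame(Seq(("c1", 10.0), ("c2", 5.0), ("c1", 7.5)))
  .toDF("customer_id", "amount")

// groupBy/agg repartitions all rows by customer_id across the cluster (shuffle).
val totals = df.groupBy("customer_id")
  .agg(sum("amount").as("total_amount"))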

Thanks,
Shyam

On Sat, Aug 31, 2019 at 12:17 PM Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
Spark is an excellent ETL tool for lifting data from a source and putting it into a target. Spark uses a JDBC connection much as Sqoop does, so I don't see the need for Sqoop alongside Spark here.
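
For example, something along these lines would do (a sketch only: the Oracle connection details, bounds and table names are placeholders, and spark is an existing SparkSession with the Oracle JDBC driver on the classpath):

// Read an Oracle table over JDBC in parallel and persist it as a Hive table.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@//oracle-host:1521/ORCLPDB1")
  .option("dbtable", "SCOTT.EMP")
  .option("user", "scott")
  .option("password", "tiger")
  .option("driver", "oracle.jdbc.OracleDriver")
  .option("numPartitions", "8")            // parallel reads, similar to Sqoop's --num-mappers
  .option("partitionColumn", "EMPNO")      // numeric column used to split the reads
  .option("lowerBound", "1")
  .option("upperBound", "100000")
  .load()

df.write.mode("overwrite").saveAsTable("staging.emp")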

Where are the source (Oracle, MSSQL, etc.) and the target (Hive?) here?

HTH

Dr Mich Talebzadeh

LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.


On Thu, 29 Aug 2019 at 21:01, Chetan Khatri <chetan.opensource@gmail.com> wrote:
Hi Users,
I am launching a Sqoop job from a Spark job and would like the Spark job to FAIL if the Sqoop job fails.

def executeSqoopOriginal(serverName: String, schemaName: String, username: String, password: String,
                         query: String, splitBy: String, fetchSize: Int, numMappers: Int,
                         targetDir: String, jobName: String, dateColumns: String): Array[String] = {

  val connectionString = "jdbc:sqlserver://" + serverName + ";" + "databaseName=" + schemaName

  // Core Sqoop import arguments.
  var parameters = Array(
    "import",
    "-Dmapreduce.job.user.classpath.first=true",
    "--connect", connectionString,
    "--mapreduce-job-name", jobName,
    "--username", username,
    "--password", password,
    "--hadoop-mapred-home", "/usr/hdp/2.6.5.0-292/hadoop-mapreduce/",
    "--hadoop-home", "/usr/hdp/2.6.5.0-292/hadoop/",
    "--query", query,
    "--split-by", splitBy,
    "--fetch-size", fetchSize.toString,
    "--num-mappers", numMappers.toString
  )

  // Optional Java type mapping for date columns.
  if (dateColumns.length() > 0) {
    parameters = parameters :+ "--map-column-java" :+ dateColumns
  }

  // Output options: replace the target directory and write Avro data files.
  parameters = parameters :+ "--target-dir" :+ targetDir :+ "--delete-target-dir" :+ "--as-avrodatafile"

  // Return the assembled Sqoop argument list.
  parameters
}
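
For reference, this is roughly how I was thinking of wiring it up so that a Sqoop failure fails the Spark job (a sketch only: runSqoopOrFail is just an illustrative name, and I am assuming Sqoop's in-process entry point org.apache.sqoop.Sqoop.runTool, which returns the Sqoop exit code, is available on the driver classpath):

import org.apache.sqoop.Sqoop

// Illustrative helper: turn a non-zero Sqoop exit code into an exception so the
// calling Spark job fails rather than silently continuing.
def runSqoopOrFail(parameters: Array[String]): Unit = {
  val exitCode = Sqoop.runTool(parameters)   // 0 on success, non-zero on failure
  if (exitCode != 0) {
    throw new RuntimeException(s"Sqoop job failed with exit code $exitCode")
  }
}

// Usage: runSqoopOrFail(executeSqoopOriginal(serverName, schemaName, username, password,
//   query, splitBy, fetchSize, numMappers, targetDir, jobName, dateColumns))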