spark-user mailing list archives

From Bill Schwanitz <bil...@bilsch.org>
Subject Re: Run jobs in parallel in standalone mode
Date Tue, 16 Jan 2018 12:39:15 GMT
https://docs.databricks.com/spark/latest/data-sources/sql-databases.html#jdbc-reads

I had the same issue with a different database; it comes down to how the
JDBC source splits the read into tasks. You need to specify a partition
column with lower and upper bounds. You also need to specify how many
threads (partitions) to use (one thread per worker).
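
For what it's worth, here is a minimal sketch of what that looks like with the
Spark JDBC source. The option names (partitionColumn, lowerBound, upperBound,
numPartitions) are the ones documented for Spark's JDBC reader. The helper
object, the column name "ID", the bounds and the partition count are
placeholders I made up, so substitute a numeric column and a range that fit
your table.

```scala
// Sketch only: builds the extra options for a partitioned JDBC read.
// JdbcPartitioning is a hypothetical helper, and the column "ID", the bounds
// and the partition count used in the usage comment are made-up placeholders.
object JdbcPartitioning {
  def options(column: String, lower: Long, upper: Long, partitions: Int): Map[String, String] = {
    // Sanity checks of this sketch, applied before the options reach Spark.
    require(lower < upper, "lower bound must be below upper bound")
    require(partitions > 0, "partition count must be positive")
    Map(
      "partitionColumn" -> column,
      "lowerBound"      -> lower.toString,
      "upperBound"      -> upper.toString,
      "numPartitions"   -> partitions.toString
    )
  }
}

// In spark-shell, where `spark` is already defined, it could be used like this:
//   val df = spark.read.format("jdbc").
//     option("url", "jdbc:sqlserver://10.20.10.148:1433;databaseName=testdb").
//     option("dbtable", "dbo.temp_muh_hareket").
//     option("user", "gpudb").option("password", "...").
//     options(JdbcPartitioning.options("ID", 1L, 1000000L, 32)).
//     load()
//   df.rdd.getNumPartitions  // shows how many partitions the read produced
```

Without these options Spark reads the table over a single connection into one
partition, so only one task does the work. The bounds only set the stride of
each partition; they do not filter rows, so values outside the range still end
up in the first or last partition.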

On Tue, Jan 16, 2018 at 3:00 AM, Onur EKİNCİ <oekinci@innova.com.tr> wrote:

> Hi,
>
>
>
> We are trying to load data from an Oracle database into a Kinetica
> database through Apache Spark.
>
>
>
> We installed Spark in standalone mode and executed the following commands.
> We have tried everything, but we couldn't manage to run jobs in parallel.
> We use 2 IBM servers, each with 128 cores and 1 TB of memory.
>
>
>
> We also added the following to spark-defaults.conf:
>
> spark.executor.memory=64g
> spark.executor.cores=32
> spark.default.parallelism=32
> spark.cores.max=64
> spark.scheduler.mode=FAIR
> spark.sql.shuffle.partitions=32
>
>
>
>
>
> *On the machine: 10.20.10.228*
>
> ./start-master.sh --webui-port 8585
>
>
>
> ./start-slave.sh --webui-port 8586 spark://10.20.10.228:7077
>
>
>
>
>
> *On the machine: 10.20.10.229*
>
> ./start-slave.sh --webui-port 8586 spark://10.20.10.228:7077
>
>
>
>
>
> *On the machine: 10.20.10.228*
>
>
>
> We start the Spark shell:
>
>
>
> spark-shell --master spark://10.20.10.228:7077
>
>
>
> Then we run the following:
>
>
>
> val df = spark.read.format("jdbc").
>   option("url", "jdbc:sqlserver://10.20.10.148:1433;databaseName=testdb").
>   option("dbtable", "dbo.temp_muh_hareket").
>   option("user", "gpudb").
>   option("password", "Kinetica2017!").
>   load()
>
> import com.kinetica.spark._
>
> val lp = new LoaderParams("http://10.20.10.228:9191",
>   "jdbc:simba://10.20.10.228:9292;ParentSet=MASTER", "muh_hareket_20",
>   false, "", 100000, true, true, "admin", "Kinetica2017!", 4, true, true, 1)
>
> SparkKineticaLoader.KineticaWriter(df,lp);
>
>
>
>
>
> The commands above work and the data transfer completes. However, the jobs
> run serially, not in parallel. The executors also take turns instead of
> working at the same time.
>
>
>
> How can we make jobs work in parallel?
>
> I really appreciate your help. We have done everything that we could.
>
>
>
> Onur EKİNCİ
> Bilgi Yönetimi Yöneticisi
> Knowledge Management Manager
>
> m:+90 553 044 2341  d:+90 212 329 7000
>
> İTÜ Ayazağa Kampüsü, Teknokent ARI4 Binası 34469 Maslak İstanbul - Google
> Maps <http://www.innova.com.tr/istanbul.asp>
>
>
>
>
>
> Legal notice:
> This e-mail is subject to the Terms and Conditions document, which you
> can reach via this link:
> http://www.innova.com.tr/disclaimer-yasal-uyari.asp
>
