spark-user mailing list archives

From: Conconscious <conconsci...@gmail.com>
Subject: Running Spark 2.2.1 with extra packages
Date: Fri, 02 Feb 2018 19:43:35 GMT
Hi list,

I have a Spark cluster with 3 nodes. I'm calling spark-shell with some
packages to connect to AWS S3 and Cassandra:

spark-shell \
  --packages org.apache.hadoop:hadoop-aws:2.7.3,com.amazonaws:aws-java-sdk:1.7.4,datastax:spark-cassandra-connector:2.0.6-s_2.11 \
  --conf spark.cassandra.connection.host=10.100.120.100,10.100.120.101 \
  --conf spark.cassandra.auth.username=cassandra \
  --conf spark.cassandra.auth.password=cassandra \
  --master spark://10.100.120.104:7077
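For reference, `--packages` takes a single comma-separated list of Maven coordinates in `groupId:artifactId:version` form, while each `--conf` flag takes exactly one `key=value` pair. The Scala sketch below assembles the same argument list as the command above; the names `ShellArgs`, `Coord` and `build` are hypothetical, and the code only builds strings without calling any Spark API.

```scala
// Hypothetical helper (not part of Spark): assembles spark-shell arguments
// from structured input. It only builds strings and does not launch Spark.
object ShellArgs {
  // One Maven coordinate in the groupId:artifactId:version form used by --packages
  final case class Coord(group: String, artifact: String, version: String) {
    override def toString: String = s"$group:$artifact:$version"
  }

  def build(packages: Seq[Coord], conf: Seq[(String, String)], master: String): Seq[String] = {
    require(packages.nonEmpty, "expected at least one package coordinate")
    // All coordinates go into a single comma-separated --packages value
    val packagesArgs = Seq("--packages", packages.mkString(","))
    // Each --conf flag carries exactly one key=value pair
    val confArgs = conf.flatMap { case (key, value) => Seq("--conf", s"$key=$value") }
    packagesArgs ++ confArgs ++ Seq("--master", master)
  }
}

val shellArgs = ShellArgs.build(
  packages = Seq(
    ShellArgs.Coord("org.apache.hadoop", "hadoop-aws", "2.7.3"),
    ShellArgs.Coord("com.amazonaws", "aws-java-sdk", "1.7.4"),
    ShellArgs.Coord("datastax", "spark-cassandra-connector", "2.0.6-s_2.11")
  ),
  conf = Seq(
    "spark.cassandra.connection.host" -> "10.100.120.100,10.100.120.101",
    "spark.cassandra.auth.username" -> "cassandra",
    "spark.cassandra.auth.password" -> "cassandra"
  ),
  master = "spark://10.100.120.104:7077"
)

println(shellArgs.mkString(" "))
```

Keeping the coordinates structured makes a malformed entry (for example a missing version) easier to spot than in one long hand-edited line.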

Then I run this test app:

sc.stop  // stop the SparkContext created by spark-shell so a new one can be configured
import org.apache.spark._
import org.apache.spark.sql._
import org.apache.spark.sql.types._
import org.apache.spark.sql.Row
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.from_json
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext._

import java.sql.Timestamp
import java.io.File

import org.apache.commons.io.IOUtils
import java.net.URL
import java.nio.charset.Charset
import org.apache.hadoop.fs

System.setProperty("com.amazonaws.services.s3.enableV4", "true")

val region = "eu-central-1"

val conf = new SparkConf(true).setMaster("local[*]").setAppName("S3 connect")
// spark.default.parallelism is read from SparkConf, not from local properties,
// so it has to be set before the context is created
conf.set("spark.default.parallelism", "30")

val sc = new SparkContext(conf)
sc.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
sc.hadoopConfiguration.set("com.amazonaws.services.s3.enableV4", "true")
sc.hadoopConfiguration.set("fs.s3a.endpoint", "s3." + region + ".amazonaws.com")

val sqlContext = new SQLContext(sc)
val s3r = sqlContext.read.json("s3a://mybucket/folder/file.json")
s3r.take(1)
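The S3A part of the test app boils down to two Hadoop settings derived from the region: the `S3AFileSystem` implementation class from `hadoop-aws` and a region-specific endpoint (eu-central-1 only accepts Signature Version 4 requests, which is why the endpoint and the V4 flag are set explicitly). The sketch below collects them as plain key/value pairs; `S3aSettings` and `forRegion` are hypothetical names, and no Hadoop or Spark classes are needed to run it.

```scala
// Hypothetical helper: the S3A-related Hadoop settings used in the test app,
// derived from an AWS region and returned as plain key/value pairs.
object S3aSettings {
  def forRegion(region: String): Map[String, String] = {
    require(region.nonEmpty, "region must not be empty")
    Map(
      // S3A connector class shipped in the hadoop-aws artifact
      "fs.s3a.impl" -> "org.apache.hadoop.fs.s3a.S3AFileSystem",
      // Region-specific endpoint, e.g. s3.eu-central-1.amazonaws.com
      "fs.s3a.endpoint" -> s"s3.$region.amazonaws.com"
    )
  }
}

val settings = S3aSettings.forRegion("eu-central-1")
// Print in a stable order for inspection
settings.toSeq.sortBy(_._1).foreach { case (k, v) => println(s"$k=$v") }
```

Inside the shell the pairs could then be applied with `settings.foreach { case (k, v) => sc.hadoopConfiguration.set(k, v) }`. The `com.amazonaws.services.s3.enableV4` flag is deliberately left out of the map: it is a JVM system property read by the AWS SDK (the test app already sets it with `System.setProperty`), not an S3A option.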

With .setMaster("local[*]") the application runs fine, but when I remove the
setMaster call so that the entire cluster does the work, I get:

WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
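The two runs differ in where tasks execute: with a `local[...]` master the driver and the executor threads share one JVM, so anything on the driver's classpath or set through `System.setProperty` is visible to tasks; with a `spark://host:port` master, tasks run in separate executor JVMs on the worker nodes. The sketch below is a hypothetical helper (`MasterKind`, not a Spark API) that classifies the master URL forms used in this thread along those lines; other forms are deliberately lumped into `Other`.

```scala
// Hypothetical helper: classifies a Spark master URL by where tasks run.
// Only the URL forms relevant to this thread are recognised.
object MasterKind {
  sealed trait Kind
  case object LocalSingleJvm extends Kind // local, local[N], local[*]: one JVM
  case object Standalone extends Kind     // spark://host:port: executors on workers
  case object Other extends Kind          // anything else, e.g. local-cluster[...]

  // Regex extractors in a match must cover the whole string
  private val LocalPattern = """local(\[(\*|\d+)\])?""".r
  private val StandalonePattern = """spark://[^:/]+:\d+""".r

  def of(master: String): Kind = master match {
    case LocalPattern(_*)    => LocalSingleJvm
    case StandalonePattern() => Standalone
    case _                   => Other
  }
}

Seq("local[*]", "spark://10.100.120.104:7077").foreach { m =>
  println(s"$m -> ${MasterKind.of(m)}")
}
```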

How can I make my extra packages available to the entire cluster?

Thanks in advance
