spark-user mailing list archives

From Prithish <prith...@gmail.com>
Subject Reading AVRO from S3 - No parallelism
Date Thu, 27 Oct 2016 12:19:29 GMT
I am trying to read a bunch of AVRO files from an S3 folder using Spark 2.0.
No matter how many executors I use or what configuration changes I make,
the cluster doesn't seem to use all the executors. I am using the
com.databricks.spark.avro library from Databricks to read the AVRO files.

However, if I try the same on CSV files (same S3 folder, same configuration
and cluster), it does use all executors.

Is there something that I need to do to enable parallelism when using the
Databricks AVRO library?

Thanks for your help.
