spark-user mailing list archives

From Gabor Somogyi <gabor.g.somo...@gmail.com>
Subject Re: Structured Streaming Spark 3.0.1
Date Fri, 22 Jan 2021 06:21:25 GMT
The most interesting part is that you've added this:
kafka-clients-0.10.2.2.jar
Spark 3.0.1 uses Kafka clients 2.4.1, so downgrading with such a big step
doesn't help. Please remove that jar as well, together with the Spark-Kafka
dependency.
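A minimal sketch of that setup, assuming nothing beyond the spark-sql-kafka
coordinate already quoted below (the app name is a placeholder):

from pyspark.sql import SparkSession

# spark-sql-kafka-0-10_2.12:3.0.1 pulls in kafka-clients 2.4.1 transitively,
# so no manually added kafka-clients jar (and no separate Spark-Kafka
# dependency) should be on the classpath.
spark = (
    SparkSession.builder
    .appName("kafka-clients-check")
    .config("spark.jars.packages",
            "org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.1")
    .getOrCreate()
)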

G


On Thu, 21 Jan 2021, 22:45 gshen, <gshen92@gmail.com> wrote:

> Thanks for the tips!
>
> I think I figured out what might be causing it. It's the checkpointing to
> Microsoft Azure Data Lake Storage (ADLS).
>
> When I use local checkpointing it works, but with ADLS it fails when
> there's a groupBy in the stream. Weirdly, it works when there is no
> groupBy clause in the stream.
>
> It's able to create the checkpoint location and the base files on ADLS, but
> it can't write any commits. Hence I see the first batch of records, but
> then it crashes on writing the first commit.
>
> Here's what my checkpointing location looks like:
>
> abfss://"+container_name+"@"+storage_account_name+".dfs.core.windows.net/
> "+file_name
>
> and my SparkSession:
>
> spark = pyspark.sql.SparkSession.builder\
>     .appName("pyspark-stream-read")\
>     .master("spark://spark-master:7077")\
>     .config("spark.executor.memory", "512m")\
>     .config("spark.jars.packages",
>             "io.delta:delta-core_2.12:0.7.0,org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.1,org.apache.hadoop:hadoop-azure:3.2.1")\
>     .config("spark.sql.extensions",
>             "io.delta.sql.DeltaSparkSessionExtension")\
>     .config("spark.sql.catalog.spark_catalog",
>             "org.apache.spark.sql.delta.catalog.DeltaCatalog")\
>     .config("spark.delta.logStore.class",
>             "org.apache.spark.sql.delta.storage.AzureLogStore")\
>     .getOrCreate()
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>
>
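For reference, a minimal sketch of the failing pattern described above,
reusing the quoted SparkSession. The container, storage account, file name,
broker, topic and sink path below are placeholder assumptions; only the
abfss checkpoint-path construction and the presence of a groupBy come from
the thread.

# Placeholders standing in for the values used in the quoted message;
# `spark` is assumed to be the session built by the quoted builder code.
container_name = "mycontainer"
storage_account_name = "mystorageaccount"
file_name = "checkpoints/stream1"

checkpoint_path = ("abfss://" + container_name + "@" + storage_account_name
                   + ".dfs.core.windows.net/" + file_name)

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")   # placeholder broker
    .option("subscribe", "events")                     # placeholder topic
    .load()
)

# The aggregation is what reportedly fails with an ADLS checkpoint location;
# the same query with no groupBy runs fine.
counts = events.groupBy("key").count()

query = (
    counts.writeStream
    .outputMode("complete")
    .format("delta")
    .option("checkpointLocation", checkpoint_path)
    .start(checkpoint_path + "-data")                  # placeholder Delta sink path
)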
