Hi, notice the additional "y" in "executory-memory" (as Mich mentioned):

pyspark --conf queue=default --conf executory-memory=24G

How so?


Hi Clay,


The parameters you are passing are not valid:


pyspark --conf queue=default --conf executory-memory=24G


pyspark dynamic_ARRAY_generator_parquet.py


This works:


$SPARK_HOME/bin/spark-submit --master local[4] dynamic_ARRAY_generator_parquet.py
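
For comparison, Spark configuration keys passed with --conf begin with "spark.". The following is a minimal sketch, assuming the settings intended in the failing command were spark.yarn.queue and spark.executor.memory. The helper name build_spark_submit is hypothetical and exists only for illustration: it assembles an argument list and rejects any key that lacks the "spark." prefix, such as the misspelled "executory-memory".

```python
import shlex

def build_spark_submit(script, conf, master="yarn"):
    """Assemble a spark-submit argument list (hypothetical helper)."""
    args = ["spark-submit", "--master", master]
    for key, value in conf.items():
        # Spark property names start with "spark."; reject anything else
        if not key.startswith("spark."):
            raise ValueError("not a Spark property: {!r}".format(key))
        args += ["--conf", "{}={}".format(key, value)]
    args.append(script)
    return args

# Assumed property names for the queue and executor memory settings
conf = {"spark.yarn.queue": "default", "spark.executor.memory": "24g"}
cmd = build_spark_submit("dynamic_ARRAY_generator_parquet.py", conf)

# Print the command as it would be typed in a shell
print(" ".join(shlex.quote(a) for a in cmd))
```

The same two settings can also be given with the dedicated spark-submit flags --queue default and --executor-memory 24G.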









Hello all,


I’m hoping someone can give me some direction for troubleshooting this issue. I’m trying to write from Spark on a Hortonworks (Cloudera) HDP cluster. I ssh directly to the first datanode and run PySpark with the following command; however, it always fails no matter what memory size I set for the YARN containers and YARN queues. Any suggestions?




pyspark --conf queue=default --conf executory-memory=24G





#HDFS_OUT="/HDFS/Data/Test/Processed/Convert_parquet/Output"





'Test _2003.txt'


from pyspark.sql.functions import regexp_replace, col

for f in fileList1:

    df = spark.read.option("delimiter", "|").option("encoding", ENCODING).option("multiLine", True).option('wholeFile', "true").csv('{}/{}'.format(HDFS_RAW, fname), header=True)

    print('showing {}'.format(fname))

    if ('\r' in lastcol):

        df = df.withColumn(lastcol, regexp_replace(col("{}\r".format(lastcol)), "[\r]", "")).drop('{}\r'.format(lastcol))
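
The regexp_replace call above uses the pattern "[\r]" to remove carriage returns, which typically show up in the last column when a file with Windows line endings is split on "\n" only. The effect of that pattern can be sketched in plain Python without a cluster. The header and row values below are made-up sample data, and strip_cr is a hypothetical helper name, not part of the original script.

```python
import re

def strip_cr(value):
    # Same character class as the regexp_replace pattern in the post
    return re.sub(r"[\r]", "", value)

# Made-up sample: the last header name and last field carry a trailing CR
header = ["id", "name", "amount\r"]
row = ["1", "widget", "9.99\r"]

clean_header = [strip_cr(c) for c in header]
clean_row = [strip_cr(v) for v in row]

print(clean_header)
print(clean_row)
```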





Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 4, DataNode01.mydomain.com, executor 5): ExecutorLostFailure (executor 5 exited caused by one of the running tasks) Reason: Container marked as failed: container_e331_1621375512548_0021_01_000006 on host: DataNode01.mydomain.com. Exit status: 143. Diagnostics: [2021-05-19 18:09:06.392]Container killed on request. Exit code is 143
[2021-05-19 18:09:06.413]Container exited with a non-zero exit code 143.
[2021-05-19 18:09:06.414]Killed by external signal
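
Exit code 143 in the log above is 128 + 15, which means the container process received SIGTERM. On YARN this often, though not always, accompanies a container being killed for exceeding its memory allocation. The sketch below decodes the exit code and estimates the per-executor container request, assuming the documented Spark default for executor memory overhead (10% of executor memory, with a 384 MiB minimum). Both function names are hypothetical, and YARN's rounding up to its minimum allocation increment is not modeled.

```python
import signal

def describe_exit_code(code):
    """Return the signal name for shell-style exit codes above 128."""
    if code > 128:
        return signal.Signals(code - 128).name
    return "exit status {}".format(code)

def container_request_mb(executor_memory_mb, overhead_factor=0.10,
                         min_overhead_mb=384):
    """Executor memory plus overhead, using assumed Spark defaults."""
    overhead = max(min_overhead_mb, int(executor_memory_mb * overhead_factor))
    return executor_memory_mb + overhead

print(describe_exit_code(143))           # SIGTERM
print(container_request_mb(24 * 1024))   # request if 24g were applied
print(container_request_mb(1024))        # request with the 1g default
```

If the misspelled option was not applied, the executors would have run with the default spark.executor.memory of 1g, which would be consistent with the failure persisting regardless of the YARN container and queue sizes.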





