spark-user mailing list archives

From Tang Jinxin <xiaoxingst...@gmail.com>
Subject Re: [Spark SQL] [Beginner] Dataset[Row] collect to driver throw java.io.EOFException: Premature EOF: no length prefix available
Date Wed, 22 Apr 2020 15:16:35 GMT
Maybe the datanode stopped the data transfer due to a timeout. Could you please provide the exception stack?
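If that guess is on the right track, one option to try (a sketch under that assumption, not a confirmed fix) is raising the HDFS client socket timeouts. Spark forwards any spark.hadoop.* setting to the underlying Hadoop Configuration, so they can be set when building the session; the values below are only illustrative:

    // Illustrative sketch: raise HDFS client-side socket timeouts for this Spark session.
    // The datanode-side limits live in hdfs-site.xml on the cluster and may also need raising.
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("hdfs-timeout-sketch")
      .config("spark.hadoop.dfs.client.socket-timeout", "300000")          // read timeout, in ms
      .config("spark.hadoop.dfs.datanode.socket.write.timeout", "600000")  // write timeout, in ms
      .getOrCreate()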
xiaoxingstack
Email: xiaoxingstack@gmail.com

On 2020-04-22 19:53, maqy wrote:

Today I met the same problem using rdd.collect(); the format of the rdd is Tuple2[Int, Int], and the problem appears when the amount of data reaches about 100 GB. I guess there may be something wrong with deserialization. Has anyone else encountered this problem?

Best regards,
maqy

From: maqy1995@outlook.com
Sent: 2020-04-20 10:33
To: user@spark.apache.org
Subject: [Spark SQL] [Beginner] Dataset[Row] collect to driver throw java.io.EOFException: Premature EOF: no length prefix available
Hi all,

I get a Dataset[Row] through the following code:

    val df: Dataset[Row] = spark.read.format("csv").schema(schema).load("hdfs://master:9000/mydata")

After that I want to collect it to the driver:

    val df_rows: Array[Row] = df.collect()

The Spark web UI shows that all tasks ran successfully, but the application does not stop. After more than ten minutes, an error appears in the shell:

    java.io.EOFException: Premature EOF: no length prefix available

Environment:
    Spark 2.4.3
    Hadoop 2.7.7
    Total rows of data: about 800,000,000 (12 GB)

More detailed information can be seen here:
https://stackoverflow.com/questions/61202566/spark-sql-datasetrow-collect-to-driver-throw-java-io-eofexception-premature-e

Does anyone know the reason?

Best regards,
maqy
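As a reference for the question above, here is a minimal sketch of the same read with the quoting fixed, plus one way to avoid materializing roughly 800,000,000 rows (about 12 GB) on the driver at once: Dataset.toLocalIterator() streams the result back one partition at a time instead of building a single Array[Row]. The schema below is a placeholder (the real one is not shown in the thread), and treating this as a workaround is an assumption, not a confirmed fix for the EOFException:

    import scala.collection.JavaConverters._
    import org.apache.spark.sql.{Row, SparkSession}
    import org.apache.spark.sql.types.{IntegerType, StructType}

    val spark = SparkSession.builder().appName("collect-sketch").getOrCreate()

    // Placeholder schema; the real schema is not shown in the thread.
    val schema = new StructType().add("a", IntegerType).add("b", IntegerType)

    val df = spark.read.format("csv").schema(schema).load("hdfs://master:9000/mydata")

    // df.collect() builds an Array[Row] for the whole dataset in driver memory.
    // toLocalIterator() instead pulls rows back partition by partition, so only one
    // partition's worth of rows is held on the driver at any time.
    val rows: Iterator[Row] = df.toLocalIterator().asScala
    rows.take(10).foreach(println)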