spark-user mailing list archives

From ARAVIND SETHURATHNAM <asethurath...@homeaway.com.INVALID>
Subject Re: Spark batch job: failed to compile: java.lang.NullPointerException
Date Mon, 18 Jun 2018 22:56:18 GMT
The Spark version is 2.2, and I think I am running into this issue, https://issues.apache.org/jira/browse/SPARK-18016,
since the dataset schema is very large and deeply nested.
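
If it is SPARK-18016 (generated code for wide or deeply nested schemas blowing past JVM class-file limits), one mitigation I have seen suggested is turning off whole-stage code generation for the job. A minimal sketch of what I mean, in Scala; I am not certain it covers the UnsafeProjection path that is failing here, so this is just something to try:

    import org.apache.spark.sql.SparkSession

    // Build the session with whole-stage codegen disabled so Catalyst falls back
    // to the non-fused execution path for very wide/nested schemas.
    val spark = SparkSession.builder()
      .appName("omnitask-spark-compaction")              // illustrative name
      .config("spark.sql.codegen.wholeStage", "false")
      .getOrCreate()

    // The same setting can also be flipped at runtime on an existing session:
    spark.conf.set("spark.sql.codegen.wholeStage", "false")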

From: ARAVIND SETHURATHNAM <asethurathnam@homeaway.com.INVALID>
Date: Monday, June 18, 2018 at 4:00 PM
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: Spark batch job: failed to compile: java.lang.NullPointerException


Hi,
We have a Spark job that reads Avro data from an S3 location, does some processing, and writes
it back to S3. Of late it has been failing with the exception below.
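
For context, the job is roughly of this shape (a minimal sketch; I am assuming the com.databricks spark-avro connector here, and the paths and column names are illustrative, not the real ones):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("avro-compaction").getOrCreate()

    // Read the hourly Avro partitions from S3 (illustrative location).
    val hourly = spark.read
      .format("com.databricks.spark.avro")
      .load("s3://some-bucket/avro-hourly/some-entity/")

    // Some processing, e.g. compacting the hourly files into daily partitions.
    val daily = hourly.repartition(hourly("dateid"))

    // Write the compacted output back to S3, partitioned by date.
    daily.write
      .format("com.databricks.spark.avro")
      .partitionBy("dateid")
      .mode("overwrite")
      .save("s3://some-bucket/avro-daily/some-entity/")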


Application application_1529346471665_0020 failed 1 times due to AM Container for appattempt_1529346471665_0020_000001
exited with exitCode: -104
For more detailed output, check application tracking page:http://10.122.49.134:8088/proxy/application_1529346471665_0020/Then,
click on links to logs of each attempt.
Diagnostics: Container [pid=14249,containerID=container_1529346471665_0020_01_000001] is running
beyond physical memory limits. Current usage: 23.4 GB of 22 GB physical memory used; 28.7
GB of 46.2 GB virtual memory used. Killing container.
Dump of the process-tree for container_1529346471665_0020_01_000001 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES)
RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 14255 14249 14249 14249 (java) 23834 8203 30684336128 6142485 /usr/java/default/bin/java
-server -Xmx20480m -Djava.io.tmpdir=/media/ephemeral0/yarn/local/usercache/asethurathnam/appcache/application_1529346471665_0020/container_1529346471665_0020_01_000001/tmp
-Dspring.profiles.active=stage -Dspark.yarn.app.container.log.dir=/media/ephemeral1/logs/yarn/application_1529346471665_0020/container_1529346471665_0020_01_000001
-XX:MaxPermSize=512m org.apache.spark.deploy.yarn.ApplicationMaster --class com.homeaway.omnihub.OmnitaskApp
--jar /tmp/spark-9f42e005-e1b4-47c2-a6e8-ac0bc9fa595b/omnitask-spark-compaction-0.0.1.jar
--arg --PATH=s3://ha-stage-datalake-landing-zone-us-east-1/avro-hourly/entityEventLodgingRate-2/
--arg --OUTPUT_PATH=s3://ha-stage-datalake-landing-zone-us-east-1/avro-daily/entityEventLodgingRate-2/
--arg --DB_NAME=tier1_landingzone --arg --TABLE_NAME=entityeventlodgingrate_2_daily --arg
--TABLE_DESCRIPTION=data in: 's3://ha-stage-datalake-landing-zone-us-east-1/avro-daily/entityEventLodgingRate-2'
--arg --FORMAT=AVRO --arg --PARTITION_COLUMNS=dateid --arg --HOURLY=false --arg --START_DATE=20180616
--arg --END_DATE=20180616 --properties-file /media/ephemeral0/yarn/local/usercache/asethurathnam/appcache/application_1529346471665_0020/container_1529346471665_0020_01_000001/__spark_conf__/__spark_conf__.properties
|- 14249 14247 14249 14249 (bash) 0 1 115826688 704 /bin/bash -c LD_LIBRARY_PATH=/usr/lib/hadoop2/lib/native::/usr/lib/qubole/packages/hadoop2-2.6.0/hadoop2/lib/native:/usr/lib/qubole/packages/hadoop2-2.6.0/hadoop2/lib/native
/usr/java/default/bin/java -server -Xmx20480m -Djava.io.tmpdir=/media/ephemeral0/yarn/local/usercache/asethurathnam/appcache/application_1529346471665_0020/container_1529346471665_0020_01_000001/tmp
'-Dspring.profiles.active=stage' -Dspark.yarn.app.container.log.dir=/media/ephemeral1/logs/yarn/application_1529346471665_0020/container_1529346471665_0020_01_000001
-XX:MaxPermSize=512m org.apache.spark.deploy.yarn.ApplicationMaster --class 'com.homeaway.omnihub.OmnitaskApp'
--jar /tmp/spark-9f42e005-e1b4-47c2-a6e8-ac0bc9fa595b/omnitask-spark-compaction-0.0.1.jar
--arg '--PATH=s3://ha-stage-datalake-landing-zone-us-east-1/avro-hourly/entityEventLodgingRate-2/'
--arg '--OUTPUT_PATH=s3://ha-stage-datalake-landing-zone-us-east-1/avro-daily/entityEventLodgingRate-2/'
--arg '--DB_NAME=tier1_landingzone' --arg '--TABLE_NAME=entityeventlodgingrate_2_daily' --arg
'--TABLE_DESCRIPTION=data in: '\''s3://ha-stage-datalake-landing-zone-us-east-1/avro-daily/entityEventLodgingRate-2'\'''
--arg '--FORMAT=AVRO' --arg '--PARTITION_COLUMNS=dateid' --arg '--HOURLY=false' --arg '--START_DATE=20180616'
--arg '--END_DATE=20180616' --properties-file /media/ephemeral0/yarn/local/usercache/asethurathnam/appcache/application_1529346471665_0020/container_1529346471665_0020_01_000001/__spark_conf__/__spark_conf__.properties
1> /media/ephemeral1/logs/yarn/application_1529346471665_0020/container_1529346471665_0020_01_000001/stdout
2> /media/ephemeral1/logs/yarn/application_1529346471665_0020/container_1529346471665_0020_01_000001/stderr
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Failing this attempt. Failing the application.





In one of the executor logs that has a failed task I see the stack trace below. Can someone please
let me know what is causing the exception and the task failures, and what the generated class shown below is?


18/06/18 20:34:05 dispatcher-event-loop-6 INFO BlockManagerInfo: Added broadcast_0_piece0
in memory on 10.122.51.238:42797 (size: 30.7 KB, free: 10.5 GB)
18/06/18 20:34:06 dispatcher-event-loop-0 INFO BlockManagerInfo: Added broadcast_0_piece0
in memory on 10.122.51.238:43173 (size: 30.7 KB, free: 10.5 GB)
18/06/18 20:34:41 task-result-getter-0 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID
1, 10.122.48.122, executor 2): org.apache.spark.SparkException: Task failed while writing
rows
        at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:204)
        at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:129)
        at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:128)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:99)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Error while encoding: java.util.concurrent.ExecutionException:
java.lang.Exception: failed to compile: java.lang.NullPointerException
/* 001 */ public java.lang.Object generate(Object[] references) {
/* 002 */   return new SpecificUnsafeProjection(references);
/* 003 */ }
/* 004 */
/* 005 */ class SpecificUnsafeProjection extends org.apache.spark.sql.catalyst.expressions.UnsafeProjection
{
/* 006 */
/* 007 */   private Object[] references;
/* 008 */   private int argValue;
/* 009 */   private Object[] values;
/* 010 */   private int argValue1;
/* 011 */   private boolean isNull21;
/* 012 */   private boolean value21;
/* 013 */   private boolean isNull22;
/* 014 */   private long value22;
/* 015 */   private boolean isNull23;
/* 016 */   private long value23;
/* 017 */   private int argValue2;
/* 018 */   private java.lang.String argValue3;
/* 019 */   private boolean isNull39;
/* 020 */   private boolean value39;
/* 021 */   private boolean isNull40;
/* 022 */   private UTF8String value40;
/* 023 */   private boolean isNull41;
/* 024 */   private UTF8String value41;
/* 025 */   private int argValue4;
/* 026 */   private java.lang.String argValue5;
/* 027 */   private boolean isNull57;
/* 028 */   private boolean value57;
/* 029 */   private boolean isNull58;


Regards
aravind
