spark-user mailing list archives

From swetha kasireddy <swethakasire...@gmail.com>
Subject Re: Memory issues when trying to insert data in the form of ORC using Spark SQL
Date Fri, 20 May 2016 23:42:40 GMT
Also, the Spark SQL insert seems to use only two tasks per stage. That
might be why the tasks do not have sufficient memory. Is there a way
to increase the number of tasks when doing the SQL insert?

Stage Id: 12
Description: save at SaveUsersToHdfs.scala:255
Submitted: 2016/05/20 16:32:47
Duration: 5.0 min
Tasks (Succeeded/Total): 0/2
Input / Output / Shuffle Read / Shuffle Write: 21.4 MB (only one of these columns had a value)
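
One possible way to get more tasks, sketched here under assumptions rather than as a confirmed fix: the number of write tasks generally follows the number of partitions of the DataFrame feeding the insert, so repartitioning recordsTemp before the insert should raise the task count. Repartitioning on the two partition columns may also reduce how many files each task has to keep open at once; the quoted trace goes through parquet.hadoop.MemoryManager, which suggests many writers were sharing one task's memory. The object name, the method name, the temp table name recordsTempRepartitioned, and the numTasks parameter are hypothetical, and the column-based repartition call assumes Spark 1.6 or later.

```scala
import org.apache.spark.sql.SQLContext

object RepartitionBeforeInsert {
  // Hypothetical helper: numTasks is an assumed tuning value, not taken from the thread.
  def insertWithMoreTasks(sqlContext: SQLContext, numTasks: Int): Unit = {
    val source = sqlContext.table("recordsTemp")

    // Hash-partition by the Hive partition columns so rows for the same
    // (datePartition, idPartition) pair land in the same task.
    val repartitioned =
      source.repartition(numTasks, source("datePartition"), source("idPartition"))

    // Register under a new (hypothetical) name so the insert reads the repartitioned data.
    repartitioned.registerTempTable("recordsTempRepartitioned")

    // Same insert as in the quoted message, reading from the repartitioned table.
    sqlContext.sql(
      """FROM recordsTempRepartitioned ps
        |INSERT OVERWRITE TABLE users PARTITION (datePartition, idPartition)
        |SELECT ps.id, ps.record, ps.datePartition, ps.idPartition""".stripMargin)
  }
}
```

Calling RepartitionBeforeInsert.insertWithMoreTasks(sqlContext, 200) would then be expected to show roughly 200 tasks for the save stage instead of 2; the value 200 is only an illustration and would need tuning for the data volume and executor memory.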

On Fri, May 20, 2016 at 3:43 PM, SRK <swethakasireddy@gmail.com> wrote:

>
> Hi,
>
> I see some memory issues when trying to insert the data in the form of ORC
> using Spark SQL. Please find the query and exception below. Any idea as to
> why this is happening?
>
> sqlContext.sql("  CREATE EXTERNAL TABLE IF NOT EXISTS records (id STRING, record STRING) PARTITIONED BY (datePartition STRING, idPartition STRING) stored as ORC LOCATION '/user/users' ")
> sqlContext.sql("  orc.compress= SNAPPY")
> sqlContext.sql(
>   """ from recordsTemp ps   insert overwrite table users partition(datePartition , idPartition )  select ps.id, ps.record , ps.datePartition, ps.idPartition  """.stripMargin)
>
>
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 13.0 failed 4 times, most recent failure: Lost task 0.3 in stage 13.0org.apache.hadoop.hive.ql.metadata.HiveException: parquet.hadoop.MemoryManager$1: New Memory allocation 1048575 bytes is smaller than the minimum allocation size of 1048576 bytes.
>         at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:249)
>         at org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer.org$apache$spark$sql$hive$SparkHiveDynamicPartitionWriterContainer$$newWriter$1(hiveWriterContainers.scala:240)
>         at org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer$$anonfun$getLocalFileWriter$1.apply(hiveWriterContainers.scala:249)
>         at org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer$$anonfun$getLocalFileWriter$1.apply(hiveWriterContainers.scala:249)
>         at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:189)
>         at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:91)
>         at org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer.getLocalFileWriter(hiveWriterContainers.scala:249)
>         at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1$1.apply(InsertIntoHiveTable.scala:112)
>         at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1$1.apply(InsertIntoHiveTable.scala:104)
>         at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>         at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>         at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1(InsertIntoHiveTable.scala:104)
>         at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:84)
>         at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:84)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>         at org.apache.spark.scheduler.Task.run(Task.scala:89)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: parquet.hadoop.MemoryManager$1: New Memory allocation 1048575 bytes is smaller than the minimum allocation size of 1048576 bytes.
>         at parquet.hadoop.MemoryManager.updateAllocation(MemoryManager.java:125)
>         at parquet.hadoop.MemoryManager.addWriter(MemoryManager.java:82)
>         at parquet.hadoop.ParquetRecordWriter.<init>(ParquetRecordWriter.java:104)
>         at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:303)
>         at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:267)
>         at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.<init>(ParquetRecordWriterWrapper.java:65)
>         at org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat.getParquerRecordWriterWrapper(MapredParquetOutputFormat.java:125)
>         at org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat.getHiveRecordWriter(MapredParquetOutputFormat.java:114)
>         at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordWriter(HiveFileFormatUtils.java:261)
>         at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:246)
>
