parquet-dev mailing list archives

From "Qinghui Xu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PARQUET-282) OutOfMemoryError in job commit / ParquetMetadataConverter
Date Fri, 08 Mar 2019 21:03:00 GMT

    [ https://issues.apache.org/jira/browse/PARQUET-282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16788297#comment-16788297 ]

Qinghui Xu commented on PARQUET-282:
------------------------------------

This does not look like a problem in parquet-mr itself; shall we close it?

> OutOfMemoryError in job commit / ParquetMetadataConverter
> ---------------------------------------------------------
>
>                 Key: PARQUET-282
>                 URL: https://issues.apache.org/jira/browse/PARQUET-282
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>    Affects Versions: 1.6.0
>         Environment: CentOS, MapR, Scalding
>            Reporter: hy5446
>            Priority: Critical
>
> We're trying to write some 14B rows (about 3.6 TB of Parquet data) to Parquet files.
> When our ETL job finishes, it throws this exception, and the job status is "died in job commit".
> 2015-05-14 09:24:28,158 FATAL [CommitterEvent Processor #4] org.apache.hadoop.yarn.YarnUncaughtExceptionHandler:
> Thread Thread[CommitterEvent Processor #4,5,main] threw an Error.  Shutting down now...
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> 	at java.nio.ByteBuffer.wrap(ByteBuffer.java:373)
> 	at java.nio.ByteBuffer.wrap(ByteBuffer.java:396)
> 	at parquet.format.Statistics.setMin(Statistics.java:237)
> 	at parquet.format.converter.ParquetMetadataConverter.toParquetStatistics(ParquetMetadataConverter.java:243)
> 	at parquet.format.converter.ParquetMetadataConverter.addRowGroup(ParquetMetadataConverter.java:167)
> 	at parquet.format.converter.ParquetMetadataConverter.toParquetMetadata(ParquetMetadataConverter.java:79)
> 	at parquet.hadoop.ParquetFileWriter.serializeFooter(ParquetFileWriter.java:405)
> 	at parquet.hadoop.ParquetFileWriter.writeMetadataFile(ParquetFileWriter.java:433)
> 	at parquet.hadoop.ParquetFileWriter.writeMetadataFile(ParquetFileWriter.java:423)
> 	at parquet.hadoop.ParquetOutputCommitter.writeMetaDataFile(ParquetOutputCommitter.java:58)
> 	at parquet.hadoop.mapred.MapredParquetOutputCommitter.commitJob(MapredParquetOutputCommitter.java:43)
> 	at org.apache.hadoop.mapred.OutputCommitter.commitJob(OutputCommitter.java:259)
> 	at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:253)
> 	at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:216)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> This seems to be related to the creation of the _metadata summary file, as the Parquet data
> files themselves are perfectly fine and usable. I'm also not sure how to alleviate this
> (e.g. by adding more heap space), since the crash happens outside the Map/Reduce tasks,
> in the job/application controller itself.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
