drill-user mailing list archives

From Reid Thompson <Reid.Thomp...@omnicell.com>
Subject Requesting guidance. Having trouble generating parquet files from jdbc connection to PostgreSQL. "java.lang.OutOfMemoryError: GC overhead limit exceeded"
Date Mon, 13 Aug 2018 12:03:20 GMT
My standalone host is configured with 16 GB of RAM and 8 CPUs. Using
drill-embedded (single-host standalone), I am attempting to pull data
from PostgreSQL tables into Parquet files via CTAS. Smaller datasets
work fine, but larger data sets (for example, ~11 GB) fail with
"java.lang.OutOfMemoryError: GC overhead limit exceeded". Can someone
advise on how to get past this?
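For reference, a minimal sketch of the kind of statement I am running; the plugin name (`pg`), schema, and table names below are placeholders, not my actual configuration:

```sql
-- Write to the default writable workspace; Parquet is Drill's default CTAS format
CREATE TABLE dfs.tmp.`events_parquet` AS
SELECT * FROM pg.public.`events`;
```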

Is there a way to have Drill stream this data from PostgreSQL to Parquet
files on disk, or does the data set have to be loaded completely into
memory before it can be written out?  The documentation indicates that
Drill will spill to disk to avoid memory issues, so I had hoped it would
be straightforward to extract from the DB to disk.
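In case it is relevant: my understanding from the docs is that the embedded Drillbit's memory is controlled in conf/drill-env.sh. The sizes below are only what I would guess at for a 16 GB host, not settings I know to be correct:

```shell
# conf/drill-env.sh -- guessed sizes for a 16 GB standalone host
export DRILL_HEAP="8G"               # JVM heap for the Drillbit
export DRILL_MAX_DIRECT_MEMORY="6G"  # direct (off-heap) memory for query execution
```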

Should I not be attempting this via CTAS?  What are the other options?
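For completeness, my JDBC storage plugin config follows the shape documented for Drill's jdbc plugin type; the host, database, and credentials here are placeholders:

```json
{
  "type": "jdbc",
  "driver": "org.postgresql.Driver",
  "url": "jdbc:postgresql://dbhost:5432/mydb",
  "username": "drill_user",
  "password": "********",
  "enabled": true
}
```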

