drill-user mailing list archives

From Reid Thompson <Reid.Thomp...@omnicell.com>
Subject Re: Requesting guidance. Having trouble generating parquet files from jdbc connection to PostgreSQL. "java.lang.OutOfMemoryError: GC overhead limit exceeded"
Date Mon, 13 Aug 2018 18:18:41 GMT
Vitalii,

OK, thanks. I had found that report, but didn't note the option related
to defaultAutoCommit.
> [1] https://issues.apache.org/jira/browse/DRILL-4177


So, something along the lines of:

..snip..
  "url": "jdbc:postgresql://myhost.mydomain.com/ateb?useCursorFetch=true&defaultFetchSize=10000&defaultAutoCommit=false",
..snip..
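
For reference, here is a rough sketch (plain JDBC, outside of Drill) of the cursor-based
fetching that [2] below describes; the connection credentials, table name, and row handling
are just placeholders, not our actual setup:

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.ResultSet;
  import java.sql.Statement;

  public class CursorFetchSketch {
      public static void main(String[] args) throws Exception {
          Connection conn = DriverManager.getConnection(
                  "jdbc:postgresql://myhost.mydomain.com/ateb", "user", "pass");
          // Cursor-based fetching in the PostgreSQL JDBC driver only kicks in
          // when autocommit is off and a fetch size is set.
          conn.setAutoCommit(false);
          try (Statement st = conn.createStatement()) {
              st.setFetchSize(10000); // pull 10k rows per round trip instead of the whole result
              try (ResultSet rs = st.executeQuery("SELECT * FROM some_large_table")) {
                  while (rs.next()) {
                      // stream each row out (e.g. to a parquet writer) without
                      // holding the full result set in memory
                  }
              }
          }
          conn.close();
      }
  }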


thanks,
reid

On Mon, 2018-08-13 at 20:33 +0300, Vitalii Diravka wrote:
> Hi Reid,
> 
> Looks like your issue is similar to DRILL-4177 [1].
> That one was related to a MySQL connection; it looks like the same issue occurs with PostgreSQL.
> Looking at the Postgres documentation, the code needs to explicitly set the connection
> autocommit mode to false, e.g. conn.setAutoCommit(false) [2]. For data sets of 10 million
> plus rows, this is a must.
> 
> You could disable the "Auto Commit" option as a session option [3],
> or do it within the plugin config URL with the following property: defaultAutoCommit=false [4]
> 
> [1] https://issues.apache.org/jira/browse/DRILL-4177
> [2] https://jdbc.postgresql.org/documentation/93/query.html#fetchsize-example
> [3] https://www.postgresql.org/docs/9.3/static/ecpg-sql-set-autocommit.html
> [4] https://jdbc.postgresql.org/documentation/head/ds-cpds.html
> 
> Kind regards
> Vitalii
> 
> 
> On Mon, Aug 13, 2018 at 3:03 PM Reid Thompson <Reid.Thompson@omnicell.com> wrote:
> > My standalone host is configured with 16 GB RAM and 8 CPUs.  Using
> > drill-embedded (single host standalone), I am attempting to pull data
> > from PostgreSQL tables to parquet files via CTAS. Smaller datasets work
> > fine, but larger data sets fail (for example ~11 GB) with
> > "java.lang.OutOfMemoryError: GC overhead limit exceeded".  Can someone
> > advise on how to get past this?
> > 
> > Is there a way to have Drill stream this data from PostgreSQL to parquet
> > files on disk, or does the data set have to be completely loaded into
> > memory before it can be written to disk?  The documentation indicates
> > that Drill will spill to disk to avoid memory issues, so I had hoped
> > that it would be straightforward to extract from the DB to disk.
> > 
> > Should I not be attempting this via CTAS?  What are the other options?
> > 
> > 
> > thanks,
> > reid
