drill-user mailing list archives

From Rahul Raj <rahul....@option3consulting.com>
Subject Re: Drill session and jdbc connections
Date Thu, 14 Dec 2017 08:23:43 GMT
Sorry, I was confused. I am using a pool of connections; here is the code snippet:

try (Connection conn = pool.getConnection();
     Statement st = conn.createStatement()) {
    try {
        // Session options are bound to this connection (its Drill session)
        st.execute("ALTER SESSION SET `store.format`='csv'");
        try (ResultSet rs = st.executeQuery("CTAS HERE ...")) {
            // One summary row per writer fragment: fragment id, records written
            while (rs.next()) {
                logger.debug("Created file {} with {} records",
                        rs.getString(1), rs.getLong(2));
            }
        }
    } finally {
        // Restore the format before the connection goes back to the pool
        st.execute("ALTER SESSION SET `store.format`='parquet'");
    }
} catch (SQLException e) {
    // throw new ...
}
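The snippet restores `parquet` by hand. Kunal's advice quoted below (each thread works off its own connection, and RESET is run before the connection goes back to the pool) could instead be centralized in the pool itself. This is a minimal sketch only: `SessionScopedPool`, `ConnectionWork` and `withConnection` are hypothetical names, the "pool" is assumed to be nothing more than a `BlockingQueue` of already-open Drill connections, and it relies on Drill's `ALTER SESSION RESET ALL` command from the RESET docs linked below.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.concurrent.BlockingQueue;

// Hypothetical minimal pool: a BlockingQueue holding open Drill JDBC connections.
public final class SessionScopedPool {

    // Hypothetical callback for the work a thread performs on its borrowed connection.
    @FunctionalInterface
    public interface ConnectionWork<T> {
        T apply(Connection conn) throws SQLException;
    }

    private final BlockingQueue<Connection> idle;

    public SessionScopedPool(BlockingQueue<Connection> idle) {
        this.idle = idle;
    }

    // Gives the calling thread exclusive use of one connection (and hence one
    // Drill session), then resets every session option before it is reused.
    public <T> T withConnection(ConnectionWork<T> work)
            throws SQLException, InterruptedException {
        Connection conn = idle.take();
        try {
            return work.apply(conn);
        } finally {
            boolean reset = false;
            try (Statement st = conn.createStatement()) {
                // Drill's RESET command restores session options to their defaults.
                st.execute("ALTER SESSION RESET ALL");
                reset = true;
            } finally {
                if (reset) {
                    idle.put(conn);   // clean session: safe to hand to another thread
                } else {
                    conn.close();     // unknown session state: drop it instead
                }
            }
        }
    }
}
```

With this shape a thread only ever sees a session it set up itself, so a CSV job on one connection cannot change `store.format` for a Parquet job running on another. Closing a connection whose RESET failed shrinks the pool, which is a deliberate trade-off here: a smaller pool is preferable to handing out a session in an unknown state.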

One thing I observed: you must read the CTAS result rows back (the record
counts, as the logging loop does); otherwise, when the data volume is large,
the Parquet files are written only partially (you will find Parquet files of
0 bytes).
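The draining step can be isolated in a small helper. This is a sketch assuming the CTAS summary layout the snippet above already reads (column 1 the fragment, column 2 the number of records written); `CtasResults.drain` is a hypothetical name, not part of Drill's JDBC driver. It simply calls `next()` until it returns `false`, so every writer fragment's summary row is consumed before the result set is closed.

```java
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper: consumes a Drill CTAS summary result set completely.
public final class CtasResults {

    private CtasResults() {}

    // Reads every summary row (column 1: fragment, column 2: records written)
    // and returns the total number of records written across all fragments.
    public static long drain(ResultSet rs) throws SQLException {
        long total = 0;
        while (rs.next()) {
            total += rs.getLong(2);
        }
        return total;
    }
}
```

In the snippet above, the `while` loop could be replaced by `long written = CtasResults.drain(rs);` when per-fragment logging is not needed.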


On Thu, Dec 14, 2017 at 12:39 PM, Kunal Khatua <kkhatua@mapr.com> wrote:

> That will (IMO) not solve the problem, since different threads will be
> setting and resetting the store format. My suggestion would be to use a
> pool of connections, have each thread work off its own connection, and
> return the connection to the pool once it has reset the session.
>
> -----Original Message-----
> From: Rahul Raj [mailto:rahul.raj@option3consulting.com]
> Sent: Wednesday, December 13, 2017 10:23 PM
> To: user@drill.apache.org
> Subject: Re: Drill session and jdbc connections
>
> We are using one connection and multiple statements to create the CSV
> files. I will wrap the calls in a finally block to reset the store format.
>
> Thanks for your inputs,
>
> Regards,
> Rahul
>
> On Wed, Dec 13, 2017 at 10:59 PM, Kunal Khatua <kkhatua@mapr.com> wrote:
>
> > A Drill session is isolated and bound to a connection. Your
> > 'getConnection()' method might be fetching connections from a pool,
> > where the settings haven't been reset. If the connections are shared,
> > you will continue to have this problem.
> >
> > If you are returning a connection to the pool, run the RESET
> > command first so that the session is back in its default state.
> >
> > https://drill.apache.org/docs/reset/
> >
> >
> >
> > -----Original Message-----
> > From: Rahul Raj [mailto:rahul.raj@option3consulting.com]
> > Sent: Wednesday, December 13, 2017 2:17 AM
> > To: user@drill.apache.org
> > Subject: Drill session and jdbc connections
> >
> > Hi,
> >
> > How is a Drill session related to a Drill JDBC connection instance?
> > What happens in a pool of connections when one connection changes
> > `store.format`? I am seeing mix-ups where a Parquet row is written
> > as an array of multiple records (rather than as multiple columns) while
> > another thread is creating a CSV file. This happens only when the CSV
> > and Parquet writes race each other.
> >
> > Scenario:
> >
> > Thread 1 for CSV creation:
> >
> > Connection conn = getConnection();
> > Statement st = conn.createStatement();
> > st.execute("ALTER SESSION SET `store.format`='csv'");
> > st.execute("CREATE TABLE someparquet AS ...");
> > st.execute("ALTER SESSION SET `store.format`='parquet'");
> >
> > Thread 2 for parquet creation:
> >
> > Connection conn = getConnection();
> > conn.createStatement().execute("CREATE TABLE somecsv AS ...");
> >
> > In Thread 2, the Parquet output is written as an ARRAY containing all
> > the fields, a side effect of Thread 1 setting the format to CSV while
> > the two threads execute in parallel.
> >
> > Is it possible to have session isolation in this situation?
> >
> > Regards,
> > Rahul
> >
> > --
> > **** This email and any files transmitted with it are confidential and
> > intended solely for the use of the individual or entity to whom it is
> > addressed. If you are not the named addressee then you should not
> > disseminate, distribute or copy this e-mail. Please notify the sender
> > immediately and delete this e-mail from your system.****
> >
>
>

