drill-user mailing list archives

From Prasad Nagaraj Subramanya <prasadn...@gmail.com>
Subject Re: Drill session and jdbc connections
Date Thu, 14 Dec 2017 10:13:07 GMT
The code looks good. To keep things simple, you can call *RESET
`option_name`* for every option you change during the session (that
way you need not be concerned with the default value).

So your finally block would look like this:
    finally{
        st.executeQuery("RESET `store.format`");
    }
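Extending the same idea, here is a minimal sketch for the case where you change more than one option per unit of work. The `SessionOptionScope` class below is hypothetical and is not part of Drill or JDBC. It runs `ALTER SESSION SET` through a plain JDBC `Statement`, remembers each option it changed, and issues `RESET` for each one when it is closed.

```java
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical helper (not a Drill or JDBC API): sets Drill session options
 * and RESETs every option it changed when closed.
 * Option names and values are concatenated unescaped: trusted constants only.
 */
final class SessionOptionScope implements AutoCloseable {
    private final Statement st;
    private final Deque<String> changed = new ArrayDeque<>();

    SessionOptionScope(Statement st) {
        this.st = st;
    }

    /** Sets a string-valued session option, e.g. set("store.format", "csv"). */
    void set(String option, String value) throws SQLException {
        st.execute("ALTER SESSION SET `" + option + "` = '" + value + "'");
        changed.push(option);
    }

    /** RESETs options in reverse order; RESET restores the option's default. */
    @Override
    public void close() throws SQLException {
        SQLException first = null;
        while (!changed.isEmpty()) {
            String option = changed.pop();
            try {
                st.execute("RESET `" + option + "`");
            } catch (SQLException e) {
                // Keep resetting the remaining options; report the first failure.
                if (first == null) first = e; else first.addSuppressed(e);
            }
        }
        if (first != null) throw first;
    }
}
```

Declare it after the `Statement` in the same try-with-resources header, for example `try (Statement st = conn.createStatement(); SessionOptionScope opts = new SessionOptionScope(st))`. Resources close in reverse order, so the RESETs run while the statement is still open.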

On Thu, Dec 14, 2017 at 12:25 AM, Rahul Raj <rahul.raj@option3consulting.com> wrote:

> Let me know if the code is fine.
>
> Regards,
> Rahul
>
> On Thu, Dec 14, 2017 at 1:53 PM, Rahul Raj <rahul.raj@option3consulting.com> wrote:
>
> > Sorry, I was confused. I am using a pool of connections; code snippet below:
> >
> > try (Connection conn = pool.getConnection()) {
> >     try (Statement st = conn.createStatement()) {
> >         try {
> >             st.executeQuery("ALTER SESSION SET `store.format`='csv'");
> >             ResultSet rs = st.executeQuery("CTAS HERE ...");
> >             // Consume every CTAS result row before moving on
> >             while (rs.next()) {
> >                 logger.debug("Created file {} with {} records",
> >                         rs.getString(1), rs.getLong(2));
> >             }
> >         } finally {
> >             // Restore the format before the connection goes back to the pool
> >             st.executeQuery("ALTER SESSION SET `store.format`='parquet'");
> >         }
> >     }
> > } catch (SQLException e) {
> >     // throw new ...
> > }
> >
> > One thing I observed: you have to read back the CTAS results (the record
> > counts, as in the logging statement above), or else the Parquet files are
> > written only partially (you will find Parquet files of 0 bytes) when the
> > data volume is huge.
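Following the observation above, here is a minimal sketch that makes the "read everything back" step explicit. It assumes, as in the logging statement, that each CTAS result row carries a record count in column 2. The `CtasResults.drainRecordCount` name is hypothetical.

```java
import java.sql.ResultSet;
import java.sql.SQLException;

final class CtasResults {
    private CtasResults() {}

    /**
     * Hypothetical helper: reads every row of a CTAS result set and returns
     * the total number of records written. Assumes column 2 holds the
     * per-row record count (as in the logging snippet above).
     */
    static long drainRecordCount(ResultSet rs) throws SQLException {
        long total = 0;
        try (rs) { // Java 9+: close the result set only after it is fully read
            while (rs.next()) {
                total += rs.getLong(2);
            }
        }
        return total;
    }
}
```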
> >
> >
> > On Thu, Dec 14, 2017 at 12:39 PM, Kunal Khatua <kkhatua@mapr.com> wrote:
> >
> >> That will not (IMO) solve the problem, since different threads will be
> >> setting and resetting the store format. My suggestion would be to use a
> >> pool of connections, have each thread work off its own connection, and
> >> return it to the pool once its session has been reset.
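The suggestion above (one connection per thread, reset before it goes back) can be sketched as a tiny generic pool. This is a hypothetical illustration, not a Drill or JDBC API. `ResettingPool`, `Work`, and `Reset` are made-up names, and the connection type is left generic so the borrow/reset/return discipline is visible on its own.

```java
import java.util.Collection;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/**
 * Hypothetical minimal pool: a caller holds a connection exclusively while
 * working, and its session is reset before any other caller can borrow it.
 */
final class ResettingPool<C> {
    /** Work done while holding the connection. */
    interface Work<T, R> {
        R apply(T conn) throws Exception;
    }

    /** Restores default session state, e.g. by running RESET statements. */
    interface Reset<T> {
        void reset(T conn) throws Exception;
    }

    private final BlockingQueue<C> idle;
    private final Reset<C> reset;

    ResettingPool(Collection<C> conns, Reset<C> reset) {
        this.idle = new LinkedBlockingQueue<>(conns);
        this.reset = reset;
    }

    <R> R withConnection(Work<C, R> work) throws Exception {
        C conn = idle.take();        // blocks until a connection is free
        try {
            return work.apply(conn); // no other thread can touch this session
        } finally {
            // If reset() throws, the connection is deliberately not put back,
            // because its session may still carry changed options.
            reset.reset(conn);
            idle.put(conn);
        }
    }
}
```

With JDBC connections, the `Reset` callback would run `RESET` for the options the application changes.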
> >>
> >> -----Original Message-----
> >> From: Rahul Raj [mailto:rahul.raj@option3consulting.com]
> >> Sent: Wednesday, December 13, 2017 10:23 PM
> >> To: user@drill.apache.org
> >> Subject: Re: Drill session and jdbc connections
> >>
> >> We are using one connection and multiple statements to create the CSV
> >> files. I will surround the calls with a finally block to reset the store
> >> format.
> >>
> >> Thanks for your input,
> >>
> >> Regards,
> >> Rahul
> >>
> >> On Wed, Dec 13, 2017 at 10:59 PM, Kunal Khatua <kkhatua@mapr.com> wrote:
> >>
> >> > A Drill session is isolated and bound to a connection. Your
> >> > 'getConnection()' method might be fetching connections from a pool,
> >> > where the settings haven't been reset. If the connections are shared,
> >> > you will continue to have this problem.
> >> >
> >> > Before returning a connection to the pool, run the RESET command to
> >> > restore the default session state.
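Instead of tracking individual options, the session could be reset wholesale just before the connection is returned. Here is a minimal sketch, assuming your Drill version accepts `ALTER SESSION RESET ALL` (check the RESET documentation for your release). `DrillSessions.resetSession` is a hypothetical helper name.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

final class DrillSessions {
    private DrillSessions() {}

    /**
     * Hypothetical return-to-pool hook: resets every session option on this
     * connection to its default (assumes RESET ALL is supported).
     */
    static void resetSession(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute("ALTER SESSION RESET ALL");
        }
    }
}
```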
> >> >
> >> > https://drill.apache.org/docs/reset/
> >> >
> >> >
> >> >
> >> > -----Original Message-----
> >> > From: Rahul Raj [mailto:rahul.raj@option3consulting.com]
> >> > Sent: Wednesday, December 13, 2017 2:17 AM
> >> > To: user@drill.apache.org
> >> > Subject: Drill session and jdbc connections
> >> >
> >> > Hi,
> >> >
> >> > How is a Drill session related to a Drill JDBC connection instance?
> >> > What happens in a pool of connections when one connection changes
> >> > `store.format`? I am seeing mix-ups where a Parquet row is written as
> >> > an array of multiple records (rather than as multiple columns) when
> >> > another thread is creating a CSV file at the same time. This happens
> >> > only when CSV and Parquet writes race with each other.
> >> >
> >> > Scenario:
> >> >
> >> > Thread 1 for CSV creation:
> >> >
> >> > Connection conn = getConnection();
> >> > Statement st = conn.createStatement();
> >> > st.execute("ALTER SESSION SET `store.format`='csv'");
> >> > st.execute("CREATE TABLE somecsv AS ...");
> >> > st.execute("ALTER SESSION SET `store.format`='parquet'");
> >> >
> >> > Thread 2 for Parquet creation:
> >> >
> >> > Connection conn = getConnection();
> >> > Statement st = conn.createStatement();
> >> > st.execute("CREATE TABLE someparquet AS ...");
> >> >
> >> > In Thread 2, the Parquet output gets written as an ARRAY containing all
> >> > the fields, as a side effect of Thread 1 setting the format to CSV
> >> > while both run in parallel.
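Applied to this scenario, here is a minimal sketch of a per-task CTAS. It assumes each task obtains its own connection from a `javax.sql.DataSource` (pooled or not) and never shares that connection with another thread while working. Each task sets `store.format` explicitly, even when it wants the default. It then drains the CTAS result (record count assumed to be in column 2) and resets the option before the connection is closed or returned. `CtasTask.createTable` is a hypothetical name.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import javax.sql.DataSource;

final class CtasTask {
    private CtasTask() {}

    /**
     * Hypothetical helper: runs one CTAS with an explicit store.format on a
     * connection used only by the calling task, returning records written.
     * The format string is concatenated unescaped: pass trusted constants.
     */
    static long createTable(DataSource ds, String format, String ctasSql)
            throws SQLException {
        try (Connection conn = ds.getConnection();
             Statement st = conn.createStatement()) {
            st.execute("ALTER SESSION SET `store.format` = '" + format + "'");
            try (ResultSet rs = st.executeQuery(ctasSql)) {
                long written = 0;
                while (rs.next()) {      // read every row so the write completes
                    written += rs.getLong(2);
                }
                return written;
            } finally {
                // rs is already closed here; restore the default before release
                st.execute("RESET `store.format`");
            }
        }
    }
}
```

Thread 1 would pass `"csv"` as the format and Thread 2 `"parquet"`. Because each session belongs to one connection, neither thread can see the other's setting.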
> >> >
> >> > Is it possible to have session isolation in this situation?
> >> >
> >> > Regards,
> >> > Rahul
> >> >
> >> > --
> >> > **** This email and any files transmitted with it are confidential and
> >> > intended solely for the use of the individual or entity to whom it is
> >> > addressed. If you are not the named addressee then you should not
> >> > disseminate, distribute or copy this e-mail. Please notify the sender
> >> > immediately and delete this e-mail from your system.****
> >> >
> >>
> >>
> >
> >
>
>
