drill-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Rahul Raj <rahul....@option3consulting.com>
Subject Re: Slow query on parquet imported from SQL Server while the external SQL server is down.
Date Thu, 01 Dec 2016 05:45:11 GMT
Abhishek,

Your observation is correct, we just verified that:

   1. The queries run as expected(faster) with Jdbc plugin disabled.
   2. Queries run as expected when the plugin's datasource is running.
   3. With the datasource down, queries run very slow waiting for the
   connection to fail

Rahul

On Thu, Dec 1, 2016 at 10:07 AM, Abhishek Girish <abhishek.girish@gmail.com>
wrote:

> @John,
>
> I agree that this should work. While I am not certain, I don't think the
> issue is specific to a particular plugin, but the way in a query's
> lifecycle, the foreman attempts to initialize every enabled storage plugin
> before proceeding to execute the query. So when a particular plugin isn't
> configured correctly or the underlying datasource is not up, this could
> drastically slow down the query execution time.
>
> I'll look up to see if we have a JIRA for this already - if not will file
> one.
>
> On Wed, Nov 30, 2016 at 8:12 AM, John Omernik <john@omernik.com> wrote:
>
> > So just my opinion in reading this thread.  (sorry for swooping in an
> > opining)
> >
> > If a CTAS is done from any data source into Parquet files.... there
> should
> > be NO dependency on the original data source to query the resultant
> Parquet
> > files.   As a Drill user, as a Drill admin, this breaks the concept of
> > least surprise.  If I take data from one source, and create Parquet files
> > in a distributed file system, it should just work.  If there are "issues"
> > with JDBC plugins or the HBase/Hive plugins in a similar manner, these
> > needs to be hunted down by a large group of villages with pitchforks and
> > torches.  I just can't see how this could be acceptable at any level. The
> > whole idea of Parquet files is they are self describing, schema included
> > files.... thus a read of a directory of Parquet files should have NO
> > dependancies on anything but the parquet files... even the Parquet
> > "additions" (such as the METADATA Cache) should be a fail open thing...
> if
> > it exists great, use it, speed things up, but if it doesn't read the
> > parquet files as normal (Which I believe is how it operates)
> >
> > John
> >
> > On Wed, Nov 30, 2016 at 12:12 AM, Abhishek Girish <
> > abhishek.girish@gmail.com
> > > wrote:
> >
> > > Can you attempt to disable to jdbc plugin (configured with SQLServer)
> and
> > > try the query (on parquet) when SQL Server is offline?
> > >
> > > I've seen a similar issue previously when the HBase / Hive plugin was
> > > enabled but either the plugin configuration was wrong or the underlying
> > > data source was down.
> > >
> > > On Fri, Nov 25, 2016 at 3:21 AM, Rahul Raj
> <rahul.raj@option3consulting.
> > > com>
> > > wrote:
> > >
> > > > I have created a parquet file using CTAS from a MS SQL Server. The
> > query
> > > on
> > > > parquet is getting stuck in STARTING state for a long time before
> > > returning
> > > > the results.
> > > >
> > > > We could see that drill was trying to connect to the MS SQL server
> from
> > > > which the data was imported. The MSSQL server was down, drill threw
> an
> > > > exception "Failure while attempting to load JDBC schema", and then
> > > returned
> > > > the results. While SQL server is running, the query executes without
> > > > issues.
> > > >
> > > > Why is drill querying the DB metadata externally and not the imported
> > > > parquets?
> > > >
> > > > Rahul.
> > > >
> > > > --
> > > > **** This email and any files transmitted with it are confidential
> and
> > > > intended solely for the use of the individual or entity to whom it is
> > > > addressed. If you are not the named addressee then you should not
> > > > disseminate, distribute or copy this e-mail. Please notify the sender
> > > > immediately and delete this e-mail from your system.****
> > > >
> > >
> >
>

-- 
**** This email and any files transmitted with it are confidential and 
intended solely for the use of the individual or entity to whom it is 
addressed. If you are not the named addressee then you should not 
disseminate, distribute or copy this e-mail. Please notify the sender 
immediately and delete this e-mail from your system.****

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message