drill-user mailing list archives

From Abhishek Girish <abhishek.gir...@gmail.com>
Subject Re: Slow query on parquet imported from SQL Server while the external SQL server is down.
Date Thu, 01 Dec 2016 06:46:43 GMT
Thanks for the update, Rahul!

On Wed, Nov 30, 2016 at 9:45 PM Rahul Raj <rahul.raj@option3consulting.com> wrote:

> Abhishek,
>
> Your observation is correct. We just verified that:
>
>    1. Queries run as expected (faster) with the JDBC plugin disabled.
>    2. Queries run as expected when the plugin's datasource is running.
>    3. With the datasource down, queries run very slowly while waiting for
>    the connection to fail.
>
> Rahul
>
> On Thu, Dec 1, 2016 at 10:07 AM, Abhishek Girish <abhishek.girish@gmail.com> wrote:
>
> > @John,
> >
> > I agree that this should work. While I am not certain, I don't think the
> > issue is specific to a particular plugin. Rather, during a query's
> > lifecycle, the foreman attempts to initialize every enabled storage
> > plugin before proceeding to execute the query. So when a particular
> > plugin isn't configured correctly or the underlying datasource is not up,
> > this could drastically slow down query execution.
> >
> > I'll check whether we already have a JIRA for this; if not, I'll file
> > one.
> >
> > On Wed, Nov 30, 2016 at 8:12 AM, John Omernik <john@omernik.com> wrote:
> >
> > > So just my opinion in reading this thread. (Sorry for swooping in and
> > > opining.)
> > >
> > > If a CTAS is done from any data source into Parquet files, there should
> > > be NO dependency on the original data source to query the resultant
> > > Parquet files. As a Drill user, as a Drill admin, this breaks the
> > > principle of least surprise. If I take data from one source and create
> > > Parquet files in a distributed file system, it should just work. If
> > > there are "issues" with JDBC plugins or the HBase/Hive plugins in a
> > > similar manner, these need to be hunted down by a large group of
> > > villagers with pitchforks and torches. I just can't see how this could
> > > be acceptable at any level. The whole idea of Parquet files is that
> > > they are self-describing, schema-included files, thus a read of a
> > > directory of Parquet files should have NO dependencies on anything but
> > > the Parquet files. Even the Parquet "additions" (such as the METADATA
> > > cache) should be a fail-open thing: if it exists, great, use it, speed
> > > things up, but if it doesn't, read the Parquet files as normal (which I
> > > believe is how it operates).
> > >
> > > John
> > >
> > > On Wed, Nov 30, 2016 at 12:12 AM, Abhishek Girish <abhishek.girish@gmail.com> wrote:
> > >
> > > > Can you try disabling the JDBC plugin (configured with SQL Server)
> > > > and running the query (on Parquet) while SQL Server is offline?
> > > >
> > > > I've seen a similar issue previously when the HBase / Hive plugin was
> > > > enabled but either the plugin configuration was wrong or the
> > > > underlying data source was down.
> > > >
> > > > On Fri, Nov 25, 2016 at 3:21 AM, Rahul Raj <rahul.raj@option3consulting.com> wrote:
> > > >
> > > > > I have created a Parquet file using CTAS from an MS SQL Server. The
> > > > > query on the Parquet file gets stuck in the STARTING state for a
> > > > > long time before returning results.
> > > > >
> > > > > We could see that Drill was trying to connect to the MS SQL Server
> > > > > from which the data was imported. The SQL Server was down; Drill
> > > > > threw an exception, "Failure while attempting to load JDBC schema",
> > > > > and then returned the results. While SQL Server is running, the
> > > > > query executes without issues.
> > > > >
> > > > > Why is Drill querying the external DB metadata and not just the
> > > > > imported Parquet files?
> > > > >
> > > > > Rahul.
> > > > >
> > > > > --
> > > > > **** This email and any files transmitted with it are confidential
> > and
> > > > > intended solely for the use of the individual or entity to whom it
> is
> > > > > addressed. If you are not the named addressee then you should not
> > > > > disseminate, distribute or copy this e-mail. Please notify the
> sender
> > > > > immediately and delete this e-mail from your system.****
> > > > >
> > > >
> > >
> >
>
>
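
The three observations reported above can be pictured with a toy model. This is only a sketch of the behavior as described in the thread, not Drill's actual code: it assumes the foreman touches every enabled storage plugin in turn before a query leaves the STARTING state, and that an unreachable datasource costs a full connection timeout. The names `Plugin`, `startup_delay`, and `scenario`, the plugin names `dfs` and `mssql`, and all timing values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Plugin:
    """Hypothetical stand-in for a storage plugin configuration."""
    name: str
    enabled: bool
    reachable: bool
    init_seconds: float = 0.05     # assumed cost when the datasource answers
    timeout_seconds: float = 30.0  # assumed connection timeout when it is down


def startup_delay(plugins):
    """Return (simulated seconds spent before execution, names of plugins that failed).

    Time is accumulated rather than slept, so the model runs instantly.
    """
    total = 0.0
    failed = []
    for plugin in plugins:
        if not plugin.enabled:
            continue  # a disabled plugin is never contacted
        if plugin.reachable:
            total += plugin.init_seconds
        else:
            # The query still proceeds afterwards, but only once the
            # connection attempt has timed out.
            total += plugin.timeout_seconds
            failed.append(plugin.name)
    return total, failed


def scenario(jdbc_enabled, sqlserver_up):
    """Build a two-plugin setup: a file system plugin plus a JDBC plugin."""
    return [
        Plugin("dfs", enabled=True, reachable=True),
        Plugin("mssql", enabled=jdbc_enabled, reachable=sqlserver_up),
    ]


if __name__ == "__main__":
    cases = [
        ("JDBC plugin disabled, SQL Server down", scenario(False, False)),
        ("JDBC plugin enabled, SQL Server up", scenario(True, True)),
        ("JDBC plugin enabled, SQL Server down", scenario(True, False)),
    ]
    for label, plugins in cases:
        seconds, failed = startup_delay(plugins)
        print(f"{label}: {seconds:.2f}s before execution, failed={failed}")
```

Under these assumptions, only the third case pays the timeout, even though the query itself reads nothing but Parquet files. That matches the workaround suggested in the thread: disabling the plugin removes the wait.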
