drill-user mailing list archives

From Abhishek Girish <abhishek.gir...@gmail.com>
Subject Re: Slow query on parquet imported from SQL Server while the external SQL server is down.
Date Thu, 01 Dec 2016 04:37:12 GMT
@John,

I agree that this should work. While I am not certain, I don't think the
issue is specific to a particular plugin. Rather, it comes from the way the
foreman, during a query's lifecycle, attempts to initialize every enabled
storage plugin before proceeding to execute the query. So when a particular
plugin isn't configured correctly or the underlying data source is not up,
this could drastically slow down query execution.

I'll check whether we already have a JIRA for this; if not, I will file
one.
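
To make the reasoning above concrete, here is a toy model of the suspected
behaviour. It is a sketch under stated assumptions, not Drill's actual code:
the Plugin class, the startup_delay function, and the 30-second connect
timeout are all hypothetical names and values chosen for illustration. It
only shows why one enabled but unreachable plugin could delay a query that
never touches it, if every enabled plugin is initialized in turn.

```python
from dataclasses import dataclass


@dataclass
class Plugin:
    """Hypothetical stand-in for a storage plugin registration."""
    name: str
    enabled: bool
    reachable: bool
    connect_timeout_s: float = 30.0  # assumed timeout, not a Drill default


def startup_delay(plugins, fast_init_s=0.01):
    """Seconds spent before execution if every enabled plugin is
    initialized one after another, whether or not the query uses it."""
    total = 0.0
    for plugin in plugins:
        if not plugin.enabled:
            continue  # disabled plugins are skipped entirely
        # A reachable source initializes quickly; an unreachable one
        # costs the full connect timeout before the failure is reported.
        total += fast_init_s if plugin.reachable else plugin.connect_timeout_s
    return total


if __name__ == "__main__":
    plugins = [
        Plugin("dfs", enabled=True, reachable=True),
        Plugin("mssql", enabled=True, reachable=False),  # SQL Server is down
    ]
    print("mssql enabled: ", startup_delay(plugins))
    plugins[1].enabled = False
    print("mssql disabled:", startup_delay(plugins))
```

In this model, disabling the unreachable plugin removes the delay, which is
the same experiment suggested earlier in the thread.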

On Wed, Nov 30, 2016 at 8:12 AM, John Omernik <john@omernik.com> wrote:

> So just my opinion in reading this thread.  (sorry for swooping in and
> opining)
>
> If a CTAS is done from any data source into Parquet files.... there should
> be NO dependency on the original data source to query the resultant Parquet
> files.   As a Drill user, as a Drill admin, this breaks the concept of
> least surprise.  If I take data from one source, and create Parquet files
> in a distributed file system, it should just work.  If there are "issues"
> with JDBC plugins or the HBase/Hive plugins in a similar manner, these
> need to be hunted down by a large group of villagers with pitchforks and
> torches.  I just can't see how this could be acceptable at any level. The
> whole idea of Parquet files is that they are self-describing, schema-included
> files, so a read of a directory of Parquet files should have NO
> dependencies on anything but the Parquet files. Even the Parquet
> "additions" (such as the metadata cache) should be fail-open: if
> it exists, great, use it to speed things up, but if it doesn't, read the
> Parquet files as normal (which I believe is how it operates).
>
> John
>
> On Wed, Nov 30, 2016 at 12:12 AM, Abhishek Girish
> <abhishek.girish@gmail.com> wrote:
>
> > Can you attempt to disable the JDBC plugin (configured with SQL Server) and
> > try the query (on parquet) when SQL Server is offline?
> >
> > I've seen a similar issue previously when the HBase / Hive plugin was
> > enabled but either the plugin configuration was wrong or the underlying
> > data source was down.
> >
> > On Fri, Nov 25, 2016 at 3:21 AM, Rahul Raj
> > <rahul.raj@option3consulting.com> wrote:
> >
> > > I have created a parquet file using CTAS from a MS SQL Server. The query
> > > on parquet is getting stuck in STARTING state for a long time before
> > > returning the results.
> > >
> > > We could see that drill was trying to connect to the MS SQL server from
> > > which the data was imported. The MSSQL server was down, drill threw an
> > > exception "Failure while attempting to load JDBC schema", and then
> > > returned the results. While SQL server is running, the query executes
> > > without issues.
> > >
> > > Why is drill querying the DB metadata externally and not the imported
> > > parquets?
> > >
> > > Rahul.
> > >
> >
>
