drill-user mailing list archives

From John Omernik <j...@omernik.com>
Subject Re: Slow query on parquet imported from SQL Server while the external SQL server is down.
Date Wed, 30 Nov 2016 16:12:48 GMT
So, just my opinion on reading this thread (sorry for swooping in and
opining).

If a CTAS is done from any data source into Parquet files, there should
be NO dependency on the original data source to query the resultant Parquet
files. As a Drill user, as a Drill admin, this breaks the principle of
least surprise. If I take data from one source and create Parquet files
in a distributed file system, it should just work. If there are similar
"issues" with the JDBC plugin or the HBase/Hive plugins, these need to be
hunted down by a large group of villagers with pitchforks and torches.
I just can't see how this could be acceptable at any level. The whole idea
of Parquet files is that they are self-describing, schema-included files,
so a read of a directory of Parquet files should have NO dependencies on
anything but the Parquet files. Even the Parquet "additions" (such as the
metadata cache) should be a fail-open thing: if it exists, great, use it
to speed things up, but if it doesn't, read the Parquet files as normal
(which I believe is how it operates).
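
The fail-open behavior described above can be sketched roughly as follows.
This is a minimal Python illustration under stated assumptions, not Drill's
actual code: the function name, the cache file name "metadata_cache.json",
and its {"files": [...]} layout are all hypothetical.

```python
import json
from pathlib import Path


def list_parquet_files(table_dir, cache_name="metadata_cache.json"):
    """Return (file_names, source) for a directory of Parquet files.

    Hypothetical sketch: the cache file name and its JSON layout are
    assumptions made for illustration, not Drill's real format.
    """
    table_dir = Path(table_dir)
    cache_path = table_dir / cache_name
    try:
        # Fast path: trust the cache if it exists and parses cleanly.
        cached = json.loads(cache_path.read_text())
        files = cached["files"]
        if isinstance(files, list):
            return sorted(files), "cache"
    except (OSError, ValueError, KeyError, TypeError):
        # Fail open: a missing, unreadable, or malformed cache is not an error.
        pass
    # Slow path: scan the directory itself; nothing external is consulted.
    return sorted(p.name for p in table_dir.glob("*.parquet")), "scan"
```

The point of the sketch is that every failure on the cache path falls
through to the plain directory scan, so the cache can only ever speed a
read up, never block it.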

John

On Wed, Nov 30, 2016 at 12:12 AM, Abhishek Girish
<abhishek.girish@gmail.com> wrote:

> Can you attempt to disable the JDBC plugin (configured with SQL Server) and
> try the query (on Parquet) when SQL Server is offline?
>
> I've seen a similar issue previously when the HBase / Hive plugin was
> enabled but either the plugin configuration was wrong or the underlying
> data source was down.
>
> On Fri, Nov 25, 2016 at 3:21 AM, Rahul Raj
> <rahul.raj@option3consulting.com> wrote:
>
> > I have created a Parquet file using CTAS from an MS SQL Server. The
> > query on Parquet gets stuck in the STARTING state for a long time
> > before returning the results.
> >
> > We could see that Drill was trying to connect to the MS SQL Server from
> > which the data was imported. The MS SQL Server was down, Drill threw an
> > exception, "Failure while attempting to load JDBC schema", and then
> > returned the results. While SQL Server is running, the query executes
> > without issues.
> >
> > Why is Drill querying the DB metadata externally and not the imported
> > Parquet files?
> >
> > Rahul.
> >
>
