drill-user mailing list archives

From Chun Chang <cch...@mapr.com>
Subject Re: describe query support? (catalog metadata, etc)
Date Thu, 19 Oct 2017 17:52:41 GMT
Hi Alfredo,


I do not fully understand what you are asking, but you are right that Drill stores views in
JSON format. After you create a view on top of your CSV file, you can DESCRIBE the view to
get schema info, if that's what you want.
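
To make the view-then-DESCRIBE suggestion concrete, here is a minimal sketch that generates the two statements. It assumes a headerless CSV file that Drill exposes through its `columns` array and a writable workspace called `dfs.tmp`; the file path, view name, column names, and types are all hypothetical, and the helper functions are illustrative rather than part of Drill.

```python
def build_view_sql(view_name, source, columns):
    """Build a CREATE VIEW statement that casts headerless CSV columns.

    columns is a list of (column_name, sql_type) pairs, in file order.
    """
    select_items = [
        f"  CAST(columns[{index}] AS {sql_type}) AS `{name}`"
        for index, (name, sql_type) in enumerate(columns)
    ]
    select_list = ",\n".join(select_items)
    return f"CREATE OR REPLACE VIEW {view_name} AS\nSELECT\n{select_list}\nFROM {source}"


def build_describe_sql(view_name):
    """Build the DESCRIBE statement for the view created above."""
    return f"DESCRIBE {view_name}"


# Hypothetical names: adjust the workspace, path, and columns to your data.
view = "dfs.tmp.`employees_view`"
print(build_view_sql(
    view,
    "dfs.`/data/employees.csv`",
    [("employee_id", "INT"), ("full_name", "VARCHAR(100)"), ("salary", "DOUBLE")],
) + ";")
print(build_describe_sql(view) + ";")
```

Because the casts live in the view definition, the view carries explicit column types, which is what DESCRIBE then has to report.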


Sorry if I misunderstood your question.

________________________________
From: Alfredo Serafini <seralf@gmail.com>
Sent: Thursday, October 19, 2017 3:13:10 AM
To: user@drill.apache.org
Subject: Re: describe query support? (catalog metadata, etc)

Hi thanks for the replies!

@Chun yes, using views is an approach I considered, and I also like it
methodologically, since it gives some chance to "prepare" the data
just a bit. I'm testing Drill as a sort of data facade for tools which
handle mappings to other contexts, so this could be helpful for me.

Anyway, I have some concerns regarding metadata/catalog support for
views too: it seems that every view is saved on disk as a JSON file,
so views would run into the same issues. Are you suggesting saving views to
some kind of relational database storage for staging purposes? Is that
possible?

Sorry for all the questions :-)


@Charles yes Metabase (or Tableau, Superset, and so on...) is another
use case in which it would be great to connect them to explore data
with the capabilities of drill, and even for an initial exploration of
data since sometimes reducing the initial analysis phase time could
help with development.

For CSV it would be possible, IMHO, to guess types in a very basic way,
at least using basic types and mapping columns to text/String when a
type can't be inferred. It could be a starting point, and probably
the most comfortable case to start with for (partial) support of
catalog information (JSON would be more complex, just to say). If
there are standard interfaces that can be extended/implemented to
fill in that information, I'd like to experiment with them, if it's
not too complex to follow, and if someone can point me to a good
place to start for experimenting with a possible implementation for
the CSV case.
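
A rough sketch of the kind of basic guessing I have in mind, written outside Drill as plain Python. It is only an assumption about how such a heuristic could look, not an existing Drill interface: the function names, the sample size, the date format, and the SQL type names are all my own choices. A column falls back to VARCHAR whenever a narrower type does not fit every sampled value.

```python
import csv
import io
import re
from datetime import datetime

INT_RE = re.compile(r"[+-]?\d+")
FLOAT_RE = re.compile(r"[+-]?(\d+\.\d*|\.\d+|\d+)([eE][+-]?\d+)?")


def is_iso_date(value):
    """True when the value parses as a YYYY-MM-DD date."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def guess_type(values):
    """Return a basic SQL type name that fits every non-empty sample value."""
    non_empty = [v.strip() for v in values if v.strip()]
    if not non_empty:
        return "VARCHAR"  # nothing to infer from, so stay with text
    if all(INT_RE.fullmatch(v) for v in non_empty):
        return "BIGINT"
    if all(FLOAT_RE.fullmatch(v) for v in non_empty):
        return "DOUBLE"
    if all(is_iso_date(v) for v in non_empty):
        return "DATE"
    return "VARCHAR"


def infer_schema(csv_text, sample_rows=100):
    """Guess (column_name, type) pairs from a CSV text with a header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    samples = [[] for _ in header]
    for row_number, row in enumerate(reader):
        if row_number >= sample_rows:
            break
        for index, value in enumerate(row[:len(header)]):
            samples[index].append(value)
    return [(name, guess_type(column)) for name, column in zip(header, samples)]


sample = (
    "employee_id,full_name,salary,hire_date\n"
    "1,Ann Lee,5200.50,2015-03-01\n"
    "2,Bob Ray,,2016-07-15\n"
)
print(infer_schema(sample))
# [('employee_id', 'BIGINT'), ('full_name', 'VARCHAR'), ('salary', 'DOUBLE'), ('hire_date', 'DATE')]
```

Sampling only the first rows keeps this cheap, at the cost of guessing wrong when later rows contain values the sample did not show.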

Thanks for the comments, I appreciate them

Alfredo



> I’d like to second Alfredo’s request. I’ve been trying to get Drill to work with some
> open source visualization tools such as SqlPad and Metabase, and the issue I keep running into
> is that Drill doesn’t have a convenient way to describe how it interprets flat files. This
> is really frustrating for me since this is my main use of Drill!
> I wish the SELECT * FROM <data> LIMIT 0 worked in the RESTful interface. In any event,
> it would be very useful to have some way to get Drill to describe how it will interpret a flat
> file.
> -- C



> On Oct 18, 2017, at 15:20, Chun Chang <cchang@mapr.com> wrote:
>
> There were discussions on the need of building a catalog for Drill. But I don't think
> that's the focus right now. And I am not sure the community will ever decide to go in that
> direction. For now, your best bet is to create views on top of your JSON/CSV data.
>
> ________________________________
> From: Alfredo Serafini <seralf@gmail.com>
> Sent: Wednesday, October 18, 2017 8:31:15 AM
> To: user@drill.apache.org
> Subject: describe query support? (catalog metadata, etc)
>
> Hi I'm experimenting using Drill as a data virtualization component via
> JDBC and it generally works great for my needs.
>
> However, some of the components connected via JDBC need basic
> metadata/catalog information, which seems to be missing for JSON / CSV
> sources.
>
> For example the simple query
>
> DESCRIBE cp.`employee.json`;
>
> returns no results.
>
> Another example is reading from an SQLite source containing the same
> data in an `employees` table:
> DESCRIBE `employees`
>
> which still returns no information. While this command is not directly
> supported in SQLite, an equivalent one could be, for instance:
> PRAGMA table_info(`employees`);
>
> but executing it in Drill is not possible, as it is outside the
> supported standard SQL dialect.
>
> Moreover using a query like:
> SELECT *
> FROM INFORMATION_SCHEMA.COLUMNS
> WHERE (TABLE_NAME='employees_view');
>
> on a view over the same data seems to return the information, so I
> suppose there should be a way to pass that information to an
> internal *DatabaseMetaData
> <https://docs.oracle.com/javase/8/docs/api/java/sql/DatabaseMetaData.html>*
> implementation.
> I wonder if there is such a component designed to manage all the catalog
> information for different sources?
>
> In this case it could adopt different strategies for retrieving metadata,
> depending on the case: for sqlite a different command / dialect could be
> used, for CSV types could be guessed using simple heuristics, and so on.
> Probably cases like JSON would be much more complex, anyway.
> Once the metadata have been retrieved for a source, I suppose the standard
> SQL dialect should work as expected.
>
>
> Are there any plans to add catalog metadata support for various sources?
> Does anybody have a workaround, for example using views or similar
> approaches?
>
>
> thanks in advance, sorry if the message is too long :-)
> Alfredo
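
The per-source strategy idea in the original message above (a different command for SQLite, simple heuristics or a text fallback for CSV) can be sketched as a small dispatch table. This is only an illustration under stated assumptions, not how Drill is implemented: the registry, the function names, and the in-memory `employees` table are hypothetical, and the SQLite part runs against Python's standard `sqlite3` module rather than through Drill.

```python
import sqlite3


def describe_sqlite(connection, table):
    """Read column names and declared types through SQLite's own PRAGMA."""
    quoted = '"' + table.replace('"', '""') + '"'
    rows = connection.execute(f"PRAGMA table_info({quoted})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return [(row[1], row[2]) for row in rows]


def describe_text_columns(header):
    """Fallback for flat files: report every column as text."""
    return [(name, "VARCHAR") for name in header]


# Hypothetical registry: source kind -> function returning (name, type) pairs.
STRATEGIES = {
    "sqlite": describe_sqlite,
    "csv": describe_text_columns,
}


def describe(source_kind, *args):
    """Pick the metadata strategy for a source kind and run it."""
    return STRATEGIES[source_kind](*args)


connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, full_name TEXT, salary REAL)"
)
print(describe("sqlite", connection, "employees"))
print(describe("csv", ["employee_id", "full_name", "salary"]))
```

Each strategy returns the same shape of result, so whatever sits on top (for example a JDBC metadata layer) would not need to know which source produced it.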
