drill-user mailing list archives

From Charles Givre <cgi...@gmail.com>
Subject Re: describe query support? (catalog metadata, etc)
Date Thu, 19 Oct 2017 17:42:58 GMT
Hi Alfredo,
When I was trying to get Drill to work with various BI tools, all I really needed was a list
of columns.  Data types would be a big bonus, but Drill interprets CSV data as plain text
anyway.  It would be really useful for other file types where Drill does infer data types.



— C


> On Oct 19, 2017, at 6:13 AM, Alfredo Serafini <seralf@gmail.com> wrote:
> 
> Hi thanks for the replies!
> 
> @Chun yes, using views is an approach I considered, and I also like it
> methodologically, since it gives me a chance to "prepare" the data
> just a bit. I'm testing Drill as a sort of data facade for tools that
> handle mappings to other contexts, so this could be helpful for me.
> 
> Anyway, I have some concerns regarding metadata/catalog support for
> views too: it seems that every view is saved on disk as a JSON file,
> so views would run into the same issues. Are you suggesting saving views
> to some kind of relational database storage for staging purposes? Is
> that possible?
> 
> Sorry for all the questions :-)
> 
> 
> @Charles yes, Metabase (or Tableau, Superset, and so on...) is another
> use case: it would be great to connect these tools to Drill and explore
> data with its capabilities, even just for an initial exploration,
> since shortening the initial analysis phase can help with
> development.
> 
> For CSV it would be possible, IMHO, to guess types in a very basic way,
> at least using basic types and mapping a column to text/String when a
> type can't be inferred. It could be a starting point, and probably
> the most comfortable case to start with for (partial) support of
> catalog information (JSON would be more complex, just to say). If
> there are standard interfaces that can be extended/implemented to
> supply that information, I'd like to experiment with them, if it's
> not too complex to follow, and if someone can point me to a good
> place to start experimenting with a possible implementation for the
> CSV case.
> 
> Thanks for the comments, I appreciate them
> 
> Alfredo
> 
> 
> 
>> I’d like to second Alfredo’s request.  I’ve been trying to get Drill to work with some
>> open source visualization tools such as SqlPad and Metabase and the issue I keep running into
>> is that Drill doesn’t have a convenient way to describe how it interprets flat files.  This
>> is really frustrating for me since this is my main use of Drill!
>> I wish the SELECT * FROM <data> LIMIT 0 worked in the RESTFul interface.  In any event,
>> would be very useful to have some way to get Drill to describe how it will interpret a flat
>> file.
>> — C
> 
> 
> 
>> On Oct 18, 2017, at 15:20, Chun Chang <cchang@mapr.com> wrote:
>> 
>> There were discussions on the need to build a catalog for Drill. But I don't think
>> that's the focus right now. And I am not sure the community will ever decide to go in that
>> direction. For now, your best bet is to create views on top of your JSON/CSV data.
>> 
>> ________________________________
>> From: Alfredo Serafini <seralf@gmail.com>
>> Sent: Wednesday, October 18, 2017 8:31:15 AM
>> To: user@drill.apache.org
>> Subject: describe query support? (catalog metadata, etc)
>> 
>> Hi I'm experimenting using Drill as a data virtualization component via
>> JDBC and it generally works great for my needs.
>> 
>> However, some of the components connected via JDBC need basic
>> metadata/catalog information, which seems to be missing for JSON / CSV
>> sources.
>> 
>> For example the simple query
>> 
>> DESCRIBE cp.`employee.json`;
>> 
>> returns no results.
>> 
>> Another example is reading from an SQLite source containing the same
>> data in an `employees` table:
>> DESCRIBE `employees`
>> 
>> This also returns no information. While this command is not directly
>> supported in SQLite, an equivalent one could be for instance:
>> PRAGMA table_info(`employees`);
>> 
>> but trying to execute it in Drill is not possible, as it is beyond the
>> supported standard SQL dialect.
>> 
>> Moreover using a query like:
>> SELECT *
>> FROM INFORMATION_SCHEMA.COLUMNS
>> WHERE (TABLE_NAME='employees_view');
>> 
>> on a view over the same data does seem to return the information, so I
>> suppose there should be a way to pass that information to an
>> internal *DatabaseMetaData
>> <https://docs.oracle.com/javase/8/docs/api/java/sql/DatabaseMetaData.html>*
>> implementation.
>> I wonder if there is such a component designed to manage all the catalog
>> information for different sources?
>> 
>> In this case it could adopt different strategies for retrieving metadata,
>> depending on the case: for sqlite a different command / dialect could be
>> used, for CSV types could be guessed using simple heuristics, and so on.
>> Probably cases like JSON would be much more complex, anyway.
>> Once the metadata have been retrieved for a source, I suppose the standard
>> SQL dialect should work as expected.
>> 
>> 
>> Are there any plans to add catalog metadata support for various sources?
>> Does anybody have a workaround, for example using views or similar
>> approaches?
>> 
>> 
>> thanks in advance, sorry if the message is too long :-)
>> Alfredo
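
The CSV type-guessing heuristic discussed in this thread (try basic types, fall back to text when nothing fits) could look roughly like the sketch below. It is written in Python purely for illustration and is not Drill code. The function name `guess_column_types`, the SQL type labels, the single date format, and the sample size are all assumptions made for this example; a real implementation would need to handle many more formats and edge cases.

```python
import csv
import io
from datetime import datetime


def _parses(convert, value):
    """Return True if convert(value) succeeds without a ValueError."""
    try:
        convert(value)
        return True
    except ValueError:
        return False


# Candidate types, ordered from narrowest to widest (assumed labels).
CHECKS = [
    ("BIGINT", lambda v: _parses(int, v)),
    ("DOUBLE", lambda v: _parses(float, v)),
    ("DATE", lambda v: _parses(lambda s: datetime.strptime(s, "%Y-%m-%d"), v)),
]


def guess_column_types(csv_text, sample_rows=100):
    """Guess a type per column from the first `sample_rows` data rows.

    The first line is treated as a header. A column keeps a candidate type
    only if every non-empty sampled value parses as that type; columns with
    no surviving candidate (or no values at all) fall back to VARCHAR.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    candidates = [list(CHECKS) for _ in header]
    seen_value = [False] * len(header)

    for row_index, row in enumerate(reader):
        if row_index >= sample_rows:
            break
        for col, raw in enumerate(row[: len(header)]):
            value = raw.strip()
            if not value:
                continue  # empty cells carry no type information
            seen_value[col] = True
            candidates[col] = [
                (name, check) for name, check in candidates[col] if check(value)
            ]

    return {
        name: (candidates[col][0][0] if seen_value[col] and candidates[col] else "VARCHAR")
        for col, name in enumerate(header)
    }


if __name__ == "__main__":
    sample = (
        "id,name,salary,hired\n"
        "1,Ann,1000.5,2017-10-19\n"
        "2,Bob,,2016-01-02\n"
        "3,Cy,900,n/a\n"
    )
    for column, sql_type in guess_column_types(sample).items():
        print(column, sql_type)
```

The guessed types could then be used to write explicit CAST expressions in a view over the CSV file, which is the workaround suggested earlier in the thread.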

