drill-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Charles Givre <cgi...@gmail.com>
Subject Re: describe query support? (catalog metadata, etc)
Date Tue, 24 Oct 2017 13:21:28 GMT
I’m aware of that, but there is no way as it stands to discover the schema of a file, CSV
or otherwise.  I’m not talking about attempting to infer data types, just the simple number
of columns, name of columns and what Drill thinks they are.  I know that you can do this via
a series of queries, but for files, there is no equivalent of DESCRIBE.
That’s what I was getting at and I think that would be really useful to have.  Also it wouldn’t
really alter the character of Drill.
— C

> On Oct 24, 2017, at 02:02, Divya Gehlot <divya.htconex@gmail.com> wrote:
> @Charles : when you read csv files can casting and get the data type as per
> your requirements
> Hope I answer what you asked for :)
> Thanks,
> Divya
> On 20 October 2017 at 01:42, Charles Givre <cgivre@gmail.com> wrote:
>> Hi Alfredo,
>> When I was trying to get Drill to work with various BI tools, all I really
>> needed was a list of columns.  Data types would be a big bonus, but Drill
>> interprets CSV data as plain text anyway.  It would be really useful for
>> other file types where Drill does infer data types.
>> — C
>>> On Oct 19, 2017, at 6:13 AM, Alfredo Serafini <seralf@gmail.com> wrote:
>>> Hi thanks for the replies!
>>> @Chun yes using Views is an approach I considered, and I like it also
>>> methodologically, in order to have some change to "prepare" the data
>>> just a bit. I'm testing drill as a sort of data facade for tools which
>>> handles mappings to other context, so this could be helpful for me.
>>> Anyway I have some concerns regardings metadata/catalog support for
>>> views too: it seems that every view is saved on disk as a JSON file,
>>> then experimenting the same issues. Are you suggesting saving views to
>>> some kind of relational database storage for staging purposes? Is that
>>> possible?
>>> Sorry for all the questions :-)
>>> @Charles yes Metabase (or Tableau, Superset, and so on...) is another
>>> use case in which it would be great to connect them to explore data
>>> with the capabilities of drill, and even for an initial exploration of
>>> data since sometimes reducing the initial analysis phase time could
>>> help with development.
>>> For CSV it would be possible IMHO to guess types in a very basic way,
>>> at least using basic types and map columns to a text/String when a
>>> type can't be inferreed. It could be a starting point, and probably
>>> the more confortable case where to start for the (partial) support of
>>> catalog informations (JSON would be more complex, just to say). If
>>> there are standard interfaces that can be extended/implemented for
>>> filling them with those informations I'd like to do some
>>> experimentation on that, if it's not too complex to follow, and if
>>> someone can point me to a good place where to start for doing some
>>> experiments of a possible implementation, for the CSV case.
>>> Thanks for the comments, I appreciate them
>>> Alfredo
>>> I’d like to second Alfredo’s request.  I’ve been trying to get Drill
>>> to work with some
>>>> open source visualization tools such as SqlPad and Metabase and the
>> issue I keep running into
>>>> is that Drill doesn’t have a convenient way to describe how it
>> interprets flat files.  This
>>>> is really frustrating for me since this is my main use of Drill!
>>>> I wish the SELECT * FROM <data> LIMIT 0 worked in the RESTFul
>> interface.  In any event,
>>>> would be very useful to have some way to get Drill to describe how it
>> will interpret a flat
>>>> file.
>>>> — C
>>>> On Oct 18, 2017, at 15:20, Chun Chang <cchang@mapr.com> wrote:
>>>> There were discussions on the need of building a catalog for drill. But
>> I don't think
>>> that's the focus right now. And I am not sure the community will ever
>>> decide to go in that
>>> direction. For now, you best bet is to create views on top of your
>>> JSON/CSV data.
>>>> ________________________________
>>>> From: Alfredo Serafini <seralf@gmail.com>
>>>> Sent: Wednesday, October 18, 2017 8:31:15 AM
>>>> To: user@drill.apache.org
>>>> Subject: describe query support? (catalog metadata, etc)
>>>> Hi I'm experimenting using Drill as a data virtualization component via
>>>> JDBC and it generally works great for my needs.
>>>> However some of the components connected via JDBC needs basic
>>>> metadata/catalog informations, and they seems to be missing for JSON /
>> CSV
>>>> sources.
>>>> For example the simple query
>>>> DESCRIBE cp.`employee.json`;
>>>> returns no results.
>>>> Another possible example case could be when reading from an sqlite
>> source
>>>> containing the same data on an `employees` table
>>>> DESCRIBE `emploees`
>>>> and still get no information: while this command is not directly
>> supported
>>>> in SQLite, an equivalent one could be for instance:
>>>> PRAGMA table_info(`employees`);
>>>> but trying to execute it in Drill is not possible, as it is beyond the
>>>> supported standard SQL dialect.
>>>> Moreover using a query like:
>>>> SELECT *
>>>> WHERE (TABLE_NAME='employees_view');
>>>> on a view from the same data, seems to return the informations, so I
>>>> suppose there should be a way to pass those informations to an
>>>> internal *DatabaseMetaData
>>>> <https://docs.oracle.com/javase/8/docs/api/java/sql/
>> DatabaseMetaData.html>*
>>>> implementation.
>>>> I wonder if there is such a component designed to manage all the catalog
>>>> informations for different sources?
>>>> In this case it could adopt different strategies for retrieving
>> metadata,
>>>> depending on the case: for sqlite a different command / dialect could be
>>>> used, for CSV types could be guessed using simple heuristics, and so on.
>>>> Probably cases like JSON would be much more complex, anyway.
>>>> Once the metadata have been retrieved for a source, I suppose the
>> standard
>>>> SQL dialect should work as expected.
>>>> Are there any plans to add catalog metadata support for various sources?
>>>> Does anybody have some workaround? for example using views or similar
>>>> approaches?
>>>> thanks in advance, sorry if the message is too long :-)
>>>> Alfredo

View raw message