drill-user mailing list archives

From Christian Pfarr <z0lt...@pm.me.INVALID>
Subject Re: Iceberg or deltalake table as input for drill queries
Date Sun, 04 Jul 2021 16:12:59 GMT
Hi luoc,

Of course. I would be happy to support you with this.

How should we start?

Regards,

z0ltrix

-------- Original Message --------
On 4 July 2021, 17:37, luoc wrote:

> Hi,
> Makes perfect sense so far. Obviously, you understand the difference between batch computation
> and ad-hoc querying. At the same time, Drill is a high-performance, schema-free MPP query layer
> for self-describing data, with ANSI SQL support.
> Would you mind helping me open an issue on GitHub? It is a good way to initiate the
> technical discussion.
>
> > On 4 July 2021, at 02:54, Christian Pfarr <z0ltrix@pm.me.invalid> wrote:
> > Hi luoc,
> >
> > Thanks for the information.
> >
> > I think this kind of storage format is used more and more in cloud architectures
> > because IT departments want to use as few tools as possible to provide a big data product.
> > With Iceberg they can build consistent and scalable big data structures for stream and batch
> > processing on the same storage layer with a single tool, Spark.
> >
> > The problem is how to provide the data to customers. In my opinion, Spark itself
> > is too slow for interactive querying by many people or BI tools. That's the point where
> > tools like Presto, Drill or Dremio enter the stage.
> >
> > I would like to see Drill as a competitor in this area, especially because of its
> > brilliant, flexible and schemaless design.
> >
> > If the Iceberg implementation is already done for the Metastore and you are already
> > experienced with its internals, it sounds worth investing the time and energy in a new
> > format plugin.
> >
> > Just the opinion of a consultant who wants to recommend Drill for these use cases ;)
> >
> > Regards
> >
> > z0ltrix
> >
> > -------- Original Message --------
> > On 3 July 2021, 16:55, luoc wrote:
> >
> > Hello,
> > Thanks for the interest. Drill's Metastore can use a storage engine based
> > on Iceberg tables. However, it seems that Drill does not yet support querying Iceberg data.
> > Drill can definitely support Iceberg, for both reading and writing.
> > The condition is that we need to develop a format plugin using the "Easy framework based
> > on EVF". Please let me know if you are interested in that.
> >
> > > On 3 July 2021, at 2:41 AM, Christian Pfarr <z0ltrix@pm.me.INVALID> wrote:
> > >
> > > Hello everyone,
> > >
> > > It looks like more and more people are using Delta Lake or Iceberg in Spark
> > > for transactional work with big tables.
> > >
> > > Additionally, I saw that Drill is using Iceberg as the storage engine for metadata.
> > >
> > > So I wonder if it's possible to query Iceberg tables stored in HDFS or S3 directly
> > > via Drill, so that I can process my data with Spark Iceberg tables and present them
> > > to my data scientists with Drill.
> > >
> > > Regards,
> > >
> > > z0ltrix