drill-user mailing list archives

From Christian Pfarr <z0lt...@pm.me.INVALID>
Subject Re: Iceberg or deltalake table as input for drill queries
Date Sat, 03 Jul 2021 18:53:34 GMT
Hi luoc,

thanks for the information.

I think this kind of storage format is used more and more in cloud architectures because IT
departments want to use as few tools as possible to provide a big data product. With Iceberg
they can build consistent and scalable big data structures for stream and batch processing
on the same storage layer with a single tool, Spark.

The problem is how to provide the data to customers. In my opinion Spark itself is too slow
for interactive querying by a lot of people or BI tools. That's the point where tools like
Presto, Drill or Dremio enter the stage.

I would like to see Drill as a competitor in this area, especially because of its brilliant,
flexible and schemaless design.

If the Iceberg implementation is already done for the Metastore and you are already experienced
with its internals, it sounds worth investing the time and energy in a new format plugin.

Just the opinion of a consultant who wants to recommend Drill for these use cases ;)

Regards

z0ltrix

-------- Original Message --------
On 3 July 2021, 16:55, luoc wrote:

> Hello,
> Thanks for the interest. Drill's Metastore can use a storage engine based on Iceberg tables.
> At the moment, however, it seems that Drill does not support querying Iceberg data. I can
> tell you that Drill can definitely support Iceberg, for both reading and writing. The
> condition is that we need to develop the format plugin using the "Easy framework based on
> EVF". Please let me know if you are interested in that.
>
> > On 3 July 2021, at 2:41 AM, Christian Pfarr <z0ltrix@pm.me.INVALID> wrote:
> >
> > Hello everyone,
> >
> > it looks like more and more people are using Delta Lake or Iceberg in Spark for
> > transactional work with big tables.
> >
> > Additionally, I saw that Drill is using Iceberg as the storage engine for metadata.
> >
> > So I wonder if it's possible to query Iceberg tables stored in HDFS or S3 directly via
> > Drill, so that I can process my data with Spark Iceberg tables and present them with Drill
> > to my data scientists.
> >
> > Regards,
> >
> > z0ltrix
> >
> > <publickey - EmailAddress(s=z0ltrix@pm.me) - 0xF0E154C5.asc>