spark-user mailing list archives

From Szuromi Tamás <trom...@gmail.com>
Subject Re: Metadata Management
Date Fri, 20 Oct 2017 06:36:50 GMT
Hi Vasu,

https://github.com/linkedin/WhereHows might be a good fit.

Cheers
Tamas

On Thu, Oct 19, 2017 at 23:22, Vasu Gourabathina <vgouraba@gmail.com> wrote:

> All:
>
> This may be off topic for Spark, but I'm sure several of you have
> used some form of this as part of your big data implementations, so I
> wanted to reach out.
>
> As part of the data lake and data processing (by Spark, for example), we
> might end up with different form factors for the files (via cleanup,
> enrichment, etc.).
>
> In order to make this data available for exploration by analysts and
> data scientists, how should we manage the metadata? For example:
>   - Creating a metadata repository
>   - Making the schemas available to users, so they can use them to create
> Hive tables, query them with Presto, etc.
>
> Can you recommend some patterns (or tools) to help manage the metadata?
> We're trying to reduce the dependency on engineers and make the
> analysts/scientists as self-sufficient as possible.
>
> Azure Data Catalog and AWS Glue Data Catalog seem to address this. Any
> input on these two?
>
> I appreciate your help in advance.
>
> Thanks,
> Vasu.
>
