Hi Friends,

 

I’d like to publish an article on Medium about data lakes using Spark.

Its later sections cover details that are not widely known unless you have hands-on experience with data lakes.

 

https://github.com/borislitvak/datalake-article/blob/initial_comments/Building%20a%20Real%20Life%20Data%20Lake%20in%C2%A0AWS.md

I hope it’s OK if I ask you to review the draft.

 

You can respond here or contact me directly.

If there are topics I should add (for example, the effect of compaction on downstream reads using Structured Streaming), or if you spot errors, please point them out before it is published.

Also, if any points are unclear or misleading, please say so.

 

Thanks,

Boris Litvak