pig-user mailing list archives

From Russell Jurney <russell.jur...@gmail.com>
Subject Re: Blog post on recent Pig content contributed to Apache DataFu
Date Sun, 20 Jan 2019 17:22:05 GMT
Nice!

On Sun, Jan 20, 2019 at 8:28 AM Eyal Allweil <eyal@apache.org> wrote:

> I wrote a blog post for the PayPal engineering blog detailing some of the
> (Pig) content I've contributed to DataFu on behalf of PayPal. The post
> contains documentation and code samples of three macros and a UDF:
>
> *dedup* - for deduplicating rows based on a key and date updated fields
>
> *sample_by_keys* - a macro for generating a sample of a table based on a
> list of unique ids
>
> *diff_macro* - for generating a human readable diff between two tables
>
> *CountDistinctUpTo* - a UDF which performs much better than pure Pig in
> cases where you don't need the actual records, but only need to verify that
> at least a certain number of distinct records exist
>
>
> https://medium.com/paypal-engineering/a-guide-to-paypals-contributions-to-apache-datafu-b30cc25e0312
>
> The blog post will be cross-posted to the Apache DataFu blog soon.
>
> Cheers,
> Eyal
>
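The CountDistinctUpTo description suggests why a dedicated UDF can beat pure Pig: if you only need to know whether at least N distinct values exist, you can stop counting once you reach N. Below is a minimal Python sketch of that early-exit idea. The function name `count_distinct_up_to` and its `limit` parameter are hypothetical. This illustrates the concept only and is not DataFu's actual (Java) implementation or API.

```python
from typing import Hashable, Iterable


def count_distinct_up_to(values: Iterable[Hashable], limit: int) -> int:
    """Return the number of distinct values, capped at `limit`.

    Hypothetical sketch: iteration stops as soon as `limit` distinct
    values have been seen, so the full input need not be consumed.
    """
    if limit < 0:
        raise ValueError("limit must be non-negative")
    if limit == 0:
        return 0
    seen = set()
    for v in values:
        seen.add(v)
        if len(seen) >= limit:
            # Early exit: we only care that at least `limit` distinct values exist.
            break
    return len(seen)
```

For example, `count_distinct_up_to(["a", "b", "a", "c", "d"], 3)` returns 3 without inspecting the final element. The set holds at most `limit` entries, so memory stays bounded regardless of input size.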
-- 
Russell Jurney @rjurney <http://twitter.com/rjurney>
russell.jurney@gmail.com LI <http://linkedin.com/in/russelljurney> FB
<http://facebook.com/jurney> datasyndrome.com
