pig-user mailing list archives

From Eyal Allweil <e...@apache.org>
Subject Blog post on recent Pig content contributed to Apache DataFu
Date Sun, 20 Jan 2019 16:28:19 GMT
I wrote a blog post for the PayPal engineering blog detailing some of the
(Pig) content I've contributed to DataFu on behalf of PayPal. The post
contains documentation and code samples of three macros and a UDF:

*dedup* - for deduplicating rows based on a key field and a date-updated field

*sample_by_keys* - a macro for generating a sample of a table based on a
list of unique ids

*diff_macro* - for generating a human-readable diff between two tables

*CountDistinctUpTo* - a UDF which performs much better than pure Pig in
cases where you don't need the actual records, but only need to verify that
a certain number of distinct records exists
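
As a rough illustration of the ideas behind dedup and CountDistinctUpTo, here
is a minimal Python sketch. It is not the DataFu code: the function names,
parameters and field names are hypothetical, and it assumes that dedup keeps
the most recently updated row per key and that the count stops as soon as the
limit is reached.

```python
from typing import Any, Dict, Hashable, Iterable, List


def dedup(rows: Iterable[Dict[str, Any]], key: str, updated: str) -> List[Dict[str, Any]]:
    """Keep, for each key, only the row with the largest `updated` value.

    Assumption: "latest update wins" is the intended deduplication rule.
    """
    latest: Dict[Hashable, Dict[str, Any]] = {}
    for row in rows:
        current = latest.get(row[key])
        if current is None or row[updated] > current[updated]:
            latest[row[key]] = row
    return list(latest.values())


def count_distinct_up_to(values: Iterable[Hashable], limit: int) -> int:
    """Count distinct values, but stop reading input once `limit` is reached.

    The early exit is what saves work when only a threshold check is needed.
    """
    seen = set()
    for value in values:
        seen.add(value)
        if len(seen) >= limit:
            return limit
    return len(seen)
```

For example, `count_distinct_up_to(ids, 3)` answers "are there at least 3
distinct ids?" without scanning the rest of the input once the third distinct
id has been seen.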


The blog post will be cross-posted to the Apache DataFu blog soon.

