pig-user mailing list archives

From Michael Doo <michael....@verve.com>
Subject Avro vs Parquet performance on Pig
Date Thu, 07 Feb 2019 22:04:33 GMT
Hey all,
I’ve been migrating some processes over from ingesting Avro to ingesting Parquet. In Spark,
we’re seeing 2x-8x performance gains when using Parquet over Avro. In Pig, similar processes
take about the same time with either format (and sometimes even longer with Parquet).
We’ve enabled dictionary filtering as well as predicate filter/pushdown. Wondering if there
are other settings / strategies we might be missing to take advantage of Parquet.
