pig-user mailing list archives

From Adam Szita <sz...@cloudera.com.INVALID>
Subject Re: Reading partitioned Parquet data into Pig
Date Tue, 28 Aug 2018 11:40:19 GMT
Hi Michael,

Yes, you can use HCatLoader to do this.
The requirement is that you have a Hive table defined on top of your data
(likely pointing to s3://path/to/files), with the Hive Metastore holding
all the relevant schema and partition metadata.
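As a rough sketch of the Pig side, assuming a table named my_table already exists in the Metastore (the table name is a placeholder, and this assumes the partition column some_flag is declared as a string):

```pig
-- Start Pig with HCatalog support, e.g.: pig -useHCatalog
-- 'my_table' is a placeholder; substitute your actual Hive table name.
events = LOAD 'my_table' USING org.apache.hive.hcatalog.pig.HCatLoader();

-- Partition columns (e.g. some_flag) show up as ordinary columns, so you
-- can project and filter on them; filtering on a partition column lets
-- HCatLoader prune partitions instead of scanning everything.
flagged = FILTER events BY some_flag == 'true';
```

Note that on older Hive/HCatalog versions the loader class is org.apache.hcatalog.pig.HCatLoader instead.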
If you do not have a Hive table yet, you can define one in Hive by
manually specifying the schema; after that, the existing partition
directories can be registered automatically via Hive's MSCK REPAIR TABLE
command.
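A minimal sketch of that Hive side, with a hypothetical schema (the column names, types, and table name here are placeholders; only the S3 location and the some_flag partition key come from your layout):

```sql
-- Hypothetical columns; replace with your actual Parquet schema.
CREATE EXTERNAL TABLE my_table (
  id STRING,
  value DOUBLE
)
PARTITIONED BY (some_flag STRING)
STORED AS PARQUET
LOCATION 's3://path/to/files';

-- Scan the table location for some_flag=... directories and register
-- each one as a partition in the Metastore:
MSCK REPAIR TABLE my_table;
```

Once the partitions are registered, HCatLoader in Pig will see some_flag as a regular column.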

Hope this helps,

On Mon, 27 Aug 2018 at 19:18, Michael Doo <michael.doo@verve.com> wrote:

> Hello,
> I’m trying to read partitioned Parquet data into Pig (so it’s
> stored in S3 like
> s3://path/to/files/some_flag=true/part-00095-a2a6230b-9750-48e4-9cd0-b553ffc220de.c000.gz.parquet).
> I’d like to load it into Pig and add the partitions as columns. I’ve read
> some resources suggesting using the HCatLoader, but so far haven’t had
> success.
> Any advice would be welcome.
> ~ Michael
