pig-user mailing list archives

From Eyal Allweil <eyal_allw...@yahoo.com.INVALID>
Subject Re: Reading partitioned Parquet data into Pig
Date Thu, 30 Aug 2018 14:10:13 GMT
Hi Michael,
You can also use the Parquet Pig loader (especially if you're not working with Hive). Here's
a link to the Maven repository for it.

https://mvnrepository.com/artifact/org.apache.parquet/parquet-pig/1.10.0
Regards,
Eyal

On Tuesday, August 28, 2018, 2:40:36 PM GMT+3, Adam Szita <szita@cloudera.com.INVALID> wrote:

Hi Michael,

Yes, you can use HCatLoader to do this.
The requirement is that you have a Hive table defined on top of your data
(probably pointing to s3://path/to/files) and that the Hive Metastore has
all the relevant schema information.
If you do not have a Hive table yet, you can define one in Hive by manually
specifying the schema; after that, partitions can be added automatically
with Hive's 'MSCK REPAIR TABLE' command.
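
For illustration only: automatic partition discovery depends on the
Hive-style layout, where each partition column appears in the path as a
key=value directory (some_flag=true in the example path). The Python sketch
below shows that mapping from path to columns. It is not Hive's
implementation, and the function name partition_columns is made up for this
example. Values come back as strings; Hive would interpret them according to
the partition column types declared on the table.

```python
from urllib.parse import urlparse


def partition_columns(path: str) -> dict:
    """Collect key=value directory segments from a Hive-style partitioned path.

    Hypothetical helper for illustration; not part of Hive, Pig, or HCatalog.
    """
    # urlparse puts the bucket in netloc; .path holds the object key.
    segments = urlparse(path).path.split("/")
    columns = {}
    # The last segment is the data file itself, so only directories are checked.
    for segment in segments[:-1]:
        key, sep, value = segment.partition("=")
        if sep and key:
            columns[key] = value
    return columns


if __name__ == "__main__":
    example = (
        "s3://path/to/files/some_flag=true/"
        "part-00095-a2a6230b-9750-48e4-9cd0-b553ffc220de.c000.gz.parquet"
    )
    print(partition_columns(example))  # {'some_flag': 'true'}
```

If the directories under the table location do not follow this key=value
naming, partitions generally have to be added explicitly instead.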

Hope this helps,
Adam


On Mon, 27 Aug 2018 at 19:18, Michael Doo <michael.doo@verve.com> wrote:

> Hello,
>
> I’m trying to read partitioned Parquet data into Pig (it’s stored in S3
> under paths like
> s3://path/to/files/some_flag=true/part-00095-a2a6230b-9750-48e4-9cd0-b553ffc220de.c000.gz.parquet).
> I’d like to load it into Pig and add the partitions as columns. I’ve read
> some resources suggesting using the HCatLoader, but so far haven’t had
> success.
>
> Any advice would be welcome.
>
> ~ Michael
>  