crunch-user mailing list archives

From Bryan Baugher
Subject Splitting a PCollection
Date Tue, 26 Nov 2013 19:25:55 GMT
Hi everyone,

I have a PCollection of Avro-based objects, and I want to categorize these
objects by a certain property, writing each category to a
different Avro file. The number of distinct categories should be small
(hundreds), and the property I am categorizing on is a String. I was hoping
there was some way to end up with a Map<String, PCollection>, but there
didn't seem to be any obvious choice. For now I have gone with a simple
approach:

   - Find all categories (a DoFn that returns a PCollection<String>)
   - Materialize and iterate over this collection
      - For each category, use a FilterFn to create the desired categorized PCollection
      - Write this to an Avro file
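The steps above can be sketched in plain Java as an in-memory analogy, with a List standing in for the PCollection and a map entry standing in for each Avro output file. This is not Crunch API code: the `Event` record, its `category` field, and the `splitByFiltering` helper are hypothetical names chosen for illustration, and no Avro writing is performed.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class SplitByCategory {

    // Hypothetical stand-in for the Avro record; only the String property
    // used for categorizing (plus a dummy payload) is modeled.
    record Event(String category, String payload) {}

    // Step 1: find all distinct categories (the role of the DoFn that
    // returns a PCollection<String>). TreeSet keeps them sorted.
    static Set<String> distinctCategories(List<Event> events) {
        return events.stream()
                .map(Event::category)
                .collect(Collectors.toCollection(TreeSet::new));
    }

    // Steps 2-3: iterate over the materialized categories and filter the
    // full input once per category (the role of the FilterFn).
    static Map<String, List<Event>> splitByFiltering(List<Event> events) {
        Map<String, List<Event>> byCategory = new LinkedHashMap<>();
        for (String category : distinctCategories(events)) {
            List<Event> matching = events.stream()
                    .filter(e -> e.category().equals(category))
                    .collect(Collectors.toList());
            // In the real pipeline this subset would be written to its own
            // Avro file; here it is simply stored under its category key.
            byCategory.put(category, matching);
        }
        return byCategory;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                new Event("b", "x1"), new Event("a", "x2"), new Event("b", "x3"));
        Map<String, List<Event>> split = splitByFiltering(events);
        split.forEach((k, v) -> System.out.println(k + " -> " + v.size()));
    }
}
```

With the sample input this prints `a -> 1` followed by `b -> 2`. Note that the loop scans the whole input once per category, so in this in-memory version hundreds of categories means hundreds of passes over the data.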

This works, but it seems like there should be a better way to do it. Any suggestions?

