Hi everyone,
I have a PCollection of Avro-based objects, and I want to categorize them
by a certain property, writing each category to a different Avro file.
The number of distinct categories should be small (hundreds), and the
property I am categorizing on is a String. I was hoping there was some way
to end up with a Map<String, PCollection>, but there didn't seem to be any
obvious choice. For now I have gone with a simple approach:
- Find all categories (DoFn that returns PCollection<String>)
- Materialize and iterate over this collection
- For each category, use a FilterFn to create the desired categorized
  PCollection
- Write this to an Avro file
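
To make the steps concrete, here is a rough plain-Java model of that approach. It does not use the pipeline API: a List stands in for the PCollection, a Map of lists stands in for the per-category output files, and Event, findCategories, filterByCategory, and categorize are names made up for illustration only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CategorizeSketch {

    // Hypothetical stand-in for the Avro object; only the category property matters here.
    static final class Event {
        final String category;
        final String payload;

        Event(String category, String payload) {
            this.category = category;
            this.payload = payload;
        }
    }

    // Step 1: collect the distinct category values (the role of the DoFn).
    static Set<String> findCategories(List<Event> events) {
        Set<String> categories = new LinkedHashSet<>();
        for (Event e : events) {
            categories.add(e.category);
        }
        return categories;
    }

    // Step 3: keep only the events in one category (the role of the FilterFn).
    static List<Event> filterByCategory(List<Event> events, String category) {
        List<Event> matching = new ArrayList<>();
        for (Event e : events) {
            if (category.equals(e.category)) {
                matching.add(e);
            }
        }
        return matching;
    }

    // Steps 2-4: iterate over the categories and build one output per category.
    // In the real pipeline each list would instead be written to its own Avro file.
    static Map<String, List<Event>> categorize(List<Event> events) {
        Map<String, List<Event>> outputs = new LinkedHashMap<>();
        for (String category : findCategories(events)) {
            outputs.put(category, filterByCategory(events, category));
        }
        return outputs;
    }
}
```

In this model the full input is scanned once to find the categories and then once more per category, so a few hundred categories mean a few hundred filter passes over the same data.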
This works but it seems like there should be a better way to do it. Any
thoughts?
-Bryan