crunch-user mailing list archives

From Hrishikesh P
Subject ParallelDo - DoFn in-order processing
Date Fri, 15 Nov 2013 16:38:30 GMT
Hello -

When a DoFn is applied through parallelDo, is it possible to guarantee that
the records in a PTable are processed in their existing order? I have a
PTable<Long, ByteBuffer> that is sorted by the Long key, and I want to make
sure that DoFn#process is called on the records in that sorted order, as
there may be dependencies between the records.
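
To make the dependency concrete, here is a minimal plain-Java sketch of the
kind of order-sensitive logic I have in mind. It does not use the Crunch API;
OrderDependentDemo, keyGaps and entry are hypothetical names, and the "gap
between consecutive keys" computation is only an assumed stand-in for the real
per-record work.

```java
import java.nio.ByteBuffer;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an order-dependent DoFn; plain Java, not Crunch.
class OrderDependentDemo {

    // Emits the difference between each key and the key seen just before it.
    // The output is only meaningful if records arrive in ascending key order.
    static List<Long> keyGaps(List<Map.Entry<Long, ByteBuffer>> records) {
        List<Long> gaps = new ArrayList<>();
        Long previous = null; // state carried from one record to the next
        for (Map.Entry<Long, ByteBuffer> rec : records) {
            if (previous != null) {
                gaps.add(rec.getKey() - previous);
            }
            previous = rec.getKey();
        }
        return gaps;
    }

    // Builds a (key, empty payload) pair; the payload is irrelevant here.
    static Map.Entry<Long, ByteBuffer> entry(long key) {
        return new SimpleEntry<>(Long.valueOf(key), ByteBuffer.allocate(0));
    }

    public static void main(String[] args) {
        List<Map.Entry<Long, ByteBuffer>> sorted =
                Arrays.asList(entry(1L), entry(4L), entry(9L));
        List<Map.Entry<Long, ByteBuffer>> shuffled =
                Arrays.asList(entry(4L), entry(1L), entry(9L));

        System.out.println(keyGaps(sorted));   // prints [3, 5]
        System.out.println(keyGaps(shuffled)); // prints [-3, 8]
    }
}
```

The same three records give different results depending on arrival order,
which is why the order in which process is called matters for my use case.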

I thought of a few options, such as writing the sorted results to a text file
and reading that file from within the DoFn, or using a table to track which
records have been processed, but I wasn't sure they would give correct
results. Is there a better approach?

