beam-commits mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [beam] lukecwik commented on issue #11037: [BEAM-9434] performance improvements reading many Avro files in S3
Date Fri, 06 Mar 2020 22:44:09 GMT
lukecwik commented on issue #11037: [BEAM-9434] performance improvements reading many Avro files in S3
URL: https://github.com/apache/beam/pull/11037#issuecomment-595996288
 
 
   The expansion for withHintManyFiles puts a reshuffle between the match and the actual reading
of the files. The reshuffle lets the runner rebalance the work across as many nodes as it
wants. Only file metadata is reshuffled, so after the reshuffle the file reading should be
distributed across several nodes.
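
   As a rough illustration of why the reshuffle matters, here is a toy model in plain Java. It is not Beam code and not the actual expansion: the class name `ReshuffleSketch`, the `part-N.avro` file names, and the round-robin assignment are all assumptions made for the sketch. A real runner decides for itself how to redistribute elements after a reshuffle.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: mimics match -> reshuffle -> read without using Beam.
class ReshuffleSketch {

    // "Match" stage: runs as a single unit of work and emits only file
    // metadata (here just hypothetical file names), never file contents.
    static List<String> match(int fileCount) {
        List<String> metadata = new ArrayList<>();
        for (int i = 0; i < fileCount; i++) {
            metadata.add("part-" + i + ".avro");
        }
        return metadata;
    }

    // "Reshuffle" stage: redistributes the cheap metadata records across
    // workers. Round-robin is an assumption standing in for whatever
    // balancing the runner actually performs.
    static List<List<String>> reshuffle(List<String> metadata, int workers) {
        List<List<String>> perWorker = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            perWorker.add(new ArrayList<>());
        }
        for (int i = 0; i < metadata.size(); i++) {
            perWorker.get(i % workers).add(metadata.get(i));
        }
        return perWorker;
    }

    public static void main(String[] args) {
        // "Read" stage: each worker would now open only the files assigned to it.
        List<List<String>> assignment = reshuffle(match(10), 4);
        for (int w = 0; w < assignment.size(); w++) {
            System.out.println("worker " + w + " reads " + assignment.get(w));
        }
    }
}
```

   In this model the match output sits in one place, but the expensive reads are spread over all workers once the metadata has been redistributed. Without the reshuffle step, the reads would stay wherever the match ran.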
   
   In your reference run, when you say "the entire reading taking place in a single task/node",
did the match all happen on a single node, or did the "read" all happen on a single node?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
