spark-dev mailing list archives

From Zoltán Zvara <zoltan.zv...@gmail.com>
Subject Spark Streaming - received block allocation to batch
Date Wed, 11 Mar 2015 13:58:27 GMT
I'm trying to understand the block allocation mechanism Spark uses to
generate batch jobs and a JobSet.

JobGenerator.generateJobs tries to allocate the received blocks to a batch.
In effect, ReceivedBlockTracker.allocateBlocksToBatch creates
streamIdToBlocks, in which stream IDs (Int) are mapped to Seq[ReceivedBlockInfo]
using getReceivedBlockQueue. This is where it gets tricky for me.

getReceivedBlockQueue of class ReceivedBlockTracker reads
streamIdToUnallocatedBlockQueues,
which should be populated with ReceivedBlockQueues. Who inserts these
ReceivedBlockQueues into streamIdToUnallocatedBlockQueues, and where does it
get written? I've found only usages that are effectively value reads.
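
To make the question concrete, here is a minimal sketch of one way a map
like this can be populated without any visible write. It is not the Spark
source: LazyQueueSketch, BlockInfo, queueFor and unallocated are hypothetical
names, and the use of getOrElseUpdate is an assumption on my part about how
the accessor might work. The only library fact relied on is that
mutable.HashMap.getOrElseUpdate inserts the default value when the key is
absent, so a call that looks like a read can also be the write.

```scala
import scala.collection.mutable

object LazyQueueSketch {
  // Hypothetical stand-in for ReceivedBlockInfo; only the fields needed here.
  final case class BlockInfo(streamId: Int, blockId: String)

  // Plays the role of streamIdToUnallocatedBlockQueues in this sketch.
  val unallocated = new mutable.HashMap[Int, mutable.Queue[BlockInfo]]

  // Looks like a pure read, but getOrElseUpdate stores a fresh empty queue
  // when the key is absent, so this accessor also writes to the map.
  def queueFor(streamId: Int): mutable.Queue[BlockInfo] =
    unallocated.getOrElseUpdate(streamId, new mutable.Queue[BlockInfo])

  // A receiver-side call appends through the same accessor (assumed shape).
  def addBlock(info: BlockInfo): Unit =
    queueFor(info.streamId) += info
}
```

If the real accessor works like this, a "find usages" search would show
only reads of the map, even though entries are created on first access.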

At some point streamIdToBlocks gets packed into the case class
AllocatedBlocks. Why is that necessary?

Also, in JobGenerator.generateJobs, on the line where receivedBlockInfos is
created, shouldn't it be empty, because streamIdToUnallocatedBlockQueues never
got written to? What am I missing? How is JobGenerator.generateJobs
able to retrieve the received block infos?
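
The flow I am asking about, as I currently understand it, can be modeled
with the simplified sketch below. This is an assumption-laden model, not
the actual Spark code: AllocationSketch, Tracker, BlockInfo and
blocksOfBatch are hypothetical names, batch time is a plain Long instead
of Spark's Time, and logging and recovery are left out. It shows blocks
being queued per stream, drained into an immutable per-batch snapshot, and
looked up later by batch time. One plausible reason for the case class
wrapper would be to have a single value to store per batch, but that is a
guess.

```scala
import scala.collection.mutable

object AllocationSketch {
  // Hypothetical stand-in for ReceivedBlockInfo.
  final case class BlockInfo(streamId: Int, blockId: String)

  // Immutable snapshot of the blocks assigned to one batch, per stream.
  final case class AllocatedBlocks(streamIdToBlocks: Map[Int, Seq[BlockInfo]])

  class Tracker(streamIds: Seq[Int]) {
    private val unallocated = new mutable.HashMap[Int, mutable.Queue[BlockInfo]]
    private val timeToAllocated = new mutable.HashMap[Long, AllocatedBlocks]

    // Creates the queue on first access (assumed behaviour, see lead-in).
    private def queueFor(streamId: Int): mutable.Queue[BlockInfo] =
      unallocated.getOrElseUpdate(streamId, new mutable.Queue[BlockInfo])

    // Called when a receiver reports a new block.
    def addBlock(info: BlockInfo): Unit = synchronized {
      queueFor(info.streamId) += info
    }

    // Drains every stream's queue and records the result under batchTime.
    def allocateBlocksToBatch(batchTime: Long): Unit = synchronized {
      val streamIdToBlocks = streamIds.map { id =>
        id -> queueFor(id).dequeueAll(_ => true).toList
      }.toMap
      timeToAllocated(batchTime) = AllocatedBlocks(streamIdToBlocks)
    }

    // What a job generator would read back for a given batch.
    def blocksOfBatch(batchTime: Long): Map[Int, Seq[BlockInfo]] = synchronized {
      timeToAllocated.get(batchTime)
        .map(_.streamIdToBlocks)
        .getOrElse(Map.empty[Int, Seq[BlockInfo]])
    }
  }
}
```

In this model the result for a batch is empty only if no addBlock call
arrived before allocateBlocksToBatch ran for that batch time.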

Thanks,

ZZ
