kafka-users mailing list archives

From Gautam Pulla <gautam.pu...@thetradedesk.com>
Subject Can a SourceTask run out of things to do?
Date Tue, 06 Jun 2017 19:16:15 GMT

I'm creating a Kafka source connector that loads data from individual files that are being
created continuously. I was initially planning to create one task per file - that would allow
the framework to balance the work across all workers in a straightforward way. In the poll()
method of the source task, I would read and return all records in the file, and when poll()
reached the end of the file, the task would terminate and be "finished".

This notion of a task being "finished" and running out of things to do is where I ran into
a problem. It doesn't seem to fit into Connect's model. The worker thread calls poll() continuously
on a source task, and there's no simple way in the framework to finish a task (for example:
returning null from poll() just causes the worker thread to call poll() again after a short pause).
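
To make the problem concrete, here is a rough model of the behaviour I'm describing. It is a
simplified stand-in, not the actual Connect worker code or API: MiniSourceTask, SingleFileTask
and runWorker are made-up names, and the "file" is just an in-memory list of lines.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class PollLoopSketch {

    /** Hypothetical stand-in for a source task; NOT the real Kafka Connect interface. */
    interface MiniSourceTask {
        /** Returns a batch of records, or null when nothing is available right now. */
        List<String> poll();
    }

    /** Returns the whole "file" on the first poll, then has nothing left to do. */
    static class SingleFileTask implements MiniSourceTask {
        private final Queue<String> remaining;
        int pollCalls = 0; // counts how often the worker asked for data

        SingleFileTask(List<String> lines) {
            this.remaining = new ArrayDeque<>(lines);
        }

        @Override
        public List<String> poll() {
            pollCalls++;
            if (remaining.isEmpty()) {
                return null; // the only option: there is no "I am finished" signal
            }
            List<String> batch = new ArrayList<>(remaining);
            remaining.clear();
            return batch;
        }
    }

    /** Simplified worker loop: it keeps polling no matter what the task returns. */
    static List<String> runWorker(MiniSourceTask task, int iterations) {
        List<String> produced = new ArrayList<>();
        for (int i = 0; i < iterations; i++) {
            List<String> batch = task.poll();
            if (batch == null) {
                continue; // a real worker would pause briefly here, then poll again
            }
            produced.addAll(batch);
        }
        return produced;
    }
}
```

In this model the task delivers its records on the first call, yet the loop keeps calling
poll() for as long as the worker runs; the task never gets a chance to say it is done.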

From this, I believe that source tasks are supposed to produce an *infinite* stream of
data - and I should allocate the work between tasks in some other fashion than making each
individual file a task.
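
One such allocation, purely as a sketch of what I have in mind: keep a fixed number of
long-lived tasks and let each one claim a stable subset of the files by hashing the file name.
The names here (FileAssignmentSketch, taskFor, filesForTask) are hypothetical and nothing in
this snippet comes from the Connect API; how a task would learn about newly created files is
left out entirely.

```java
import java.util.ArrayList;
import java.util.List;

public class FileAssignmentSketch {

    /** Maps a file name to one task index in [0, taskCount). */
    static int taskFor(String fileName, int taskCount) {
        // floorMod keeps the result non-negative even when hashCode() is negative
        return Math.floorMod(fileName.hashCode(), taskCount);
    }

    /** Filters the discovered files down to those owned by the given task. */
    static List<String> filesForTask(List<String> discovered, int taskId, int taskCount) {
        List<String> mine = new ArrayList<>();
        for (String fileName : discovered) {
            if (taskFor(fileName, taskCount) == taskId) {
                mine.add(fileName);
            }
        }
        return mine;
    }
}
```

With this scheme every file belongs to exactly one task, and a task never runs out of work
as long as new files keep appearing, so it can keep returning records (or null) from poll()
indefinitely.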

Is this correct?

