pulsar-users mailing list archives

From Божидар Маринов <bojidar.marinov...@gmail.com>
Subject Does Pulsar Functions support data locality?
Date Wed, 24 Apr 2019 13:04:05 GMT

We are considering using Pulsar in a project we are currently building.
Specifically, we would like to use Pulsar Functions in order to process
lots of sequential data.

In our case, we are going to have persistent streams holding all the data
so far (so, unbounded in both size and retention time), and we want to run
a function mapping one of them to a new stream.
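
To make the use case concrete, the function we have in mind could look
roughly like the sketch below. It assumes the language-native Python
interface of Pulsar Functions, where a module exposes a `process` function
that is called per message. The transformation itself is a hypothetical
placeholder for our real mapping.

```python
# Minimal sketch of a stream-to-stream mapping as a Pulsar Function,
# assuming the language-native Python interface (a module-level `process`).
# The transformation below is a hypothetical stand-in for the real mapping.

def process(input):
    # Called once per message read from the input topic; the return value
    # is published to the function's configured output topic.
    return input.strip().upper()
```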

Due to the potentially large amounts of data, we would like to have the
function running where the data is, as opposed to streaming most of the
data between nodes.

So far, we have determined that Pulsar Functions would co-locate the stream
processing with Pulsar, thus saving one of the round trips, and we would now
like to know if the node a function runs on would be selected in a way that
would minimize the distance (as in latency) to the stored data.

Additionally, we would like to know if there is a way to configure the
function so that it will be relocated to different nodes, following the
data. For example, if the first half of stream A is stored on node 1 and
the second half is stored on node 2, we would like a function with stream A
as input to run on node 1 while processing the first half of the data and
then be moved to node 2.

Thanks in advance,
Bojidar "bojidar-bg"
