aries-dev mailing list archives

From Timothy Ward <timothyjw...@apache.org>
Subject Re: [DISCUSS] Goals, Requirements and API for journaled events
Date Wed, 02 Jan 2019 10:03:05 GMT
Hi all,

I think this is an interesting area to look at, but one thing I see immediately is that the
API is being designed in a way that encourages lifecycle issues. Specifically, the service
interface's “subscribe” method receives a consumer function from the client. It would be
*much* better if the subscribe method did not take a consumer; instead, the Subscription
returned by subscribe should provide a PushStream.
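
To make that concrete, a minimal sketch (the JournaledEvents, Message and Position names
here are illustrative, not the actual API under discussion):

    import org.osgi.util.pushstream.PushStream;

    public interface JournaledEvents {
        // subscribe takes no consumer; the lifecycle stays with the
        // client via the returned Subscription.
        Subscription subscribe(String topic, Position startAt);
    }

    public interface Subscription extends AutoCloseable {
        // The client creates streams on demand and closes them (or the
        // whole subscription) when it has finished processing.
        PushStream<Message> stream();

        @Override
        void close();
    }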

Making this change spares the provider implementation from having to maintain a registry of
instances from client bundles (the listener pattern is “considered harmful” in OSGi),
which can leak memory and/or class loaders as client bundles are started/stopped/updated.
Allowing the client to create PushStream instances on demand gives the client finer-grained
control over when the stream of data processing is closed (both from within and without
the data stream), and provides easier fail-safe defaults for late-registering clients.

You obviously get the further advantages of PushStreams, including buffering, windowing and
transformation pipelines. Using these would allow for simpler optimisation of the fetch logic
in the Kafka/Mongo/Memory client when processing bulk messages from history.
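
For instance, a client could batch replayed history into time windows so the backing store
can be read in bulk (sketch only; Subscription, Message and process(...) are illustrative):

    import java.time.Duration;
    import java.util.Collection;
    import org.osgi.util.pushstream.PushStream;

    void processHistory(Subscription subscription) {
        PushStream<Message> stream = subscription.stream();
        // Group up to 500ms worth of replayed messages into one batch
        // so the store is read in bulk rather than message-by-message.
        stream.window(Duration.ofMillis(500), batch -> batch)
              .forEach((Collection<Message> batch) -> process(batch));
    }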

Best Regards,

Tim

On 2 Jan 2019, at 07:30, Christian Schneider <chris@die-schneider.net> wrote:

On Wed, 2 Jan 2019 at 02:05, Timothee Maret <tmaret@apache.org> wrote:

Hi,

I looked at the API considering how we could use it for our replication use
case. I identified one key concept that seemed to be missing: the indexing
of messages with monotonically increasing offsets.

For replication, we leverage those offsets extensively, for instance to
efficiently compute sub-ranges of messages, to skip ranges of messages, to
delay processing of messages, to clean up resources, etc. If we want to
leverage the journaled event API to guarantee portability, it seems to me
that we'd need to have the offset, or an equivalent construct, as part of
the API.

How about adding a "getOffset" signature and documenting the offset
requirement in the Position interface?
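
For example (sketch only; the exact contract would still need to be agreed and documented):

    public interface Position {
        // Monotonically increasing offset of a message within the
        // journal; enables computing sub-ranges, skipping ranges,
        // and delaying processing.
        long getOffset();
    }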


I just started implementing the in-memory impl of the API and also used an
offset. For the cases I know, an offset makes sense. Alexei and I were just
unsure whether the offset is really a general abstraction. If we all agree
that an offset makes sense, then I am in favour of adding it.
Actually, in the case of no partitions (which we currently assume) the
position is no more than an offset.


Another intention in the API that is unclear to me is the support for
partitions (similar to Kafka). The documentation indicates it is not a goal;
however, the API seems to contain some hints of multi-partition support,
such as the "TopicPosition" interface. How about supporting multiple
partitions in the API by allowing a key (with semantics similar to Kafka's)
to be specified in the "newMessage" signature?


I removed the TopicPosition interface again a few days ago. It was not part
of the API Alexei and I discussed, and it makes no sense when we limit
ourselves to no partitions (or one partition, in the case of Kafka).
So the main question is whether limiting ourselves is a good idea. I think
it is, but I would be very interested in other opinions.

Cheers
Christian

--
Christian Schneider
http://www.liquid-reality.de

Computer Scientist
http://www.adobe.com
