samoa-dev mailing list archives

From Gianmarco De Francisci Morales <g...@apache.org>
Subject Re: Interaction with Flink DataStreams
Date Wed, 20 Apr 2016 07:31:43 GMT
Hi Simone,

Indeed, right now the connector part is not very well developed, as we
don't have a simple way to reuse connector code across platforms.
The main issue is that each platform has its own connectors, but SAMOA
should not rely on them directly; rather, it should encapsulate them
transparently.
Alternatively, SAMOA can use its own connectors (as it does now for the
local filesystem and HDFS), although this duplicates work.

It would be interesting to see if the connectors can be generalized as we
did with the platforms.
Something like a generic Kafka connector, then each platform has its own
instance.
What would their API look like?
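As a purely hypothetical sketch (none of these names exist in SAMOA today), the contract might look like a small platform-agnostic interface, with each platform (Flink, Storm, Samza) supplying its own binding underneath. An in-memory stand-in shows the shape without needing Kafka:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical interface: one platform-agnostic connector contract,
// with per-platform implementations hidden behind it.
interface StreamConnector<T> {
    void open(String endpoint);   // e.g. a Kafka topic name
    boolean hasNext();            // is a record available?
    T next();                     // read the next record from the source
    void emit(T record);          // write a record to the sink side
    void close();                 // release platform resources
}

// Minimal in-memory stand-in, useful for testing pipelines without Kafka.
class InMemoryConnector implements StreamConnector<String> {
    private final Queue<String> buffer = new ArrayDeque<>();

    @Override public void open(String endpoint) { /* nothing to connect to */ }
    @Override public boolean hasNext() { return !buffer.isEmpty(); }
    @Override public String next() { return buffer.poll(); }
    @Override public void emit(String record) { buffer.add(record); }
    @Override public void close() { buffer.clear(); }
}
```

A generic Kafka connector would then be one implementation of this interface, instantiated per platform.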

You are right, we don't have good practices for deploying SAMOA in the real
world.
We started working on pipelines a while ago, but the people taking care
of it moved on to new jobs and had no more time to contribute.
Ideally, if we have a working Kafka connector, it should be easy to
read/write from/to a topic, adding some custom preprocessing if needed.
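To illustrate what that "custom preprocessing" could be (this is only a sketch; nothing like it exists in SAMOA yet), a pure function mapping a raw record read from a topic into the comma-separated feature/label line that SAMOA's instance readers expect:

```java
// Hypothetical preprocessing step between a topic and SAMOA:
// drop a leading timestamp field and keep the feature/label fields.
// The method name and format are illustrative only.
class Preprocess {
    static String dropTimestamp(String line) {
        int comma = line.indexOf(',');
        // No timestamp present: pass the record through unchanged.
        return comma >= 0 ? line.substring(comma + 1) : line;
    }
}
```

A Kafka connector would apply such a function to each consumed record before handing it to the learner.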

There is some initial work on this in SAMOA-40 (
https://github.com/apache/incubator-samoa/pull/32), but it is not there
yet.
If you would like to tackle it we would be glad to have your contribution!

Cheers,

-- Gianmarco

On Tue, Apr 19, 2016 at 4:36 PM, Simone Robutti <
simone.robutti@radicalbit.io> wrote:

> Hello,
>
> I just began working on Flink-SAMOA to evaluate its development status. I'm
> trying to understand how data are fed into it and how results come out. From
> what I've understood, there is no interaction between SAMOA and Flink data
> structures: being platform-agnostic, SAMOA just creates instances of its own
> sources and sinks inside Flink.
>
> Do you think there could be a way to exploit the other Flink connectors?
>
> Also, I couldn't find any documentation on patterns and good practices for
> creating a pipeline that includes SAMOA algorithms. From what I can see, the
> only way to work with it is to pre-process data, write them into a file that
> SAMOA then reads, and write the output to a file again. This is clearly not
> a good fit for a high-performance environment, so I would like to know if
> there's a better way to do it and if there's documentation about it.
>
> Thanks,
>
> Simone
>
