beam-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Frances Perry (JIRA)" <>
Subject [jira] [Updated] (BEAM-14) Add data integration DSL
Date Tue, 16 Feb 2016 05:21:18 GMT


Frances Perry updated BEAM-14:
    Component/s:     (was: beam-model)

> Add data integration DSL
> ------------------------
>                 Key: BEAM-14
>                 URL:
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-ideas
>            Reporter: Jean-Baptiste Onofré
>            Assignee: Jean-Baptiste Onofré
> Even if users would still be able to use directly the API, it would be great to provide
a DSL on top of the API covering batch and streaming data processing but also data integration.
> Instead of designing a pipeline as a chain of apply() wrapping function (DoFn), we can
provide a fluent DSL allowing users to directly leverage keyturn functions.
> For instance, an user would be able to design a pipeline like:
> {code}
> .from(“kafka:localhost:9092?topic=foo”).reduce(...).split(...).wiretap(...).map(...).to(“jms:queue:foo….”);
> {code}
> The DSL will allow to use existing pipelines, for instance:
> {code}
> .from("cxf:...").reduce().pipeline("other").map().to("kafka:localhost:9092?topic=foo&acks=all")
> {code}
> So it means that we will have to create a IO Sink that can trigger the execution of a
target pipeline: (from("trigger:other") triggering the pipeline execution when another pipeline
design starts with pipeline("other")). We can also imagine to mix the runners: the pipeline()
can be on one runner, the from("trigger:other") can be on another runner). It's not trivial,
but it will give strong flexibility and key value for Beam.
> In a second step, we can provide DSLs in different languages (the first one would be
Java, but why not providing XML, akka, scala DSLs).
> We can note in previous examples that the DSL would also provide data integration support
to bean in addition of data processing. Data Integration is an extension of Beam API to support
some Enterprise Integration Patterns (EIPs). As we would need metadata for data integration
(even if metadata can also be interesting in stream/batch data processing pipeline), we can
provide a DataxMessage built on top of PCollection. A DataxMessage would contain:
> structured headers
> binary payload
> For instance, the headers can contains an Avro schema to describe the payload.
> The headers can also contains useful information coming from the IO Source (for instance
the partition/path where the data comes from, …).

This message was sent by Atlassian JIRA

View raw message