Skip to main contentIBM Garage Event-Driven Reference Architecture

Reference Architecture

We defined the starting point for a cloud-native event-driven architecture to be that it supports at least the following important capabilities:

  • Being able to communicate and persist events.
  • Being able to take direct action on events.
  • Processing streams of events to derive real time insight/intelligence.
  • Providing communication mechanism between event driven microservices and functions.

Event Driven Architecture

With an event backbone providing the connectivity between the capabilities, we can visualize a reference event-driven architecture as follows:



  1. Event sources: generates events and event streams from sources such as IoT devices, web app, mobile app, microservices…
  2. IBM Event Streams: Provides an Event Backbone supporting Pub/Sub communication, an event log, and simple event stream processing based on Apache Kafka.
  3. IBM Cloud Functions: Provides a simplified programming model to take action on an event through a “serverless” function-based compute model.
  4. Streaming Analytics: Provides continuous ingest and analytical processing across multiple event streams. Decision Server Insights: Provides the means to take action on events and event streams through business rules.
  5. Event Stores: Provide optimized persistence (data stores), for event sourcing, Command Query Response Separation (CQRS) and analytical use cases.
  6. Event-Driven Microservices: Applications that run as serverless functions or containerized workloads which are connected via pub/sub event communication through the event backbone.

The reference architecture is illustrated in the EDA reference implementation solution.

Extended Architecture

The event-driven reference architecture provides the framework to support event-driven applications and solutions. The extended architecture provides the connections for:

  • Integration with legacy apps and data resources
  • Integration with analytics or machine learning to derive real-time insights

The diagram below shows how these capabilities fit together to form an extended event-driven architecture.


In 7. the AI workbench includes tools to do data analysis and visualization, build training and test sets from any datasource and in particular Event Store, and develop models. Models are pushed to streaming analytics component.

Integration with analytics and machine learning

The extended architecture extends the basic EDA reference architecture with concepts showing how data science, artificial intelligence and machine learning can be incorporated into an event-driven solution. The following diagram illustrats the event sources on the left injecting events to topics where green components are consuming from. Those components apply filtering, compute aggregates and stateful operation with time window based rules. Some of those components can include training scoring model, to do for example anomaly detection. The model is built with data scientist workbench tool, like Watson Studio.


The starting point for data scientists to be able to derive machine learning models or analyze data for trends and behaviors is the existence of the data in a form that they can be consumed. For real-time intelligent solutions, data scientists typically inspect event histories and decision or action records from a system. Then, they reduce this data to some simplified model that scores new event data as it arrives.

Getting the data for the data scientist:

With real-time event streams, the challenge is in handling unbounded data or a continuous flow of events. To make this consumable for the data scientist you need to capture the relevant data and store it so that it can be pulled into the analysis and model-building process as required.

Following our event-driven reference architecture the event stream would be a Kafka topic on the event backbone. From here there are two possibilities for making that event data available and consumable to the data scientist:

  • The event stream or event log can be accessed directly through Kafka and pulled into the analysis process
  • The event stream can be pre-processed by the streaming analytics system and stored for future use in the analysis process. You have a choice of store type to use. Within public IBM cloud object storage Cloud Object Store can be used as a cost-effective historical store.

Both approaches are valid, pre-processing through streaming analytics provides opportunity for greater manipulation of the data, or storing data over time windows for complex event processing. However, the more interesting distinction is where you use a predictive (ML model) to score arriving events or stream data in real time. In this case you may use streaming analytics to extract and save the event data for analysis, model building, and model training and also for scoring (executing) a derived model in line in the real time against arriving event data.

  • The event and decision or action data is made available in cloud object storage for model building through streaming analytics.
  • Models may be developed by tuning and parameter fitting, standard form fitting, classification techniques, and text analytics methods.
  • Increasingly artificial intelligence (AI) and machine learning (ML) frameworks are used to discover and train useful predictive models as an alternative to parameterizing existing model types manually.
  • These techniques lead to process and data flows where the predictive model is trained offline using event histories from the event and the decision or action store possibly augmented with some supervisory outcome labelling, as illustrated by the paths from the Event Backbone and Stream Processing store into Learn/Analyze.
  • A model trained in this way includes some “scoring” API that can be invoked with fresh event data to generate a model-based prediction for future behavior and event properties of that specific context.
  • The scoring function is then easily reincorporated into the streaming analytics processing to generate predictions and insights.

These combined techniques can lead to the creation of real-time intelligent applications:

  1. Event-driven architecture
  2. Identification of predictive insights using event storming methodology
  3. Developing models for these insights using machine learning
  4. Real-time scoring of the insight models using a streaming analytics processing framework

These are scalable easily extensible, and adaptable applications responding in near real time to new situations. There are easily extended to build out and evolve from an initial minimal viable product (MVP) because of the loose coupling in the event-driven architecture, , and streams process domains.

Data scientist workbench

To complete the extended architecture for integration with analytics and machine learning, consider the toolset and frameworks that the data scientist can use to derive the models. Watson Studio provides tools for data scientists, application developers, and subject matter experts to collaboratively and easily work with data to build and train models at scale.

For more information see Getting started with Watson Studio.

Modern Data Lake

One of the standard architecture to build data lake is the lambda architecture with data injection, stream processing, batch processing to data store and then queries as part of the service layer. It is designed to handle massive quantities of data by taking advantage of both batch and stream processing methods. Lambda architecture depends on a data model with an append-only, immutable data source that serves as a system of record. The batch layer precomputes results using a distributed processing system that can handle very large quantities of data. Output from the batch and speed layers are stored in the serving layer, which responds to ad-hoc queries by returning precomputed views or building views from the processed data.

The following figure is an enhancement of the lambda architecture with the adoption of Kafka as event backbone for data pipeline and source of truth and streaming processing to support real time analytics and streaming queries.


On the left the different data sources, injected using different protocols like MQTT, HTTP, or Kafka Connect… The business applications are supported by different microservices that are exposed by APIs and event-driven. The APIs is managed by API management product. Business events are produced as facts about the business entities, and persisted in the append log of kafka topic. Transactional data can be injected from MQ queues to Kafka topic, via MQ connectors.

The data platform offers a set of capabilities to expose data for consumers like Data Science workbench (Watson Studio) via virtualization and data connections. The data are cataloged and governed to ensure integrity and visibility. The storage can be block based, document oriented or table oriented.

Batch queries and map reduce can address huge data raw, while streaming queries can support real time aggregates and analytics.

Legacy Integration

While you create new digital business applications as self-contained systems, you likely need to integrate legacy apps and databases into the event-driven system. Two ways of coming directly into the event-driven architecture are as follows:

  1. Where legacy applications are connected with MQ. You can connect directly from MQ to the Kafka in the event backbone. See IBM Event Streams getting started with MQ article. The major idea here is to leverage the transactionality support of MQ, so writing to the databased and to the queue happen in the same transaction:

  2. Where databases support the capture of changes to data, you can publish changes as events to Kafka and hence into the event infrastructure. This could leverage the outbox pattern where events are prepared by the application and written, in the same transaction as the other tables, and read by the CDC capture agent.


Or use an efficient CDC product to get the change data capture at the transaction level. IBM offers the best CDC product on the market, (InfoSphere Data Replication 11.4.0), with subsecond average latency and support full transactional semantics with exactly once consumption. It includes an efficient Kafka integration.

One of the challenges of basic CDC products, is the replication per table pattern, leads to retry to rebuild the transaction integrity using kafka stream to join data from multiple topics. The TCC (Transactionally consistent consumer) technology allows Kafka replication to have semantics similar to a relational database. This dramatically increases the types of business logic that can be implemented. Developer can recreate the order of operations in source transactions across multiple Kafka topics and partitions and consume Kafka records that are free of duplicates by including the Kafka transactionally consistent consumer library in your Java applications. TCC allows:

  • Elimination of any duplicates, even in abnormal termination scenarios
  • Reconstruction of exact transaction order, despite those transactions having been performance optimized and applied out of order to Kafka
  • Reconstruction of exact operation order within a transaction, despite said operations having been applied to different topics and/or partitions. This is not offered by default Kafka’s “exactly once” functionality
  • Ability for hundreds+ producers to participate in a single transaction. Kafka’s implementation has one producer create all messages for a transaction, despite those messages going to different topics.
  • Provides a unique bookmark, so that downstream applications can check-point and resume exactly where they last left off if they fail.

We recommend listeing to this presentation from Shawn Roberston - IBM, on A Solution for Leveraging Kafka to Provide End-to-End ACID Transactions

The second, very important, feature is on the producer side, with the Kafka custom operation processor (or KCOP) infrastructure. KCOP helps you to control over the Kafka producer records that are written to Kafka topics in response to insert, update, and delete operations that occur on source database tables. It allows a user to programmatically dictate the exact key an byte values of the message written to Kafka. Therefore any individual row transformation message encoding format is achievable. Out of the box it includes Avro, CSV, JSON message encoding formats. It is possible to perform column level RSA encryption on certain values before writing. It also permits enriching of the message with additional annotation if needed. Developers have the complete choice over how data is represented. Eg. Can write data in Kafka Compaction compliant format with deletes being represented as Kafka tombstones or can write the content of the message being deleted.

It also supports Kafka Header fields for efficient downstream routing without the need for immediate de-serialization. The KCOP allows a user to determine how many messages are written to Kafka in response to a source operation, the content of the messages, and their destination.

  • Allows for filtering based on column values.
  • Allows for writing the entire row with sensitive data to highly restricted topics and a subset of the columns to wider shared topics.
  • Allows for writing the same message in two different formats to two different topics. Useful in environments where some consuming applications want JSON, others prefer Avro, both can be produced in parallel if desired.
  • Allows for sending special flags to a monitoring topic. Eg. when a transaction exceeds $500, in addition to the regular message, a special topic is written to notifying of the large value transaction

The two diagrams above, illustrate a common architecture for data pipeline, using event backbone, where the data is transformed into different data model, that can be consumed by components that act on those data, and move the data document into data lake for big data processing.

Finally it is important to note that the deployment of the event streams, CDC can be colocated in the mainframe to reduce operation and runtime cost. It also reduces complexity. In the following diagram, event stream brokers are deployed on OpenShift on Linux on Z and the CDC servers on Linux too.


This architecture pattern try to reduce the MIPs utilization on the mainframe to the minimum by still ensuring data pipeline, with transactional integrity.

  • Quality of Service – autoscaling / balancing between Linux nodes, Resilience.
  • Latency - memory speed (Network -> HiperSocket, with memory speed and bandwidth)
  • Reduce MIPS (avoid Authentication-TLS overhead on z/OS as no network traffic is encrypted)
  • Avoid network spend / management / maintenance between servers
  • Improved QoS for the Kafka service – inherits Z platform (Event Streams is the only Kafka variant currently supported on Linix on Z)
  • Reduced complexity / management cost
  • Reduced latency / network infrastructure (apply – Kafka hop is now in memory) – avoids need for encryption

The CDC server uses Transaction Capture Consumer to keep transaction integrity while publishing to kafka topic. CICS Business events are mechanism for declaratively emitting event from CICS routines.