IBM Garage Event-Driven Reference Architecture

Fit for purpose

In this note we list the main criteria to consider and assess when establishing an event-driven architecture, and as part of continuous application governance. The list is not exhaustive, but it gives a good foundation for analysis and study. Fit-for-purpose practices should be carried out under a broader program of application development governance and data governance. We can look at least at the following major subjects:

Cloud native applications

With the adoption of cloud-native and microservice applications (the twelve-factor app), the following needs to be addressed:

  • Responsiveness with elastic scaling and resilience to failure, which leads to adopting the Reactive Manifesto and considering messaging as the way applications communicate. Elasticity may also lead to a multi-cloud deployment practice.
  • Addressing data sharing with a push model to improve decoupling and performance. Instead of each service pulling data from other services through REST endpoints, each service pushes changes to its main business entities to an event backbone. Any future service that needs that data pulls it from the messaging system.
  • Adopting common patterns such as Command Query Responsibility Segregation (CQRS) to implement complex queries joining business entities owned by different microservices, along with event sourcing, the transactional outbox, and the Saga pattern for long-running transactions.
  • Addressing eventual data consistency when propagating changes to other components, instead of relying on ACID transactions.
  • Supporting an “always-on” approach with active/active deployment across multiple data centers (at least three) and the ability to propagate data to all of them.

Supporting all or part of those requirements will lead to the adoption of event-driven microservices and architecture.

Modern data pipeline

As new business applications need to react to events in real time, the event backbone is now part of the IT toolbox. Modern IT architecture encompasses the adoption of a data hub, where all the data about a ‘customer’, for example, is accessible through one event backbone. It is therefore natural to assess the data-movement strategy and how to offload some of the ETL jobs running at night by adopting real-time data ingestion.

We detailed the new architecture in this modern data lake article, so from a fit-for-purpose point of view we need to assess the scope of existing ETL jobs and refactor them into streaming logic that can be incorporated into different logs/topics. With an event backbone like Kafka, any consumer can join the data-log consumption at any point in time within the retention period. By moving the ETL logic into a streaming application, we do not need to wait until the next morning to get important metrics.
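The difference from batch ETL can be sketched as an incremental aggregation: the metric is updated as each event arrives rather than recomputed overnight. The record shape and field names below are illustrative, and a plain list stands in for the retained Kafka log.

```python
from collections import defaultdict

# The retained log: any consumer can replay it from any offset within
# the retention window. Records here are illustrative.
log = [
    {"customer": "a", "amount": 40.0},
    {"customer": "b", "amount": 10.0},
    {"customer": "a", "amount": 25.0},
]

def stream_totals(records):
    """Incrementally maintain per-customer totals as events arrive,
    instead of recomputing them in a nightly batch ETL job."""
    totals = defaultdict(float)
    for rec in records:        # in Kafka this would be a consumer poll loop
        totals[rec["customer"]] += rec["amount"]
        yield dict(totals)     # the metric is up to date after every event

*_, latest = stream_totals(log)
print(latest)  # {'a': 65.0, 'b': 10.0}
```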

MQ Versus Kafka

We already addressed the differences between Queueing and Streaming in this chapter.

Now, in terms of technologies, we can quickly highlight the following:

Kafka characteristics

  • Keeps messages for a long period of time; messages are not deleted once consumed
  • Suited for high-volume, low-latency processing
  • Supports the pub/sub model only
  • Messages are ordered within a topic partition
  • Stores and replicates events published to a topic; removes them when the retention period expires or under disk-space constraints
  • Messages are removed from the file system independently of applications
  • A topic can have multiple partitions to parallelize consumer processing
  • Does not support two-phase commit / XA transactions, but messages can be produced within a local Kafka transaction
  • A multi-region architecture requires data replication across regions
  • A cluster can span multiple availability zones
  • Also supports stretched clusters across regions when those regions have a low-latency network
  • Scales horizontally by adding more nodes
  • Non-standard API, but rich client libraries for the main programming languages
  • Also supports an HTTP bridge or proxy so messages can be sent via HTTP
  • Easy to set up on Kubernetes with a mature operator; more difficult to manage on bare-metal deployments
  • Cluster and topic definitions can be managed with GitOps and automatically instantiated in a new Kubernetes cluster
  • Consumers and producers are built and deployed with simple CI/CD pipelines
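Two of the points above, per-partition ordering and parallelism across partitions, follow from how records are assigned to partitions by key. Kafka's default partitioner hashes the record key (with murmur2) modulo the partition count; the sketch below uses MD5 purely to illustrate the idea, not Kafka's actual hash.

```python
import hashlib

NUM_PARTITIONS = 3  # illustrative partition count for the topic

def partition_for(key: str) -> int:
    # Hash the record key and take it modulo the partition count, the
    # same scheme as Kafka's default partitioner (which uses murmur2).
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

# All events for the same key land on the same partition, so they are
# consumed in order relative to each other; events with different keys
# may land on different partitions and be processed in parallel.
events = [("customer-a", "created"), ("customer-b", "created"),
          ("customer-a", "updated")]
placement = [(key, partition_for(key)) for key, _ in events]
same = partition_for("customer-a") == partition_for("customer-a")
print(same)  # True: per-key ordering is preserved within one partition
```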

MQ characteristics

  • Best suited for point-to-point communication
  • Supports horizontal scaling and high-volume processing
  • Participates in XA transactions
  • Exactly-once delivery
  • Integrates with the mainframe
  • Supports JMS for Java EE applications
  • Messages are removed after consumption; for pub/sub, they stay persisted until consumed by all subscribers
  • Strong coupling with subscribers
  • Supports MQ broker clusters with leader and followers
  • Supports replication between brokers
  • Supports message priority
  • Easily containerized and managed with Kubernetes operators
  • Supports the AMQP protocol

Direct product feature comparison

| Kafka | IBM MQ |
| --- | --- |
| Pub/sub engine with streams and connectors | Queue or pub/sub engine |
| All topics are persistent | Queues and topics can be persistent or non-persistent |
| All subscribers are durable | Subscribers can be durable or non-durable |
| Adding brokers requires little work (changing a configuration file) | Adding QMGRs requires some work (add the QMGRs to the cluster, add cluster channels; queues and topics need to be added to the cluster) |
| Topics can be spread across brokers (partitions) with a command | Queues and topics can be spread across a cluster by adding them to clustered QMGRs |
| Producers and consumers are aware of changes made to the cluster | All MQ clients require a CCDT file to know of changes if not using a gateway QMGR |
| Can have n replicas per partition | Can have 2 replicas (RDQM) of a QMGR, or multi-instance QMGRs |
| Simple load balancing | Load balancing can be simple or more complex, using weights and affinity |
| Can reread messages | Cannot reread messages that have already been processed |
| All clients connect using a single connection method | MQ has channels, which allow different clients to connect, each able to have different security requirements |
| Stream processing built in, using Kafka topics for efficiency | Stream processing is not built in; use third-party libraries such as MicroProfile Reactive Messaging, ReactiveX, etc. |
| Connection security, authentication security, and ACLs (read/write to topic) | Connection security, channel security, authentication security, message security/encryption, ACLs per object, third-party plugins (channel exits) |
| Built on Java, so runs on any platform supporting Java 8+ | Native on AIX, IBM i, Linux, Solaris, Windows, z/OS; runs as a container |
| Monitoring via statistics from the Kafka CLI, open-source tools, Prometheus | Monitoring via the PCF API, MQ Explorer, the MQ CLI (runmqsc), third-party tools (Tivoli, CA APM, Help Systems, open source, etc.) |

Migrating from MQ to Kafka

Given the conditions listed above, architects may assess whether it makes sense to migrate applications from MQ to Kafka. Most of the time the investment is not justified: modern MQ supports the same DevOps and deployment patterns as cloud-native applications.

Java EE or mainframe applications use MQ within transactions to avoid duplicate or lost messages. Supporting exactly-once delivery in Kafka requires specific configuration and the participation of both producers and consumers, which is far more complex to implement.
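To give a sense of that configuration, here is a sketch of the client settings involved in Kafka's exactly-once semantics, written as plain dicts. The keys follow the Kafka client configuration names; the transactional id value is illustrative, and a real producer must additionally drive the transaction lifecycle (init, begin, commit/abort) in code.

```python
# Producer side: idempotence dedupes broker-side retries; a transactional
# id enables atomic writes across partitions. "order-service-tx" is an
# illustrative value, not a required name.
producer_config = {
    "enable.idempotence": True,
    "acks": "all",                           # wait for all in-sync replicas
    "transactional.id": "order-service-tx",
}

# Consumer side: read only committed transactional records, and commit
# offsets inside the producer's transaction rather than automatically.
consumer_config = {
    "isolation.level": "read_committed",
    "enable.auto.commit": False,
}
```

By contrast, an MQ application gets the equivalent guarantee by simply enlisting the put/get in an XA transaction.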

Real-time analytics with Apache Flink

Once we have set up data streams, we need technology to support real-time analytics and complex event processing. Historically, analytics on static data was performed with batch reporting techniques. However, if insights have to be derived in real time, event-driven architectures help to analyse events and look for patterns within them.

Apache Flink (2016) is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. It is often considered more capable than Apache Spark and Hadoop for stream processing, and it also supports batch processing, graph processing, and complex event processing. The major stream processing features offered by Flink are:

  • Support for event time and out of order streams: use event time for consistent results
  • Consistency, fault tolerance, and high availability: guarantees consistent state updates in the presence of failures and consistent data movement between selected sources and sinks
  • Low latency and high throughput: tune the latency-throughput trade off, making the system suitable for both high-throughput data ingestion and transformations, as well as ultra low latency (millisecond range) applications.
  • Expressive and easy-to-use APIs in Scala and Java: map, reduce, join, window, split, … make it easy to implement the business logic as functions.
  • Support for sessions and unaligned windows: Flink completely decouples windowing from fault tolerance, allowing for richer forms of windows, such as sessions.
  • Connectors and integration points: Kafka, Kinesis, Queue, Database, Devices…
  • Developer productivity and operational simplicity: develop in an IDE, then deploy to Kubernetes, YARN, or Mesos, or run containerized
  • Supports batch processing
  • Includes complex event processing (CEP) capabilities

Here is a simple diagram of the Flink architecture, from the Flink web site:

Flink components

See this technology summary and samples.