Using Debezium with Event Streams on OpenShift
This guide shows how to use Debezium with Event Streams by streaming PostgreSQL change events to an Event Streams topic.
This guide assumes you have the following:
- An installation of Event Streams on OpenShift
- The OpenShift CLI (oc) and Helm CLI on your client machine
- Docker installed on your client machine
Download the Kafka Connect Configuration from Event Streams & Configure Event Streams
This section will guide you through downloading the Kafka Connect configuration generated by Event Streams.
- Log in to the IBM Cloud Private web console and navigate to your Event Streams Helm release from Workloads -> Helm Releases.
- Launch the Event Streams UI: inside the Event Streams Helm release view, click the Launch button and select admin-ui-https.
- From the Event Streams UI, navigate to Toolbox -> Set up a Kafka Connect environment and follow the on-screen instructions. These guide you through creating the topics that Kafka Connect needs, generating an API key, and downloading your Kafka Connect configuration package (kafkaconnect.zip).
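If you are curious, you can list the contents of the downloaded package before extracting it. Based on the steps that follow, it should contain at least a Dockerfile, a config directory holding connect-distributed.properties and connect-log4j.properties, and a connectors directory for plug-ins (the exact layout may vary by Event Streams version):

$ unzip -l kafkaconnect.zip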
Build the Kafka Connect Image
This section will guide you through building a Kafka Connect image that includes the Debezium PostgreSQL connector, using the configuration package downloaded in the previous section.
- Unzip the downloaded kafkaconnect.zip file:

$ unzip kafkaconnect.zip -d kafkaconnect
- Delete the zipped kafkaconnect package:

$ rm kafkaconnect.zip
- Download the Debezium PostgreSQL connector (v0.10.0.CR1) package from Maven Central:

$ wget https://repo1.maven.org/maven2/io/debezium/debezium-connector-postgres/0.10.0.CR1/debezium-connector-postgres-0.10.0.CR1-plugin.tar.gz
- Untar the Debezium PostgreSQL connector into the kafkaconnect/connectors directory:

$ tar -xvzf debezium-connector-postgres-0.10.0.CR1-plugin.tar.gz -C kafkaconnect/connectors
- Delete the Debezium PostgreSQL connector tarball:

$ rm debezium-connector-postgres-0.10.0.CR1-plugin.tar.gz
- Build the Kafka Connect image with Docker:

$ docker build -t kafka-connect-debezium-postgres:0.0.1 kafkaconnect
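As a quick sanity check, you can confirm the connector jars landed in the build context and that the image now exists locally:

$ ls kafkaconnect/connectors/debezium-connector-postgres
$ docker images kafka-connect-debezium-postgres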
Upload the Kafka Connect Image to OpenShift
This section will guide you through uploading the Kafka Connect image to OpenShift. This will require SSH access to a cluster node. For alternative methods, please refer to the OpenShift documentation.
- Package the Docker image built in the last section as a tarball:

$ docker save kafka-connect-debezium-postgres:0.0.1 --output kafka-connect-debezium-postgres-0-0-1.tar
- Upload the image tarball to one of the cluster nodes using SCP. For example, for a cluster node at 192.168.10.2 using the user root:

$ scp kafka-connect-debezium-postgres-0-0-1.tar root@192.168.10.2:/root/
- SSH into the cluster node:

$ ssh root@192.168.10.2
- Load the uploaded image tarball into the node's local Docker repository:

$ docker load --input kafka-connect-debezium-postgres-0-0-1.tar
- Log in to your OpenShift cluster with the oc command:

$ oc login
- Log in to the OpenShift Docker registry. This assumes the registry is installed at the default location, docker-registry.default.svc.cluster.local:5000:

$ docker login -u any_value -p $(oc whoami -t) docker-registry.default.svc.cluster.local:5000
- Tag the Kafka Connect image with the registry location, replacing your-namespace-here with the namespace where you will deploy Kafka Connect:

$ docker tag kafka-connect-debezium-postgres:0.0.1 docker-registry.default.svc.cluster.local:5000/your-namespace-here/kafka-connect-debezium-postgres:0.0.1
- Push the image to the OpenShift Docker registry, replacing your-namespace-here with the namespace you specified in the last step:

$ docker push docker-registry.default.svc.cluster.local:5000/your-namespace-here/kafka-connect-debezium-postgres:0.0.1
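Assuming the internal registry's image stream integration is enabled (the default on OpenShift), you can confirm the push succeeded by looking for the resulting image stream (replacing your-namespace-here):

$ oc get imagestream kafka-connect-debezium-postgres --namespace your-namespace-here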
Deploy Kafka Connect on OpenShift
This section will guide you through deploying Kafka Connect on OpenShift using the image that you uploaded in the last section. Kafka Connect will be exposed as a NodePort service on port 30500 of your cluster nodes.
- Return to the workspace that contains the kafkaconnect directory from the first section, and log in to your OpenShift cluster:

$ oc login
- Upload connect-distributed.properties as a Secret in the OpenShift namespace where you will deploy Kafka Connect (replacing your-namespace-here):

$ oc create secret generic --namespace your-namespace-here --from-file kafkaconnect/config/connect-distributed.properties connect-distributed-config
- Upload connect-log4j.properties as a ConfigMap in the OpenShift namespace where you will deploy Kafka Connect (replacing your-namespace-here):

$ oc create configmap --namespace your-namespace-here --from-file kafkaconnect/config/connect-log4j.properties connect-log4j-config
- Create a file in your workspace named kafka-connect-deploy.yaml with the following contents. Replace the value of spec.template.spec.containers[0].image with the image name that you pushed in the last section:

# Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kafkaconnect-deploy
  labels:
    app: kafkaconnect
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kafkaconnect
  template:
    metadata:
      labels:
        app: kafkaconnect
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 5000
      containers:
        - name: kafkaconnect-container
          image: docker-registry.default.svc.cluster.local:5000/your-namespace-here/kafka-connect-debezium-postgres:0.0.1
          readinessProbe:
            httpGet:
              path: /
              port: 8083
          livenessProbe:
            httpGet:
              path: /
              port: 8083
          ports:
            - containerPort: 8083
          volumeMounts:
            - name: connect-config
              mountPath: /opt/kafka/config/connect-distributed.properties
              subPath: connect-distributed.properties
            - name: connect-log4j
              mountPath: /opt/kafka/config/connect-log4j.properties
              subPath: connect-log4j.properties
      volumes:
        - name: connect-config
          secret:
            secretName: connect-distributed-config
        - name: connect-log4j
          configMap:
            name: connect-log4j-config
---
# Service
apiVersion: v1
kind: Service
metadata:
  name: kafkaconnect-service
  labels:
    app: kafkaconnect-service
spec:
  type: NodePort
  ports:
    - name: kafkaconnect
      protocol: TCP
      port: 8083
      nodePort: 30500
  selector:
    app: kafkaconnect
- Deploy Kafka Connect (replacing your-namespace-here):

$ oc create --namespace your-namespace-here -f kafka-connect-deploy.yaml
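Once the pod reports Ready, the Kafka Connect REST API should answer on the NodePort. Kafka Connect returns a small JSON document with its version from the root endpoint, which makes a convenient smoke test. Using the example node address 192.168.10.2 from the previous section (replacing your-namespace-here):

$ oc get pods --namespace your-namespace-here -l app=kafkaconnect
$ curl http://192.168.10.2:30500/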
Configure or Install PostgreSQL
This section covers the minimum PostgreSQL configuration required by Debezium. If you do not have PostgreSQL, you can skip ahead to the Install PostgreSQL section.
Configure PostgreSQL
If you already have PostgreSQL installed, you must enable its logical decoding feature (available in PostgreSQL 9.4 and later) and enable a logical decoding output plugin. PostgreSQL 10 and later includes the pgoutput logical decoding output plugin by default. If you are on an earlier version, you must first install either decoderbufs or wal2json; more information can be found on the Debezium website.
After you have an acceptable logical decoding output plugin installed, refer to the Debezium documentation to complete the configuration of your PostgreSQL server: https://debezium.io/documentation/reference/0.10/connectors/postgresql.html#server-configuration
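A quick way to confirm an existing server is ready for Debezium is to check the relevant settings from psql; wal_level must report logical, and at least one replication slot and WAL sender must be available:

SHOW wal_level;
SHOW max_replication_slots;
SHOW max_wal_senders;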
Install PostgreSQL
If you do not have PostgreSQL installed, you may use a Helm Chart to get a working PostgreSQL instance on your cluster.
This section will guide you through installing PostgreSQL using the stable Helm chart, with the additional configuration that Debezium needs to read the PostgreSQL transaction log. PostgreSQL will be available on port 30600 of your cluster nodes. The superuser will be postgres with the password passw0rd, and the initial database name is postgres.
- Create the PostgreSQL configuration that Debezium requires. Create a file in your workspace named extended.conf with the following contents:

# extended.conf
wal_level = logical
max_wal_senders = 1
max_replication_slots = 1
- Create a ConfigMap from the extended.conf file with the following command (replacing your-namespace-here):

$ oc create configmap --namespace your-namespace-here --from-file=extended.conf postgresql-config
- Install PostgreSQL using the stable Helm chart with the following command (replacing your-namespace-here). For more configuration options, refer to the PostgreSQL Helm chart documentation. A pod readiness check is shown after this list:

$ helm install --name postgres --namespace your-namespace-here \
    --set extendedConfConfigMap=postgresql-config \
    --set service.type=NodePort \
    --set service.nodePort=30600 \
    --set postgresqlPassword=passw0rd \
    --set securityContext.runAsUser=5000 \
    --set volumePermissions.securityContext.runAsUser=5000 \
    stable/postgresql --tls
- Finally, connect to your PostgreSQL instance with your favorite PostgreSQL client and create a table with data. Note that the psql client is installed in the postgres container, so you can exec into the container and use psql from there, as the next section demonstrates:

$ oc exec --namespace your-namespace-here -it postgres-postgresql-0 -- /bin/sh
$ psql --user postgres
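As mentioned in the install step, you can watch the PostgreSQL pod come up before connecting. This assumes the stable chart's default app=postgresql pod label (replacing your-namespace-here):

$ oc get pods --namespace your-namespace-here -l app=postgresql -w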
Add Sample Data to PostgreSQL
- Open a shell in the postgres container (replacing your-namespace-here):

$ kubectl exec --namespace your-namespace-here -it postgres-postgresql-0 -- /bin/sh
- Log in to PostgreSQL with the following command, entering the password passw0rd when prompted (this is the password set by the helm install command in the previous section):

$ psql --user postgres
- Create a table named containers:

CREATE TABLE containers (
  containerid VARCHAR(30) NOT NULL,
  type VARCHAR(20),
  status VARCHAR(20),
  brand VARCHAR(50),
  capacity DECIMAL,
  CREATIONDATE TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  UPDATEDATE TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (containerid)
);
- Insert data into the table:

INSERT INTO containers (containerid, type, status, brand, capacity) VALUES
  ('C01','Reefer','Operational','containerbrand',20),
  ('C02','Dry','Operational','containerbrand',20),
  ('C03','Dry','Operational','containerbrand',40),
  ('C04','FlatRack','Operational','containerbrand',40),
  ('C05','OpenTop','Operational','containerbrand',40),
  ('C06','OpenSide','Operational','containerbrand',40),
  ('C07','Tunnel','Operational','containerbrand',40),
  ('C08','Tank','Operational','containerbrand',40),
  ('C09','Thermal','Operational','containerbrand',20);
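You can confirm the rows were written with a quick query before moving on:

SELECT containerid, type, status, capacity FROM containers;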
Configure a Debezium Kafka Connect Connector
After you have a configured PostgreSQL instance, you can configure a new Debezium Kafka Connect connector for PostgreSQL. Kafka Connect connectors are created by making a request to the Kafka Connect REST API.
The following shows a configuration for a PostgreSQL instance at postgres-postgresql.your-namespace-here.svc.cluster.local on port 5432, with username postgres and password passw0rd. The database is postgres with the table public.containers. In this example, we are using the pgoutput logical decoding output plugin. We use the cURL HTTP client to call the Kafka Connect API at http://192.168.10.2:30500, since 192.168.10.2 is one of our cluster nodes and the Kafka Connect API is exposed as a NodePort on port 30500.
curl -X POST \
  http://192.168.10.2:30500/connectors \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "containers-connector",
    "config": {
      "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
      "plugin.name": "pgoutput",
      "database.hostname": "postgres-postgresql.your-namespace-here.svc.cluster.local",
      "database.port": "5432",
      "database.user": "postgres",
      "database.password": "passw0rd",
      "database.dbname": "postgres",
      "database.server.name": "postgres",
      "table.whitelist": "public.containers"
    }
  }'
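You can then confirm that the connector registered and that its task is running; a healthy connector reports a state of RUNNING for both the connector and its task:

$ curl http://192.168.10.2:30500/connectors/containers-connector/status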
You should then be able to return to the Event Streams UI and navigate to the Topics section. Click postgres.public.containers to see your newly created topic; Debezium names topics <database.server.name>.<schema>.<table>, so the name will differ if you used a different server name or table. Then click Messages to view the messages that Debezium has written to the topic.
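For reference, the value of each message is a Debezium change-event envelope. Abridged for readability, the payload of an initial snapshot event for one of the rows above looks roughly like the sketch below. The schema portion and several source fields are omitted, the ts_ms value is illustrative, and the capacity DECIMAL column (which Debezium encodes as base64 bytes under its default decimal.handling.mode) is shown as a plain number purely for illustration:

{
  "payload": {
    "before": null,
    "after": {
      "containerid": "C01",
      "type": "Reefer",
      "status": "Operational",
      "brand": "containerbrand",
      "capacity": 20
    },
    "source": { "connector": "postgresql", "db": "postgres", "table": "containers" },
    "op": "r",
    "ts_ms": 1570000000000
  }
}

The op field distinguishes snapshot reads (r) from inserts (c), updates (u), and deletes (d), so inserting another row into containers should produce a new message with op set to c.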