CDC in notes



Recently, at HelloFresh, I was tasked with prototyping a solution that required publishing a Kafka message. The quickest route would have been to write a producer and consumer, perhaps in the same process. But this was a new, heavily abstracted code base that we didn't understand well, and it is a crucial component of the core HelloFresh business. Why would we do this to ourselves?

We settled on a non-intrusive approach. The idea was to create a new denormalised table holding all the data required for the event to be produced. Database triggers would populate this table.
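
To make the trigger idea concrete, here is a minimal sketch. It is not the schema we used: the table and column names (customers, orders, order_events) are hypothetical, and an in-memory SQLite database stands in for PostgreSQL so the snippet runs on its own. PostgreSQL trigger syntax differs (it needs a trigger function), but the shape is the same: a write to a source table causes a fully populated row to appear in the denormalised table.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL, and every table and
# column name below is hypothetical, chosen only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    status TEXT NOT NULL
);

-- Denormalised table: one row per event, holding everything the event needs.
CREATE TABLE order_events (
    order_id INTEGER NOT NULL,
    customer_email TEXT NOT NULL,
    status TEXT NOT NULL
);

-- The trigger fills the denormalised table; application code is untouched.
CREATE TRIGGER orders_after_insert AFTER INSERT ON orders
BEGIN
    INSERT INTO order_events (order_id, customer_email, status)
    SELECT NEW.id, c.email, NEW.status
    FROM customers c
    WHERE c.id = NEW.customer_id;
END;
""")

conn.execute("INSERT INTO customers (id, email) VALUES (1, 'a@example.com')")
conn.execute("INSERT INTO orders (id, customer_id, status) VALUES (10, 1, 'created')")

# The application only wrote to orders; the trigger produced the event row.
rows = conn.execute(
    "SELECT order_id, customer_email, status FROM order_events"
).fetchall()
print(rows)
```

The point of the design is that the existing code base keeps writing to its own tables exactly as before, and only the database knows about the extra table.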

Next, we introduced Debezium, a tool that captures changes made to database tables (change data capture, or CDC). The captured data is transformed and published to the final Kafka topic using Benthos.
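
As a rough illustration of the transformation step, the sketch below maps a Debezium change event to a final event in Python. In our setup Benthos did this job, so this is not our Benthos configuration, only the shape of the mapping. The envelope fields before, after, op and ts_ms are part of Debezium's change event format; the function name to_final_event, the row columns and the output field names are hypothetical.

```python
import json
from typing import Optional


def to_final_event(raw: Optional[bytes]) -> Optional[dict]:
    """Map one Debezium change event to a (hypothetical) final event.

    Returns None for messages that should not be published.
    """
    if raw is None:
        # Tombstone messages carry no value at all.
        return None
    message = json.loads(raw)
    if message is None:
        return None
    # With the JSON converter and schemas enabled, the change sits under
    # "payload"; without schemas, the envelope is the top-level object.
    payload = message.get("payload", message)
    # "c" = create, "u" = update, "r" = snapshot read; deletes ("d") have
    # no "after" image, so this sketch skips them.
    if payload.get("op") not in ("c", "u", "r"):
        return None
    row = payload["after"]
    return {
        "order_id": row["order_id"],
        "customer_email": row["customer_email"],
        "status": row["status"],
        "captured_at_ms": payload["ts_ms"],
    }


sample = json.dumps({
    "payload": {
        "before": None,
        "after": {
            "order_id": 10,
            "customer_email": "a@example.com",
            "status": "created",
        },
        "source": {"table": "order_events"},
        "op": "c",
        "ts_ms": 1700000000000,
    }
}).encode()

event = to_final_event(sample)
print(event)
```

Because the denormalised table already holds everything the event needs, the mapping stays a flat field-by-field copy with no lookups.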


Debezium relies on PostgreSQL's replication mechanism. During normal operation, PostgreSQL generates WAL (write-ahead log) files: binary files whose entries are known as xlog records.