Kafka is used primarily to decouple services from each other and to reduce the pain of integration.
If you have 4 source systems and 6 target systems, you need to write 24 integrations. These systems may each use different:
languages
protocols (TCP, HTTP, REST, FTP, JDBC, etc.)
data formats (binary, CSV, JSON, etc.)
data schemas
Also, in a request-response architecture, a request to one service may call another service, which in turn calls yet another. If those services each update their own databases, it becomes hard to roll back a transaction that spans multiple machines, which makes error handling problematic.
Kafka allows you to decouple your source and target systems: every source and target system connects only to Kafka.
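As a minimal sketch of that decoupling, a source system can publish its events to a topic using the Java producer client. The topic name "orders" and the record contents below are made up for illustration; target systems would read from the same topic without the producer knowing anything about them.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class OrderProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The source system only knows about the "orders" topic (a hypothetical name),
            // not about any of the target systems that will eventually consume it.
            producer.send(new ProducerRecord<>("orders", "order-123", "{\"amount\": 42}"));
        }
    }
}
```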
To handle errors in a chain of service calls, the mindset of a Kafka message queue is that if part of the workflow is blocked because a service is down, the consumer can be written so that it resumes processing from where it left off once the service is back online. This is much harder to achieve with synchronous request-response styles such as REST, gRPC, or GraphQL.
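A rough sketch of that resume-on-restart behaviour with the Java consumer client: if auto-commit is disabled and offsets are committed only after records have been processed, a consumer that goes down simply picks up again from the last committed offset when it restarts. The group id, topic name, and process() helper below are hypothetical.

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class OrderConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "billing-service");   // hypothetical consumer group
        props.put("enable.auto.commit", "false");   // commit only after successful processing
        props.put("auto.offset.reset", "earliest"); // on the very first run, start from the beginning
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    process(record.value());  // hypothetical business logic
                }
                // Offsets are saved only after processing succeeds, so after a crash
                // or restart the consumer resumes from the last committed position.
                consumer.commitSync();
            }
        }
    }

    private static void process(String value) {
        System.out.println("processing " + value);
    }
}
```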
Apache Kafka was created at LinkedIn and is now an Apache project, with much of its ongoing development driven by the company Confluent.
The architecture is:
distributed
resilient
fault tolerant
It scales horizontally and performs well:
hundreds of brokers
millions of messages per second
real-time latency of less than 10 ms
Common use cases include:
Messaging System
Activity Tracking
Gathering metrics
Stream Processing (via the Kafka Streams API; see the sketch after this list)
Decoupling System Dependencies
Big data integration with Spark, Flink, Storm, and Hadoop
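As a hedged illustration of the Kafka Streams API mentioned above, the sketch below reads from a hypothetical "page-views" topic, keeps only product page views, and writes them to a second topic. The application id, topic names, and filter predicate are invented for the example.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import java.util.Properties;

public class ClickstreamFilter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "clickstream-filter"); // hypothetical app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> clicks = builder.stream("page-views"); // hypothetical input topic
        clicks.filter((user, page) -> page.startsWith("/product"))    // keep product views only
              .to("product-views");                                   // hypothetical output topic

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```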
Netflix uses Kafka to apply recommendations in real time
Uber uses Kafka to gather user, taxi and trip data in real time to compute and forecast demand and to compute surge pricing in real time
LinkedIn uses Kafka to prevent spam, collect user interactions to make better connection recommendations in real time.
Page Author: JD