- Every Kafka broker is also called a "bootstrap server"
- That means that you only need to connect to one broker, and you will be connected to the entire cluster.
- Each broker knows about all brokers, topics and partitions (metadata)
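
A toy, in-memory model of the bootstrap idea may help here. It is not the Kafka client API (real clients are given a `bootstrap.servers` list). The `Broker`, `ClusterMetadata` and `bootstrap` names, the host names and the `trucks_gps` topic are all hypothetical. The model only shows that asking any single broker returns metadata for the whole cluster.

```python
from dataclasses import dataclass


@dataclass
class ClusterMetadata:
    brokers: dict  # broker id -> "host:port" for every broker in the cluster
    leaders: dict  # (topic, partition) -> id of the broker leading that partition


@dataclass
class Broker:
    broker_id: int
    address: str
    metadata: ClusterMetadata  # every broker holds the full cluster metadata

    def metadata_request(self) -> ClusterMetadata:
        # Any broker can answer for the whole cluster, not just for itself.
        return self.metadata


def bootstrap(cluster: dict, bootstrap_address: str) -> ClusterMetadata:
    """Contact one broker (the "bootstrap server") and learn the entire cluster from it."""
    return cluster[bootstrap_address].metadata_request()


# Hypothetical 3-broker cluster with one 3-partition topic.
metadata = ClusterMetadata(
    brokers={101: "kafka1:9092", 102: "kafka2:9092", 103: "kafka3:9092"},
    leaders={("trucks_gps", 0): 101, ("trucks_gps", 1): 102, ("trucks_gps", 2): 103},
)
cluster = {addr: Broker(bid, addr, metadata) for bid, addr in metadata.brokers.items()}

seen = bootstrap(cluster, "kafka2:9092")
print(sorted(seen.brokers))             # [101, 102, 103]
print(seen.leaders[("trucks_gps", 2)])  # 103
```

After bootstrapping, a real client opens connections to the brokers that lead the partitions it needs. That step is not modeled here.
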
- Consumers choose when to commit offsets.
- There are 3 delivery semantics:
  - At most once:
    - Offsets are committed as soon as the message is received.
    - If the processing goes wrong, the message will be lost (it won't be read again).
  - At least once (usually preferred):
    - Offsets are committed after the message is processed.
    - If the processing goes wrong, the message will be read again.
    - This can result in duplicate processing of messages. Make sure your processing is idempotent (i.e. processing the same message again won't impact your systems).
  - Exactly once:
    - Can be achieved for Kafka => Kafka workflows using the Kafka Streams API.
    - For Kafka => External System workflows, use an idempotent consumer.
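
The first two semantics differ only in *when* the commit happens relative to processing. The following sketch simulates one partition in memory, with no Kafka client involved. `consume`, `write`, the crash flags and the message ids are hypothetical. The consumer crashes once at a chosen offset and restarts from the last committed offset. A dict keyed by message id stands in for an idempotent consumer writing to an external system.

```python
def write(sink, msg_id, value):
    """Deliver one record to a hypothetical external system."""
    if isinstance(sink, dict):
        sink[msg_id] = value  # idempotent: a replay overwrites the same key, no duplicate
    else:
        sink.append(value)    # not idempotent: a replay appends a duplicate


def consume(log, commit_before_processing, crash_offset, crash_after_write, sink):
    """Consume one partition, crash once at crash_offset, restart from the committed offset."""
    committed = 0  # stands in for the offset stored in __consumer_offsets
    crashed = False
    while True:
        offset = committed  # (re)start from the last committed offset
        try:
            while offset < len(log):
                msg_id, value = log[offset]
                if commit_before_processing:
                    committed = offset + 1  # at most once: commit as soon as it is received
                crash_now = offset == crash_offset and not crashed
                if crash_now and not crash_after_write:
                    crashed = True
                    raise RuntimeError("crashed before writing")
                write(sink, msg_id, value)
                if crash_now and crash_after_write:
                    crashed = True
                    raise RuntimeError("crashed after writing, before committing")
                if not commit_before_processing:
                    committed = offset + 1  # at least once: commit only after processing
                offset += 1
            return sink
        except RuntimeError:
            continue  # the consumer restarts and resumes from `committed`


log = [("m0", "a"), ("m1", "b"), ("m2", "c")]

# At most once: "b" is lost.
print(consume(log, commit_before_processing=True, crash_offset=1, crash_after_write=False, sink=[]))
# ['a', 'c']
# At least once: "b" is read again.
print(consume(log, commit_before_processing=False, crash_offset=1, crash_after_write=False, sink=[]))
# ['a', 'b', 'c']
# At least once, crash after the write: "b" is processed twice.
print(consume(log, commit_before_processing=False, crash_offset=1, crash_after_write=True, sink=[]))
# ['a', 'b', 'b', 'c']
# Same crash, idempotent sink: the replay is harmless.
print(consume(log, commit_before_processing=False, crash_offset=1, crash_after_write=True, sink={}))
# {'m0': 'a', 'm1': 'b', 'm2': 'c'}
```

The last case is why an idempotent consumer gives exactly-once *results* toward an external system. Replays still happen, but they overwrite instead of duplicating.
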
- Kafka stores the offsets at which a consumer group has been reading
- Committed offsets live in an internal Kafka topic named `__consumer_offsets`
- When a consumer in a group has processed data received from Kafka, it should commit the offsets
- If a consumer dies, it will be able to read back from where it left off thanks to the committed consumer offsets!
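
The following in-memory sketch shows how committed offsets let a group resume. `OffsetStore` is a hypothetical stand-in for `__consumer_offsets`, keyed by (group, topic, partition). The group and topic names are made up as well. As in Kafka, the committed value is the *next* offset to read. When nothing has been committed yet, the sketch starts at 0, which mimics `auto.offset.reset=earliest`.

```python
class OffsetStore:
    """In-memory stand-in for the __consumer_offsets topic.

    Maps (group_id, topic, partition) -> next offset that group should read.
    """

    def __init__(self):
        self._committed = {}

    def commit(self, group_id, topic, partition, next_offset):
        self._committed[(group_id, topic, partition)] = next_offset

    def fetch(self, group_id, topic, partition):
        # No commit yet: start at 0 here (like auto.offset.reset=earliest).
        return self._committed.get((group_id, topic, partition), 0)


def read_batch(store, group_id, topic, partition, log, max_records):
    """Read up to max_records from where the group left off, then commit."""
    start = store.fetch(group_id, topic, partition)
    batch = log[start:start + max_records]
    # ... process the batch here ...
    store.commit(group_id, topic, partition, start + len(batch))  # commit after processing
    return batch


log = ["e0", "e1", "e2", "e3"]  # one partition's records
store = OffsetStore()
print(read_batch(store, "dashboard", "trucks_gps", 0, log, 2))  # ['e0', 'e1']
# That consumer dies; a replacement in the same group resumes from the committed offset.
print(read_batch(store, "dashboard", "trucks_gps", 0, log, 2))  # ['e2', 'e3']
# Another group tracks its own offsets independently.
print(read_batch(store, "billing", "trucks_gps", 0, log, 2))    # ['e0', 'e1']
```
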
- Consumers read data in consumer groups
- Each consumer within a group reads from exclusive partitions
- If you have more consumers than partitions, some consumers will be inactive
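
A simplified round-robin assignment can illustrate both rules: each partition goes to exactly one consumer in the group, and extra consumers get nothing. Kafka's real assignors (e.g. `RangeAssignor`, `RoundRobinAssignor`, `CooperativeStickyAssignor`) are pluggable and more involved. `assign_round_robin` is a hypothetical illustration, not their code.

```python
def assign_round_robin(partitions, consumers):
    """Give each partition to exactly one consumer of the group (simplified round robin)."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        # Partitions are dealt out in turn, so no partition is shared by two consumers.
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


print(assign_round_robin([0, 1, 2], ["c1", "c2"]))
# {'c1': [0, 2], 'c2': [1]}
print(assign_round_robin([0, 1, 2], ["c1", "c2", "c3", "c4"]))
# {'c1': [0], 'c2': [1], 'c3': [2], 'c4': []}  -> c4 is inactive
```
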
- Note: Consumers automatically use a GroupCoordinator and a ConsumerCoordinator to assign consumers to partitions.
- Producers can choose to send a key with the message (string, number, etc.)
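- If key = null, data is sent round robin across the partitions (e.g. the partition on broker 101, then 102, then 103...). Since Kafka 2.4, the default producer uses a "sticky" partitioner for null keys instead: it fills a batch for one partition before moving on to the next.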
- If a key is sent, then all messages for that key will always go to the same partition
- A key is typically sent when you need message ordering for a specific field (e.g. truck_id)
(Advanced: we get this guarantee thanks to key hashing, which depends on the number of partitions)
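
The following sketch illustrates key hashing. Kafka's default partitioner hashes the serialized key with murmur2 and takes the result modulo the number of partitions. This sketch swaps in `zlib.crc32` for brevity, so its partition numbers will **not** match a real cluster. `ToyPartitioner` is a hypothetical name. The null-key branch shows classic round robin and does not model the sticky partitioner.

```python
import itertools
import zlib


class ToyPartitioner:
    """Key-based partitioning sketch (crc32 stands in for Kafka's murmur2)."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._round_robin = itertools.cycle(range(num_partitions))

    def partition(self, key):
        if key is None:
            # No key: spread records across partitions in turn (classic round robin).
            return next(self._round_robin)
        # Same key -> same hash -> same partition, as long as num_partitions doesn't change.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions


p = ToyPartitioner(3)
print([p.partition(None) for _ in range(4)])         # [0, 1, 2, 0]
print({p.partition("truck_123") for _ in range(5)})  # a one-element set: same partition every time
# Changing the partition count can move a key to a different partition:
print(ToyPartitioner(3).partition("truck_123"), ToyPartitioner(4).partition("truck_123"))
```

The last line is why the ordering guarantee depends on the number of partitions. Adding partitions to a topic can remap existing keys.
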