페이지

2024년 6월 21일 금요일

Theory Roundup We've looked at all the Kafka concepts

 







Kafka Guarantees

- Messages are appended to a topic-partition in the order they are sent

- Consumers read messages in the order stored in a topic-partition

- With a replication factor of N, producers and consumers can tolerate up to N-I brokers being down

- This is why a replication factor of 3 is a good idea:

    - Allows for on broker to be taken down for maintenance

    - Allows for another broker to be taken down unexpectedly

- As long as the number of partitions remains constant for a topic(no new partitions), the same key will always go to the same partition

Zookeeper

- Zookeeper manages brokers (keeps a list of them)

- Zookeeper helps in performing leader election for partitions

- Zookeeper sends notifications to Kafka in case of changes (e.g. new topic, broker dies, broker comes up, delete topics, etc...)

- Kafka can't work without Zookeeper

- Zookeeper by design operates with an odd number of servers (3, 5, 7)

- Zookeeper has a leader (handle writes) the rest of the servers are followers (handle reads)

- (Zokeeper does NOT store consumer offsets with Kafka > v0.10)






Kafka Broker Discovery

- Every Kafka broker is also called a "bootstrap server"

- That means that you only need to connect to one broker, and you will be connected to the entire cluster.

- Each broker knows about all brokers, topics and partitions(metadata)


Delivery semantics for consumers

- Consuemers choose when to commit offsets.

- There are 3 delivery sematics:


- At most once:

    - offsets are committed as soon as the message is received.

    - If the processing goes wrong, the message will be lost(it won't be read again).

- At least once(usually preferred):

    - offsets are committed after the message is processed.

    - If the processing goes wrong, the message will be read again.

    - This can result in duplicate processing of messages. Make sure your processing is idempotent (i.e. processing again the messages won't impact your systems)

- Exactly once:

    - Can be achieved for Kafka => Kafka workflows using Kafka Streams API

    - For Kafka => External System workflows, use an idempotent consumer.



Consumer Offsets

- Kafka stores the offsets at which a consumer group has been reading

- The offsets committed live in a Kafka topic named __consumer_offsets

- When a consumer in a group has processed data received from Kafka, it should be committing tghe offsets

- If a consumer dies, it will be a able to read back from where it left off thanks to the committed consumer offsets!










Consumer Groups What if too many consumers?

 - If you have more consumers than partitions, som consumers will be inactive