
Friday, June 21, 2024

Delivery semantics for consumers

- Consumers choose when to commit offsets.

- There are 3 delivery semantics:


- At most once:

    - offsets are committed as soon as the message is received.

    - If the processing goes wrong, the message will be lost (it won't be read again).

- At least once (usually preferred):

    - offsets are committed after the message is processed.

    - If the processing goes wrong, the message will be read again.

    - This can result in duplicate processing of messages. Make sure your processing is idempotent (i.e. processing the messages again won't impact your systems).

- Exactly once:

    - Can be achieved for Kafka => Kafka workflows using Kafka Streams API

    - For Kafka => External System workflows, use an idempotent consumer.
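
The difference between the first two semantics comes down to whether the commit happens before or after processing. Below is a minimal in-memory sketch of that idea, not real Kafka client code: `run`, `FlakyHandler`, and their parameters are hypothetical names made up for illustration, and a raised `RuntimeError` stands in for a consumer crash followed by a restart from the last committed offset.

```python
def run(messages, commit_first, process):
    """Deliver messages to process(), resuming from the last commit after a crash.

    commit_first=True  models at most once (commit as soon as received).
    commit_first=False models at least once (commit after processing).
    """
    committed = 0  # offset of the next message to read
    while committed < len(messages):
        offset = committed
        if commit_first:
            committed = offset + 1
        try:
            process(messages[offset])
        except RuntimeError:
            continue  # simulated restart: pick up from the committed offset
        if not commit_first:
            committed = offset + 1


class FlakyHandler:
    """Records messages and crashes exactly once on the message `bad`."""

    def __init__(self, bad, crash_after_effect, idempotent=False):
        self.seen = []
        self.bad = bad
        self.crash_after_effect = crash_after_effect
        self.idempotent = idempotent
        self.crashed = False

    def __call__(self, msg):
        crash_now = msg == self.bad and not self.crashed
        if crash_now and not self.crash_after_effect:
            self.crashed = True
            raise RuntimeError("crash before the side effect")
        # An idempotent handler skips work it has already done.
        if not (self.idempotent and msg in self.seen):
            self.seen.append(msg)
        if crash_now:
            self.crashed = True
            raise RuntimeError("crash after the side effect, before the commit")
```

In this toy model, with messages `a, b, c` and a crash on `b`, committing first drops `b`, committing afterwards records `b` twice, and the idempotent handler brings the at-least-once run back to a single `b`.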



Consumer Offsets

- Kafka stores the offsets at which a consumer group has been reading

- The offsets committed live in a Kafka topic named __consumer_offsets

- When a consumer in a group has processed data received from Kafka, it should commit the offsets

- If a consumer dies, it will be able to read back from where it left off thanks to the committed consumer offsets!
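
To make the "read back from where it left off" point concrete, here is a rough sketch that uses a plain dictionary as a stand-in for the `__consumer_offsets` topic. It is not Kafka client code: `committed_offsets` and `poll_and_commit` are hypothetical names, and the list `log` plays the role of one partition.

```python
# Stand-in for __consumer_offsets: (group, topic, partition) -> next offset to read.
committed_offsets = {}


def poll_and_commit(group, topic, partition, log, max_records):
    """Read up to max_records from the committed position, then commit."""
    key = (group, topic, partition)
    start = committed_offsets.get(key, 0)  # a new group starts at offset 0 here
    batch = log[start:start + max_records]
    # ... processing of `batch` would happen here ...
    committed_offsets[key] = start + len(batch)  # commit after processing
    return batch
```

Because the position is stored per group and not inside the consumer, a replacement consumer in the same group that calls `poll_and_commit` continues after the last committed record, while a different group keeps its own independent position.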










Consumer Groups: What if too many consumers?

- If you have more consumers than partitions, some consumers will be inactive






Consumer Groups

- Consumers read data in consumer groups

- Each consumer within a group reads from exclusive partitions

- If you have more consumers than partitions, some consumers will be inactive

      - Note: Consumers will automatically use a GroupCoordinator and a ConsumerCoordinator to assign consumers to partitions
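
The exclusive-partition rule and the inactive-consumer case can be sketched with a simple round-robin assignment. This is only an illustration under that assumption: `assign` is a hypothetical helper, and Kafka's real assignment strategies are more involved.

```python
def assign(partitions, consumers):
    """Give each partition to exactly one consumer in the group, round robin."""
    if not consumers:
        return {}
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(sorted(partitions)):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment
```

With 3 partitions and 4 consumers, one consumer ends up with an empty list, which is the "inactive" case from the notes above.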


Consumers

- Consumers read data from a topic (identified by name)

- Consumers know which broker to read from

- In case of broker failures, consumers know how to recover

- Data is read in order within each partition




Producers: Message keys

- Producers can choose to send a key with the message (string, number, etc.)

- If key = null, data is sent round robin (broker 101, then 102, then 103...)

- If a key is sent, then all messages for that key will always go to the same partition

- A key is basically sent if you need message ordering for a specific field (ex: truck_id)

     (Advanced: we get this guarantee thanks to key hashing, which depends on the number of partitions)
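
A small sketch of how key hashing could pick a partition. Assumptions: `choose_partition` is a hypothetical helper, and `zlib.crc32` is used here only as a convenient stand-in for the hash (Kafka's default partitioner uses a murmur2 hash of the key bytes).

```python
import itertools
import zlib

_round_robin = itertools.count()  # shared counter for messages without a key


def choose_partition(key, num_partitions):
    """Pick a partition: round robin for null keys, hash(key) % N otherwise."""
    if key is None:
        return next(_round_robin) % num_partitions
    key_bytes = str(key).encode("utf-8")
    return zlib.crc32(key_bytes) % num_partitions
```

The same key always lands on the same partition as long as `num_partitions` stays fixed. Since the result is a modulo of the partition count, adding partitions to a topic can move existing keys to a different partition.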







Producers

- Producers write data to topics (which are made of partitions)

- Producers automatically know which broker and partition to write to

- In case of broker failures, producers will automatically recover


- Producers can choose to receive acknowledgment of data writes:

    - acks=0: Producer won't wait for acknowledgment (possible data loss)

    - acks=1: Producer will wait for leader acknowledgment (limited data loss)

    - acks=all: Leader + replicas acknowledgment (no data loss)
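
One way to read the three settings is as "how many broker acknowledgments does a write wait for". The sketch below is a simplified model, not client code: `acks_to_wait_for` is a hypothetical helper, and it assumes `in_sync_replica_count` counts the leader plus the replicas currently in sync.

```python
def acks_to_wait_for(acks, in_sync_replica_count):
    """Number of broker acknowledgments the producer waits for per write."""
    if acks == "0":
        return 0  # fire and forget
    if acks == "1":
        return 1  # leader only
    if acks == "all":
        return in_sync_replica_count  # leader plus in-sync replicas
    raise ValueError(f"unsupported acks value: {acks!r}")
```

For example, with a leader and two in-sync replicas, `acks=all` waits for three acknowledgments in this model, which is why it trades latency for durability.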