페이지

2024년 6월 21일 금요일

Producers: Message keys

- Producers can choose to send a key with the message (string, number, etc..)

- If key = null, data is sent reound robin (broker 101 then 102 then 103...)

- If a key is sent, then all messages for that key will always go to the same partition

- A key is basically sent if you need message ordering for a specific field(ex:truck_id)

     (Advanced: we get this guarantee thanks to key hashing, which depends on the number of partitions)







Producers

- Producers write data to topics(which is made of partitions)

- Producers automatically know to which broker and partition to write to

- In case of Broker failures, Producers will automatically recover


- Producers can choose to receive achnowledgment of data writes:

    - acks=0: Producer won't wait for acknowledgment(possible data loss)

    - acks=1: Producer will wait for leader acknowledgment(limited data loss)

    - acks=all: Leader + replicas acknowledgment (no data loss)








Concept of Leader for a Partition

- At an time onlyu ONE broker can be a leader for a given partition

- Only that leader can receive and serve data for a partition

- The other brokers will synchronize tghe data

- Therefore each partition has one leader and multiple ISR(in-sync replica)






Topic replication factor

- Topics should have a replication factor > 1 (usually between 2 and 3)

- This way if a broker is down, another broker can seve the data

- Example: Topic-A with 2 partitions and replication factor of 2


- Example: we lost Broker 102

- Result: Broker 101 and 103 can still serve the data










Brokers and topics

 - Example of Topic-A with 3 partitions

 - Example of Topic-B with 2 partitions


- Note: Data is distributed and Broker 103 doesn't have any Topic B data




Brokers

- A Kafka cluster is composed of multiple brokers(servers)

- Each broker is identified with its ID(integer)

- Each broker contains certain topic partitions

- After connectiong to any broker (called a bootstrap broker), you will be connected to the entire cluster

- A good number to get started is 3 brokers, but some big clusters have over  100 brokers

- In these examples we choose to number brokers starting at 100(arbitrary)








Topics, partitions and offsets

 - Topics: a particular stream of data

    1) Similar to a table in a database(without all the constraints)

    2) You can have as many topics as you want

    3) A topic is identified by its name

- Topics are split in partitions

    1) Each partition is orderd

    2) Each message within a partition gets an incremental id, called offet

- Offset only have a meaning for a specific partistion.

    - E.g. offset 3 in partition 0 doesn't represent the same data as offset 3 in partition 1

- Order is guaranteed only within a partition (not across partitions)

- Data is kept only for a limited time (default is noe week)

- Once the data is written to a partition, it can't be changed (immutability)

- Data is assigned randomly to a partition unless a key is provided (more on this later)