- Consumers read data from a topic(identified by name)
- Consumers know which broker to read from
- In case of broker failures, consumers know how to recover
- Data is read in order within each partitions
- Producers can choose to send a key with the message (string, number, etc..)
- If key = null, data is sent reound robin (broker 101 then 102 then 103...)
- If a key is sent, then all messages for that key will always go to the same partition
- A key is basically sent if you need message ordering for a specific field(ex:truck_id)
(Advanced: we get this guarantee thanks to key hashing, which depends on the number of partitions)- Producers write data to topics(which is made of partitions)
- Producers automatically know to which broker and partition to write to
- In case of Broker failures, Producers will automatically recover
- Producers can choose to receive achnowledgment of data writes:
- acks=0: Producer won't wait for acknowledgment(possible data loss)
- acks=1: Producer will wait for leader acknowledgment(limited data loss)
- acks=all: Leader + replicas acknowledgment (no data loss)
- At an time onlyu ONE broker can be a leader for a given partition
- Only that leader can receive and serve data for a partition
- The other brokers will synchronize tghe data
- Therefore each partition has one leader and multiple ISR(in-sync replica)
- Topics should have a replication factor > 1 (usually between 2 and 3)
- This way if a broker is down, another broker can seve the data
- Example: Topic-A with 2 partitions and replication factor of 2
- Example: we lost Broker 102
- Result: Broker 101 and 103 can still serve the data
- Example of Topic-A with 3 partitions
- Example of Topic-B with 2 partitions
- Note: Data is distributed and Broker 103 doesn't have any Topic B data
- A Kafka cluster is composed of multiple brokers(servers)
- Each broker is identified with its ID(integer)
- Each broker contains certain topic partitions
- After connectiong to any broker (called a bootstrap broker), you will be connected to the entire cluster
- A good number to get started is 3 brokers, but some big clusters have over 100 brokers
- In these examples we choose to number brokers starting at 100(arbitrary)