1. Zookeeper needs a strict majority of servers up to form a quorum when votes happen
2. Therefore Zookeeper quorums have 1, 3, 5, 7, 9, ... (2N+1) servers
3. This allows for 0, 1, 2, 3, 4, ... N servers to go down
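
The arithmetic behind these numbers can be sketched in a few lines of Python. The function names `quorum_size` and `tolerated_failures` are made up for this illustration and are not part of Zookeeper.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Servers that can go down while a strict majority stays up."""
    return ensemble_size - quorum_size(ensemble_size)


# Compare odd and even ensemble sizes side by side.
for n in (1, 2, 3, 4, 5, 7, 9):
    print(f"{n} servers: quorum={quorum_size(n)}, "
          f"can lose={tolerated_failures(n)}")
```

Under this formula an even-sized ensemble tolerates no more failures than the odd size just below it (4 servers tolerate 1 failure, the same as 3), which is why the odd sizes are the ones listed.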
1. Broker registration, with a heartbeat mechanism to keep the list current
2. Maintaining a list of topics, alongside:
- Their configuration (partitions, replication factor, additional configurations, ...)
- The list of ISRs (in-sync replicas) for partitions
3. Performing leader elections in case some brokers go down
4. Storing the Kafka cluster ID (randomly generated at the first startup of the cluster)
5. Storing ACLs (Access Control Lists) if security is enabled
- Topics
- Consumer Groups
- Users
6. Quotas configuration, if enabled
7. (deprecated) Used by the old consumer API to store consumer offsets
1. Zookeeper provides multiple features for distributed applications:
- Distributed configuration management
- Self election / consensus building
- Coordination and locks
- Key-value store
2. Zookeeper is used in many distributed systems, such as Hadoop, Kafka, etc.
3. It's an Apache project that has proven to be very stable and hasn't had a major release in many years
4. 3.4.x is the stable channel
5. 3.5.x has been in development for many years, and it is still in beta
- Each node is called a zNode
- Each zNode has a path
- A zNode can be persistent or ephemeral
- Each zNode can store data
- A zNode cannot be renamed
- Each zNode can be WATCHed for changes
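
As a rough illustration of these properties, here is a toy in-memory model in Python. It is not a Zookeeper client and does not talk to a server; the class `ToyZNodeTree`, its methods, and the `/app/...` paths are all hypothetical names chosen for this sketch. It assumes that an ephemeral zNode disappears when the session that created it ends, and that a watch fires once and is then discarded.

```python
from typing import Callable, Dict, List, Optional, Tuple

WatchCallback = Callable[[str, str], None]


class ToyZNodeTree:
    """In-memory stand-in for a zNode tree (illustration only)."""

    def __init__(self) -> None:
        # path -> (data, owning session); session is None for persistent nodes
        self._nodes: Dict[str, Tuple[bytes, Optional[str]]] = {"/": (b"", None)}
        self._watches: Dict[str, List[WatchCallback]] = {}

    def create(self, path: str, data: bytes = b"",
               session: Optional[str] = None) -> None:
        """Create a zNode; passing a session makes it ephemeral."""
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self._nodes:
            raise KeyError(f"parent {parent!r} does not exist")
        if path in self._nodes:
            raise KeyError(f"{path!r} already exists")
        self._nodes[path] = (data, session)

    def exists(self, path: str) -> bool:
        return path in self._nodes

    def get(self, path: str) -> bytes:
        return self._nodes[path][0]

    def set(self, path: str, data: bytes) -> None:
        _, session = self._nodes[path]
        self._nodes[path] = (data, session)
        self._fire(path, "changed")

    def watch(self, path: str, callback: WatchCallback) -> None:
        """Register a one-shot watch on a path."""
        self._watches.setdefault(path, []).append(callback)

    def close_session(self, session: str) -> None:
        """Drop every ephemeral zNode owned by the given session."""
        owned = [p for p, (_, s) in self._nodes.items() if s == session]
        for path in owned:
            del self._nodes[path]
            self._fire(path, "deleted")

    def _fire(self, path: str, event: str) -> None:
        # pop() removes the watches, so each one fires at most once
        for callback in self._watches.pop(path, []):
            callback(path, event)


tree = ToyZNodeTree()
events: List[Tuple[str, str]] = []
tree.create("/app")                                   # persistent
tree.create("/app/worker-1", b"host-a", session="s1")  # ephemeral
tree.watch("/app/worker-1", lambda p, e: events.append((p, e)))
tree.close_session("s1")  # the ephemeral node goes away, the watch fires
print(events, tree.exists("/app"), tree.exists("/app/worker-1"))
```

There is deliberately no rename method: in this model, as in the list above, moving a node means creating a new one and deleting the old one.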
1. Your Zookeeper and Kafka servers must know their hostname and/or IP
2. These are not supposed to change over time, even after a reboot, otherwise your setup will be broken.
3. Options:
- Use an Elastic Public IP == constant public IP
  you will be able to access your cluster from the outside (e.g. laptop)
- Use a secondary ENI == constant private IP (this course)
  you will not be able to access your cluster from the outside
  you will only be able to access the cluster from within your network
- Use DNS names (private or public) == no need to keep a fixed IP
  public means you can access instances from outside
  private means you can't access instances from outside
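
To see why stable names matter, consider that each Zookeeper server's configuration file lists every member of the ensemble. The Python sketch below generates those `server.<id>=<host>:<peer port>:<election port>` lines. The helper `server_lines` and the hostnames `zookeeper1` to `zookeeper3` are assumptions made for this example, and 2888 and 3888 are the conventional ports rather than required values.

```python
from typing import List


def server_lines(hosts: List[str], peer_port: int = 2888,
                 election_port: int = 3888) -> List[str]:
    """Build the ensemble member lines for a Zookeeper config file."""
    return [
        f"server.{server_id}={host}:{peer_port}:{election_port}"
        for server_id, host in enumerate(hosts, start=1)
    ]


# Hypothetical DNS names; fixed private IPs would work the same way.
for line in server_lines(["zookeeper1", "zookeeper2", "zookeeper3"]):
    print(line)
```

If a host's name or IP changed after a reboot, every other server's copy of these lines would point at the wrong address.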
1. We will co-locate Zookeeper and Kafka for cost-saving purposes (not recommended for production deployments)
2. Knowledge is still applicable to production deployments
1. High-level information about Zookeeper:
- Distributed key-value store
- Has voting mechanisms
- Used by many big data tools
2. A functional, running Zookeeper quorum is absolutely necessary to run Apache Kafka.