
Friday, August 16, 2024

Zookeeper configuration

1. Zookeeper configuration can be tricky to optimize: the right values depend on how your Kafka cluster is laid out and on your network environment


2. We are going to set the most common Zookeeper settings and discuss some more advanced ones

# the location to store the in-memory database snapshots and, unless specified otherwise, the transaction log of updates to the database.
dataDir=/data/zookeeper
# the port at which the clients will connect
clientPort=2181
# disable the per-ip limit on the number of connections since this is a non-production config
maxClientCnxns=0
# the basic time unit in milliseconds used by ZooKeeper. It is used to do heartbeats and the minimum session timeout will be twice the tickTime.
tickTime=2000
# The number of ticks that the initial synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# zoo servers
# these hostnames, such as `zookeeper1`, come from the /etc/hosts file
server.1=zookeeper1:2888:3888
server.2=zookeeper2:2888:3888
server.3=zookeeper3:2888:3888
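
The tick-based settings above are easier to reason about once converted to milliseconds. The following is a minimal sketch, not part of ZooKeeper itself: the helper names `parse_cfg`, `derived_timeouts`, and `ensemble_members` are made up for illustration, and the session timeout bounds assume the ZooKeeper defaults (2 and 20 times `tickTime`) because `minSessionTimeout` and `maxSessionTimeout` are not set in this config.

```python
# Sketch: derive effective timeouts from zoo.cfg-style settings.
# Helper names are hypothetical; only the config keys come from the notes.
ZOO_CFG = """\
dataDir=/data/zookeeper
clientPort=2181
maxClientCnxns=0
tickTime=2000
initLimit=10
syncLimit=5
server.1=zookeeper1:2888:3888
server.2=zookeeper2:2888:3888
server.3=zookeeper3:2888:3888
"""


def parse_cfg(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def derived_timeouts(settings):
    """Convert tick-based limits into milliseconds."""
    tick = int(settings["tickTime"])
    return {
        "init_timeout_ms": tick * int(settings["initLimit"]),
        "sync_timeout_ms": tick * int(settings["syncLimit"]),
        # Assumed defaults, since min/maxSessionTimeout are not set here.
        "min_session_timeout_ms": 2 * tick,
        "max_session_timeout_ms": 20 * tick,
    }


def ensemble_members(settings):
    """Return {server_id: (host, peer_port, election_port)}."""
    members = {}
    for key, value in settings.items():
        if key.startswith("server."):
            host, peer_port, election_port = value.split(":")
            server_id = int(key.split(".", 1)[1])
            members[server_id] = (host, int(peer_port), int(election_port))
    return members


settings = parse_cfg(ZOO_CFG)
print(derived_timeouts(settings))
print(ensemble_members(settings))
```

With the values above, followers get 20000 ms to connect and sync with the leader at startup, and 10000 ms between a request and its acknowledgement. In each `server.N` line, the first port (2888) is used for follower-to-leader traffic and the second (3888) for leader election.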

Zookeeper Architecture Quorum sizing

1. Zookeeper needs a strict majority of its servers to be up so that a majority can be formed whenever a vote happens

2. Therefore Zookeeper quorums have an odd number of servers: 1, 3, 5, 7, 9, ... (2N+1)

3. This allows for 0, 1, 2, 3, 4, ... N servers, respectively, to go down
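
The sizing rule can be checked with a few lines of arithmetic. This is a small sketch; the function names `quorum_size` and `tolerated_failures` are chosen here for illustration only.

```python
# Sketch: strict-majority arithmetic for a Zookeeper ensemble.
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can go down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


for n in range(1, 10):
    print(f"servers={n} quorum={quorum_size(n)} can_lose={tolerated_failures(n)}")
```

An even-sized ensemble buys nothing: 4 servers tolerate 1 failure, exactly like 3 servers, which is why the sizes are written as 2N+1.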

















Role of Zookeeper in Kafka

1. Broker registration, with a heartbeat mechanism to keep the list current

2. Maintaining a list of topics, along with:

    - Their configuration (partitions, replication factor, additional configurations, ...)

    - The list of ISRs (in sync replicas) for partitions

3. Performing leader elections in case some brokers go down

4. Storing the Kafka cluster ID (randomly generated at the first startup of the cluster)

5. Storing ACLs (Access Control Lists) if security is enabled

    - Topics

    - Consumer Groups

    - Users

6. Quota configuration, if enabled

7. (deprecated) Used by the old consumer API to store consumer offsets
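
Each role in the list above maps to a location in the Zookeeper tree. The sketch below lists the paths as I understand them for Zookeeper-based Kafka; treat it as an orientation aid and verify against your own cluster, since the layout can differ between Kafka versions. The topic name `orders` and the helper `partition_state_path` are hypothetical.

```python
# Sketch: where Zookeeper-based Kafka keeps each kind of metadata.
# Paths are believed correct for Zookeeper-based Kafka; verify on your cluster.
KAFKA_ZNODES = {
    "broker registration": "/brokers/ids",       # one ephemeral znode per live broker
    "topics and partitions": "/brokers/topics",
    "topic configuration": "/config/topics",
    "controller election": "/controller",        # ephemeral znode held by the controller
    "cluster id": "/cluster/id",
    "ACLs": "/kafka-acl",
    "client quotas": "/config/clients",
    "old consumer offsets": "/consumers",        # deprecated consumer API only
}


def partition_state_path(topic, partition):
    """Path of the znode holding the leader and ISR list of one partition."""
    return f"{KAFKA_ZNODES['topics and partitions']}/{topic}/partitions/{partition}/state"


for role, path in KAFKA_ZNODES.items():
    print(f"{role:24} {path}")
print(partition_state_path("orders", 0))  # "orders" is a made-up topic name
```

Broker registration relies on ephemeral znodes: when a broker's session expires, its entry disappears, which is what keeps the broker list current.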



Wednesday, August 14, 2024

What is Zookeeper

1. Zookeeper provides multiple features for distributed applications:

    - Distributed configuration management

    - Leader election / consensus building

    - Coordination and locks

    - Key-value store

2. Zookeeper is used in many distributed systems, such as Hadoop and Kafka

3. It's an Apache project that has proven to be very stable and hasn't had a major release in many years

4. 3.4.x was the stable channel for many years

    3.5.x was in development for many years and stayed in beta until 3.5.5 (2019), its first stable release


5. Zookeeper's internal data structure is like a tree

    - Each node is called a zNode

    - Each zNode has a path

    - A zNode can be persistent or ephemeral

    - Each zNode can store data

    - zNodes cannot be renamed

    - Each zNode can be WATCHed for changes
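
The zNode properties listed above can be illustrated with a toy in-memory model. This is a sketch for intuition only: `ZNodeTree` is a made-up class, not a Zookeeper client, and it simplifies real behavior (for example, it assumes one-shot watches and ignores versions and sequential nodes).

```python
# Sketch: toy in-memory model of a zNode tree (not a real Zookeeper client).
class ZNodeTree:
    def __init__(self):
        # path -> {"data": bytes, "owner": session id for ephemeral nodes, else None}
        self.nodes = {"/": {"data": b"", "owner": None}}
        self.watches = {}  # path -> list of one-shot callbacks

    def create(self, path, data=b"", session=None):
        """Create a persistent zNode, or an ephemeral one if session is given."""
        parent = path.rsplit("/", 1)[0] or "/"
        if path in self.nodes:
            raise KeyError(f"{path} already exists")
        if parent not in self.nodes:
            raise KeyError(f"parent {parent} does not exist")
        if self.nodes[parent]["owner"] is not None:
            raise ValueError("ephemeral zNodes cannot have children")
        self.nodes[path] = {"data": data, "owner": session}

    def get(self, path, watch=None):
        """Read data and optionally register a watch on the zNode."""
        data = self.nodes[path]["data"]
        if watch is not None:
            self.watches.setdefault(path, []).append(watch)
        return data

    def set(self, path, data):
        self.nodes[path]["data"] = data
        self._fire(path, "changed")

    def delete(self, path):
        if any(p.startswith(path + "/") for p in self.nodes):
            raise ValueError(f"{path} has children")
        del self.nodes[path]
        self._fire(path, "deleted")

    def close_session(self, session):
        """Ephemeral zNodes vanish when the session that created them ends."""
        owned = [p for p, n in self.nodes.items() if n["owner"] == session]
        for path in owned:
            self.delete(path)

    def _fire(self, path, event):
        # Watches are modeled as one-shot: they are removed once triggered.
        for callback in self.watches.pop(path, []):
            callback(path, event)


tree = ZNodeTree()
events = []
tree.create("/brokers")
tree.create("/brokers/ids")
tree.create("/brokers/ids/1", b"broker-1", session="s1")  # ephemeral
tree.get("/brokers/ids/1", watch=lambda p, e: events.append((p, e)))
tree.close_session("s1")
print(events)  # [('/brokers/ids/1', 'deleted')]
```

There is deliberately no `rename` method: moving a zNode means creating it at the new path and deleting the old one.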



Note on IPs and DNS (mostly AWS)

1. Your Zookeeper and Kafka servers must know their own hostname and/or IP

2. These are not supposed to change over time, even after a reboot; otherwise your setup will break.

3. Options:

    - Use an Elastic Public IP == constant public IP

        you will be able to access your cluster from the outside (e.g. laptop)

    - Use a secondary ENI == constant private IP (this course)

        you will not be able to access your cluster from the outside

        you will only be able to access the cluster from within your network

    - Use DNS names (private or public) == no need to keep a fixed IP

        public means you can access instances from the outside

        private means you can't access instances from the outside
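
Whichever option you pick, it helps to verify that every hostname resolves and that the addresses stay the same across reboots. The sketch below uses only the standard library; `resolve_hosts`, `changed_hosts`, and `fake_resolver` are hypothetical helpers, and the private IPs are invented so the example runs without a network.

```python
# Sketch: check that cluster hostnames resolve and detect address changes.
import socket


def resolve_hosts(hostnames, resolver=socket.gethostbyname):
    """Map each hostname to its IPv4 address, or None if it does not resolve."""
    results = {}
    for host in hostnames:
        try:
            results[host] = resolver(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            results[host] = None
    return results


def changed_hosts(before, after):
    """Hostnames whose resolved address differs between two snapshots."""
    return sorted(h for h in before if before[h] != after.get(h))


def fake_resolver(host):
    """Stand-in for DNS or /etc/hosts; the addresses are made up."""
    table = {"zookeeper1": "172.31.9.1", "zookeeper2": "172.31.19.230"}
    if host not in table:
        raise socket.gaierror(f"cannot resolve {host}")
    return table[host]


snapshot = resolve_hosts(["zookeeper1", "zookeeper2", "zookeeper3"], resolver=fake_resolver)
print(snapshot)

after_reboot = dict(snapshot, zookeeper2="172.31.19.231")
print(changed_hosts(snapshot, after_reboot))
```

On a real machine, drop the `resolver` argument so the default `socket.gethostbyname` is used, store the snapshot, and compare it after a reboot. Any hostname that maps to `None` or shows up in `changed_hosts` points to a setup that will break.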





Kafka Cluster Architecture

 

1. We will co-locate Zookeeper and Kafka for cost-saving purposes (not recommended for production deployments)


2. The knowledge is still applicable to production deployments








Zookeeper Quorum Architecture

 


1. High-level information about Zookeeper:

    - Distributed key-value store

    - Has voting mechanisms

    - Used by many big data tools


2. A functional, running Zookeeper quorum is absolutely necessary to run Apache Kafka.