
Friday, August 16, 2024

Role of Zookeeper in Kafka

1. Broker registration, with a heartbeat mechanism to keep the list current

2. Maintaining a list of topics, alongside:

    - Their configuration (partitions, replication factor, additional configurations)

    - The list of ISRs (in-sync replicas) for each partition

3. Performing leader elections in case some brokers go down

4. Storing the Kafka cluster ID (randomly generated at the first startup of the cluster)

5. Storing ACLs (Access Control Lists) if security is enabled

    - Topics

    - Consumer Groups

    - Users

6. Quotas configuration, if enabled

7. (deprecated) Used by the old consumer API to store consumer offsets
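
The roles above can be pictured as a handful of well-known paths in Zookeeper. The sketch below lists the paths a Zookeeper-based Kafka cluster conventionally uses. Treat the exact paths as assumptions to check against your Kafka version; the topic name `orders` is hypothetical.

```python
# Illustrative map from the roles listed above to conventional znode paths.
# The paths are assumptions to verify against your own Kafka version.
KAFKA_ZNODES = {
    "broker registration": "/brokers/ids",
    "topics and partition state (leader, ISR)": "/brokers/topics",
    "controller election": "/controller",
    "cluster id": "/cluster/id",
    "ACLs": "/kafka-acl",
    "configuration overrides and quotas": "/config",
    "old consumer offsets (deprecated)": "/consumers",
}


def partition_state_path(topic: str, partition: int) -> str:
    """Path of the znode holding the leader and ISR of one partition."""
    return f"/brokers/topics/{topic}/partitions/{partition}/state"


if __name__ == "__main__":
    for role, path in KAFKA_ZNODES.items():
        print(f"{path:<18} {role}")
    # "orders" is a hypothetical topic name used only for illustration.
    print(partition_state_path("orders", 0))
```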



Wednesday, August 14, 2024

What is Zookeeper

 1. Zookeeper provides multiple features for distributed applications:

    - Distributed configuration management

    - Self election / consensus building

    - Coordination and locks

    - Key-value store

2. Zookeeper is used in many distributed systems, such as Hadoop and Kafka

3. It's an Apache project that has proven to be very stable and hasn't had a major release in many years

4. 3.4.x was the stable channel for a long time

    3.5.x spent many years in development and beta before its first stable release (3.5.5, in 2019)


5. Zookeeper's internal data structure is like a tree

    - Each node is called a zNode

    - Each zNode has a path

    - A zNode can be persistent or ephemeral

    - Each zNode can store data

    - A zNode cannot be renamed

    - Each zNode can be WATCHed for changes
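
To make the zNode properties concrete, here is a toy in-memory model of the tree. It is not a Zookeeper client: the class `ZNodeTree`, its methods, and the session id `s1` are all hypothetical names for illustration. It assumes that an ephemeral node disappears when the session that created it closes and that a watch fires only once.

```python
from typing import Callable, Dict, List, Optional

Callback = Callable[[str, str], None]


class ZNodeTree:
    """Toy in-memory model of a zNode tree (not a real Zookeeper client)."""

    def __init__(self) -> None:
        # path -> {"data": bytes, "owner": session id, or None if persistent}
        self.nodes: Dict[str, dict] = {"/": {"data": b"", "owner": None}}
        self.watches: Dict[str, List[Callback]] = {}

    def create(self, path: str, data: bytes = b"",
               session: Optional[str] = None) -> None:
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.nodes:
            raise KeyError(f"parent {parent} does not exist")
        if path in self.nodes:
            raise KeyError(f"{path} already exists")
        # A node created with a session is ephemeral; otherwise persistent.
        self.nodes[path] = {"data": data, "owner": session}

    def set(self, path: str, data: bytes) -> None:
        self.nodes[path]["data"] = data
        self._fire(path, "changed")

    def get(self, path: str) -> bytes:
        return self.nodes[path]["data"]

    def watch(self, path: str, callback: Callback) -> None:
        self.watches.setdefault(path, []).append(callback)

    def close_session(self, session: str) -> None:
        # Ephemeral nodes vanish together with the session that owns them.
        owned = [p for p, n in self.nodes.items() if n["owner"] == session]
        for path in owned:
            del self.nodes[path]
            self._fire(path, "deleted")

    def _fire(self, path: str, event: str) -> None:
        # Watches are one-shot here: they are removed once triggered.
        for callback in self.watches.pop(path, []):
            callback(path, event)


if __name__ == "__main__":
    tree = ZNodeTree()
    tree.create("/brokers")
    tree.create("/brokers/ids")
    tree.create("/brokers/ids/1", b"host1:9092", session="s1")
    tree.watch("/brokers/ids/1", lambda p, e: print(f"{p}: {e}"))
    tree.close_session("s1")  # the broker's session ends, so its node goes away
```

There is deliberately no rename method: to "move" a node in this model you would create the new path and delete the old one.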



Note on IPs and DNS (mostly AWS)

 1. Your Zookeeper and Kafka servers must know their hostname and/or IP

2. These are not supposed to change over time, even after a reboot, otherwise your setup will be broken.

3. Options:

    - Use an Elastic Public IP == constant public IP

        you will be able to access your cluster from the outside (e.g. laptop)

    - Use a secondary ENI == constant private IP (this course)

        you will not be able to access your cluster from the outside

        you will only be able to access the cluster from within your network

    - Use DNS names (private or public) == no need to keep fixed IP

        public means you can access instances from outside

        private means you can't access instances from outside
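
Stable names matter because they end up written into configuration files. As a minimal sketch, the helpers below render the Zookeeper-related settings from a list of hostnames. The hostnames `zookeeper1` to `zookeeper3` are hypothetical, and the ports (2181 for clients, 2888 and 3888 between Zookeeper servers) are the conventional defaults, assumed here rather than taken from the notes above.

```python
from typing import List


def zookeeper_connect(hosts: List[str], port: int = 2181) -> str:
    """Value for Kafka's zookeeper.connect setting."""
    return ",".join(f"{host}:{port}" for host in hosts)


def zoo_cfg_servers(hosts: List[str]) -> List[str]:
    """server.N lines for zoo.cfg (2888 = peer port, 3888 = election port)."""
    return [f"server.{i}={host}:2888:3888"
            for i, host in enumerate(hosts, start=1)]


if __name__ == "__main__":
    # Hypothetical DNS names; they must keep resolving after a reboot.
    hosts = ["zookeeper1", "zookeeper2", "zookeeper3"]
    print("zookeeper.connect=" + zookeeper_connect(hosts))
    for line in zoo_cfg_servers(hosts):
        print(line)
```

If any of these names or addresses changed after a reboot, every file generated this way would point at the wrong machine.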





Kafka Cluster Architecture

 

1. We will co-locate Zookeeper and Kafka for cost-saving purposes (not recommended for production deployments)


2. The knowledge is still applicable to production deployments








Zookeeper Quorum Architecture

 


1. High-level information about Zookeeper:

    - Distributed key-value store

    - Has voting mechanisms

    - Used by many big data tools


2. A functional, running Zookeeper quorum is absolutely necessary to run Apache Kafka.
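
The voting mechanism relies on a strict majority of servers being available. The arithmetic below is a small sketch of that rule; the function names are hypothetical.

```python
def quorum_size(servers: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return servers // 2 + 1


def tolerated_failures(servers: int) -> int:
    """How many servers can be lost while a majority remains."""
    return servers - quorum_size(servers)


if __name__ == "__main__":
    for n in (1, 3, 4, 5):
        print(f"{n} servers: quorum {quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

Four servers tolerate no more failures than three, which is why Zookeeper ensembles are usually sized with an odd number of servers.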

    






Final Setup Architecture

 








Kafka Cluster setup is hard

1. To properly set up a Kafka cluster, you need:

    - Zookeeper Cluster

    - Kafka Cluster

    - Proper replication factor

    - Multi Availability Zones Setup

    - Proper configurations (there are over 140 possible configs!)
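
Two of the items above, replication factor and the multi-AZ setup, can be sanity-checked with simple arithmetic. The sketch below is one hypothetical way to do that, not an official tool. The function `check_plan`, the broker ids, and the zone names are assumptions for illustration.

```python
from typing import Dict, List


def check_plan(broker_az: Dict[int, str], replication_factor: int) -> List[str]:
    """Return a list of problems found in a cluster plan (empty if none)."""
    problems = []
    # Each replica of a partition lives on a different broker.
    if replication_factor > len(broker_az):
        problems.append("replication factor exceeds the number of brokers")
    # With a single zone, one zone outage takes down the whole cluster.
    if len(set(broker_az.values())) < 2:
        problems.append("all brokers are in a single availability zone")
    if replication_factor < 2:
        problems.append("a replication factor below 2 gives no redundancy")
    return problems


if __name__ == "__main__":
    # Hypothetical plan: three brokers spread over three zones.
    plan = {1: "zone-a", 2: "zone-b", 3: "zone-c"}
    print(check_plan(plan, replication_factor=3) or "plan looks reasonable")
```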