
Saturday, August 31, 2024

Hands On: One Machine Setup

1. Augment the file handle limits

2. Launch Kafka on one machine

3. Setup Kafka as a Service


#!/bin/bash

# Add file limits configs - allow opening 100,000 file descriptors
echo "* hard nofile 100000
* soft nofile 100000" | sudo tee --append /etc/security/limits.conf

# reboot for the file limit to be taken into account
sudo reboot
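
# (sanity check, not in the original script) once logged back in,
# confirm the new limits are active
ulimit -Hn
ulimit -Sn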
sudo service zookeeper start
sudo chown -R ubuntu:ubuntu /data/kafka

# edit kafka configuration
rm config/server.properties
nano config/server.properties

# launch kafka
bin/kafka-server-start.sh config/server.properties

# Install Kafka boot scripts
sudo nano /etc/init.d/kafka
sudo chmod +x /etc/init.d/kafka
sudo chown root:root /etc/init.d/kafka
# you can safely ignore the warning
sudo update-rc.d kafka defaults

# start kafka
sudo service kafka start
# verify it's working
nc -vz localhost 9092
# look at the server logs
cat /home/ubuntu/kafka/logs/server.log


# create a topic
bin/kafka-topics.sh --zookeeper zookeeper1:2181/kafka --create --topic first_topic --replication-factor 1 --partitions 3
# produce data to the topic
bin/kafka-console-producer.sh --broker-list kafka1:9092 --topic first_topic
hi
hello
# press Ctrl+C to exit the producer
# read that data
bin/kafka-console-consumer.sh --bootstrap-server kafka1:9092 --topic first_topic --from-beginning
# list kafka topics
bin/kafka-topics.sh --zookeeper zookeeper1:2181/kafka --list
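
# (optional) describe the topic to see partition leaders, replicas and ISRs
bin/kafka-topics.sh --zookeeper zookeeper1:2181/kafka --describe --topic first_topic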



############################# Server Basics #############################

# The id of the broker. This must be set to a unique integer for each broker.
broker.id=1
# replace your.host.name with your machine's IP or hostname
advertised.listeners=PLAINTEXT://kafka1:9092

# Switch to enable topic deletion or not, default value is false
delete.topic.enable=true

############################# Log Basics #############################

# A comma separated list of directories under which to store log files
log.dirs=/data/kafka

# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=8
# we will have 3 brokers so the default replication factor should be 2 or 3
default.replication.factor=3
# minimum number of in-sync replicas required to acknowledge a write (minimizes data loss)
min.insync.replicas=1

############################# Log Retention Policy #############################

# The minimum age of a log file to be eligible for deletion due to age
# this will delete data after a week
log.retention.hours=168

# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824

# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000

############################# Zookeeper #############################

# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated list of host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=zookeeper1:2181,zookeeper2:2181,zookeeper3:2181/kafka

# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000


############################## Other ##################################
# I recommend you set this to false in production.
# We'll keep it as true for the course
auto.create.topics.enable=true


#!/bin/bash
# /etc/init.d/kafka
DAEMON_PATH=/home/ubuntu/kafka/bin
DAEMON_NAME=kafka
# Check that networking is up.
#[ ${NETWORKING} = "no" ] && exit 0

PATH=$PATH:$DAEMON_PATH

# See how we were called.
case "$1" in
  start)
        # Start daemon.
        pid=`ps ax | grep -i 'kafka.Kafka' | grep -v grep | awk '{print $1}'`
        if [ -n "$pid" ]
          then
            echo "Kafka is already running"
        else
            echo "Starting $DAEMON_NAME"
            $DAEMON_PATH/kafka-server-start.sh -daemon /home/ubuntu/kafka/config/server.properties
        fi
        ;;
  stop)
        echo "Shutting down $DAEMON_NAME"
        $DAEMON_PATH/kafka-server-stop.sh
        ;;
  restart)
        $0 stop
        sleep 2
        $0 start
        ;;
  status)
        pid=`ps ax | grep -i 'kafka.Kafka' | grep -v grep | awk '{print $1}'`
        if [ -n "$pid" ]
          then
            echo "Kafka is Running as PID: $pid"
        else
            echo "Kafka is not Running"
        fi
        ;;
  *)
        echo "Usage: $0 {start|stop|restart|status}"
        exit 1
esac

exit 0
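
As an alternative on systemd-based Ubuntu releases (16.04+), the broker can be managed with a systemd unit instead of an init.d script. A minimal sketch, not covered in the course; the unit file content and restart policy are assumptions:

#!/bin/bash
# write a kafka systemd unit using the same paths as above (sketch)
sudo tee /etc/systemd/system/kafka.service > /dev/null <<'EOF'
[Unit]
Description=Apache Kafka broker
After=network.target

[Service]
User=ubuntu
ExecStart=/home/ubuntu/kafka/bin/kafka-server-start.sh /home/ubuntu/kafka/config/server.properties
ExecStop=/home/ubuntu/kafka/bin/kafka-server-stop.sh
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now kafka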





Friday, August 30, 2024

Hands On: AWS Setup

1. Setup network security to allow Kafka ports (9092)

2. Create and Attach EBS volumes to EC2 Instances

(to have a separate drive for Kafka operations)

3. Format the newly attached EBS volumes as XFS

(recommended file system for Kafka as per documentation - requires less tuning)

4. Make sure the volume stays mapped on reboot

5. Apply on all machines

#!/bin/bash

# execute commands as root
sudo su

# Attach the EBS volume in the console, then
# view available disks
lsblk

# we verify the disk is empty - should return "data"
file -s /dev/xvdf

# Note on Kafka: it's better to format volumes as xfs:
# https://kafka.apache.org/documentation/#filesystems
# Install packages to mount as xfs
apt-get install -y xfsprogs

# create a partition (optional here - below we format the whole device directly)
fdisk /dev/xvdf

# format as xfs
mkfs.xfs -f /dev/xvdf

# create kafka directory
mkdir /data/kafka
# mount volume
mount -t xfs /dev/xvdf /data/kafka
# add permissions to kafka directory
chown -R ubuntu:ubuntu /data/kafka
# check it's working
df -h /data/kafka

# EBS Automount On Reboot
cp /etc/fstab /etc/fstab.bak # backup
echo '/dev/xvdf /data/kafka xfs defaults 0 0' >> /etc/fstab
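
# (optional sanity check before rebooting; assumes nothing else uses the mount)
umount /data/kafka
mount -a        # remounts everything in fstab; an error here means the fstab line is wrong
df -h /data/kafka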

# reboot to test actions
reboot
sudo service zookeeper start





Kafka Configuration

1. Configuring Kafka in production is an ART.

2. It requires really good understanding of:

    - Operating System Architecture

    - Server Architecture

    - Distributed Computing

    - CPU operations

    - Network performance

    - Disk I/O

    - RAM and Heap size

    - Page cache

    - Kafka and Zookeeper


3. There are over 140 configuration parameters available for Kafka

    - Only a few parameters are needed to get started

    - Importance is classified between mandatory, high, medium and low.

    - I have deployed Kafka where more than 40 parameters are in use


4. You will never get the optimal configuration right for your needs the first time

5. Configuring Kafka is an iterative process: its behavior changes over time based on usage and monitoring, so your configuration should evolve too

############################# Server Basics #############################

# The id of the broker. This must be set to a unique integer for each broker.
broker.id=1
# replace your.host.name with your machine's IP or hostname
advertised.listeners=PLAINTEXT://kafka1:9092

# Switch to enable topic deletion or not, default value is false
delete.topic.enable=true

############################# Log Basics #############################

# A comma separated list of directories under which to store log files
log.dirs=/data/kafka

# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=8
# we will have 3 brokers so the default replication factor should be 2 or 3
default.replication.factor=3
# minimum number of in-sync replicas required to acknowledge a write (minimizes data loss)
min.insync.replicas=2
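# Example: with default.replication.factor=3 and min.insync.replicas=2,
# a producer using acks=all keeps working with one broker down,
# but writes fail (NotEnoughReplicasException) with two brokers down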

############################# Log Retention Policy #############################

# The minimum age of a log file to be eligible for deletion due to age
# this will delete data after a week
log.retention.hours=168

# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824

# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000

############################# Zookeeper #############################

# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated list of host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=zookeeper1:2181,zookeeper2:2181,zookeeper3:2181/kafka

# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000


############################## Other ##################################
# I recommend you set this to false in production.
# We'll keep it as true for the course
auto.create.topics.enable=true



Kafka Cluster Size Discussions: 100 brokers

 


1. In the case of 100 brokers:

    - Your cluster is fully distributed and can handle tremendous volumes, even using commodity hardware.

    - Zookeeper may be under pressure because of many open connections, so you need to increase the Zookeeper instance performance

    - Cluster management is a full time job (make sure no broker acts weirdly)

    - A replication factor of 4 or more is not recommended, as it would incur extra network traffic between the brokers. Leave it at 3

    - Scale horizontally only when a bottleneck is reached (network, disk i/o, cpu, ram)

Tuesday, August 27, 2024

Kafka Cluster Size Discussions: 3 brokers

 


1. In the case of 3 brokers:

    - N-1 brokers can be down, if N is your default topic replication factor.

    - Ex: If N = 3, then two brokers can be down

    - Producer and consumer requests are spread out between your different machines

    - Data is spread out between brokers, which means less disk space is used per broker.

    - You now have a proper cluster



Kafka Cluster Size Discussions: 1 broker

 


1. In the case of only 1 broker:

    - If the broker is restarted, the Kafka cluster is down

    - The maximum replication factor for topics is 1.

    - All producer and consumer requests go to the same unique broker

    - You can only scale vertically (by increasing the instance size and restarting)


2. It's extremely high risk, and only useful for development purposes


Tuesday, August 20, 2024

Kafka Basics

1. Brokers hold topic partitions

2. Brokers receive and serve data

3. Brokers are the unit of parallelism of a Kafka cluster

4. Brokers are the essence of the "distributed" aspect of Kafka


Management tools for Zookeeper

1. You can build your own using the 4 Letter Words

2. Or use one of the following:

    - Netflix Exhibitor (heavily recommended but tedious setup):

    https://github.com/soabase/exhibitor

    - Zookeeper UI (web):

    https://github.com/DeemOpen/zkui

    - Zookeeper GUI (desktop) - Windows binaries available:

    https://github.com/echoma/zkui

    - ZK-web (not updated since July 2016):

    https://github.com/qiuziafei/zk-web

    - ZooNavigator (promising new project):

    https://github.com/elkozmon/zoonavigator


zoonavigator-docker-compose.yml

version: '2'

services:
  # https://github.com/elkozmon/zoonavigator
  zoonavigator:
    image: elkozmon/zoonavigator:latest
    container_name: zoonavigator
    network_mode: host
    environment:
      HTTP_PORT: 8001
    restart: always

nano zoonavigator-docker-compose.yml
# Make sure port 8001 is opened on the instance security group

# copy the zookeeper/zoonavigator-docker-compose.yml file
# run it
docker-compose -f zoonavigator-docker-compose.yml up -d

docker ps

curl localhost:8001



Hands-On: Web Tools AWS EC2 Machine

1. Start machine with Ubuntu

2. Install Docker and required packages

3. Open up ports

4. Try a Docker hello world


#!/bin/bash
sudo apt-get update

# Install packages to allow apt to use a repository over HTTPS:
sudo apt-get install -y \
apt-transport-https \
ca-certificates \
curl \
software-properties-common

# Add Docker’s official GPG key:
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

# set up the stable repository.
sudo add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) \
stable"

# install docker
sudo apt-get update
sudo apt-get install -y docker-ce docker-compose

# give ubuntu permissions to execute docker
sudo usermod -aG docker $(whoami)
# log out
exit
# log back in

# make sure docker is working
docker run hello-world

# Add hosts entries (mocking DNS) - put relevant IPs here
echo "172.31.9.1 kafka1
172.31.9.1 zookeeper1
172.31.19.230 kafka2
172.31.19.230 zookeeper2
172.31.35.20 kafka3
172.31.35.20 zookeeper3" | sudo tee --append /etc/hosts



Zookeeper in AWS

1. If you use private IPs, you may have the following error:

http://stackoverflow.com/questions/30940981/zookeeper-error-cannot-open-channel-to-x-at-election-address

2. Use Netflix Exhibitor

3. Reserve your instances if you know you will use them for over a year (decreased cost)

4. Or you can use Amazon EMR to provision a Zookeeper cluster, but you have less control


Zookeeper performance

1. Latency is key for Zookeeper, and any of these variables will affect it:

    - Fast disks (SSD)

    - No RAM swap

    - Separate disks for snapshots and logs (see the sketch after this list)

    - High performance network (low latency)

    - Reasonable number of Zookeeper servers

    - Isolation of Zookeeper instances from other processes
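
The "separate disks for snapshots and logs" advice maps to Zookeeper's dataLogDir setting. A minimal sketch of the relevant zookeeper.properties lines (the second path is an assumption):

# snapshots go to dataDir
dataDir=/data/zookeeper
# transaction logs go to a separate, dedicated disk
dataLogDir=/data/zookeeper-txn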

Using Zookeeper: Understanding the files created

1. Understanding the files created by Zookeeper on the filesystem

2. Besides myid, the rest of the files should remain untouched. They are managed by Zookeeper

3. /

    - myid: file representing the server id. That's how Zookeeper knows its identity

    - version-2/: folder that holds the zookeeper data

        - acceptedEpoch and currentEpoch: internal to Zookeeper

        - log.X: Zookeeper transaction log files



Sunday, August 18, 2024

Hands-On: Using Zookeeper Four letter words

1. Examples of using 4LW (Four Letter Words)


2. https://zookeeper.apache.org/doc/r3.4.8/zookeeperAdmin.html#sc_zkCommands


# https://zookeeper.apache.org/doc/r3.4.8/zookeeperAdmin.html#sc_zkCommands
# conf
# New in 3.3.0: Print details about serving configuration.
echo "conf" | nc localhost 2181


# cons
# New in 3.3.0: List full connection/session details for all clients connected to this server. Includes information on numbers of packets received/sent, session id, operation latencies, last operation performed, etc...
echo "cons" | nc localhost 2181


# dump
# Lists the outstanding sessions and ephemeral nodes. This only works on the leader.
echo "dump" | nc localhost 2181



# envi
# Print details about serving environment
echo "envi" | nc localhost 2181


# ruok
# Tests if server is running in a non-error state. The server will respond with imok if it is running. Otherwise it will not respond at all.
echo "ruok" | nc localhost 2181


# srvr
# New in 3.3.0: Lists full details for the server.
echo "srvr" | nc localhost 2181

# stat
# Lists brief details for the server and connected clients.
echo "stat" | nc localhost 2181

# wchs
# New in 3.3.0: Lists brief information on watches for the server.
echo "wchs" | nc localhost 2181

# wchc
# New in 3.3.0: Lists detailed information on watches for the server, by session. This outputs a list of sessions (connections) with associated watches (paths). Note, depending on the number of watches this operation may be expensive (ie impact server performance), use it carefully.
echo "wchc" | nc localhost 2181

# wchp
# New in 3.3.0: Lists detailed information on watches for the server, by path. This outputs a list of paths (znodes) with associated sessions. Note, depending on the number of watches this operation may be expensive (ie impact server performance), use it carefully.
echo "wchp" | nc localhost 2181

# mntr
# New in 3.4.0: Outputs a list of variables that could be used for monitoring the health of the cluster.
echo "mntr" | nc localhost 2181




Hands On: Quorum Setup

1. Create an AMI (image) from the existing machine

2. Create the other 2 machines, and launch Zookeeper on them

3. Test that the Quorum is running and working



nano /home/ubuntu/kafka/config/zookeeper.properties
# the location to store the in-memory database snapshots and, unless specified otherwise, the transaction log of updates to the database.
dataDir=/data/zookeeper
# the port at which the clients will connect
clientPort=2181
# disable the per-ip limit on the number of connections since this is a non-production config
maxClientCnxns=0
# the basic time unit in milliseconds used by ZooKeeper. It is used to do heartbeats and the minimum session timeout will be twice the tickTime.
tickTime=2000
# The number of ticks that the initial synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# zoo servers
# these hostnames such as `zookeeper1` come from the /etc/hosts file
server.1=zookeeper1:2888:3888
server.2=zookeeper2:2888:3888
server.3=zookeeper3:2888:3888
bin/zookeeper-server-start.sh config/zookeeper.properties





sudo mkdir -p /data/zookeeper
sudo chown -R ubuntu:ubuntu /data/
# declare the server's identity
echo "1" > /data/zookeeper/myid
# edit the zookeeper settings
rm /home/ubuntu/kafka/config/zookeeper.properties
nano /home/ubuntu/kafka/config/zookeeper.properties
# restart the zookeeper service
sudo service zookeeper stop
sudo service zookeeper start
# observe the logs - need to do this on every machine
cat /home/ubuntu/kafka/logs/zookeeper.out | head -100
nc -vz localhost 2181
nc -vz localhost 2888
nc -vz localhost 3888
echo "ruok" | nc localhost 2181 ; echo
echo "stat" | nc localhost 2181 ; echo
bin/zookeeper-shell.sh localhost:2181
# the shell will be unhappy until a strict majority of the quorum is up
ls /
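
Once all three servers are up, you can check which one was elected leader; a sketch using the srvr four letter word (run from the regular shell, not the zookeeper shell):

for host in zookeeper1 zookeeper2 zookeeper3; do
  echo "$host: $(echo srvr | nc $host 2181 | grep Mode)"
done
# expect one "Mode: leader" and two "Mode: follower"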







Saturday, August 17, 2024

Hands-On: Using Zookeeper Command Line Interface

1. Create nodes, sub nodes, etc...

2. Get / Set data for a node

3. Watch a node

4. Delete a node


# start zookeeper
sudo service zookeeper start
# verify it's started
nc -vz localhost 2181

bin/zookeeper-shell.sh localhost:2181
# display help
help
# display root
ls /
create /my-node "foo"


ls /
get /my-node
set /my-node "new data"
create /my-node/deeper-node "bar"
ls /
ls /my-node
ls /my-node/deeper-node
get /my-node/deeper-node
rmr /my-node/deeper-node
rmr /my-node
ls /
# create a watcher
create /node-to-watch ""
get /node-to-watch true
set /node-to-watch "new data"
set /node-to-watch "whatever"
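
zNodes can also be ephemeral: they disappear when the session that created them ends. A quick sketch in the same shell (the node name is arbitrary):

create -e /my-ephemeral-node "temp"
ls /
# quit ends the session; reconnect and the ephemeral node is gone
quit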


Hands On: Single Machine Setup

1. SSH into our machine

2. Install necessary (Java) and other helpful packages on the machine

3. Disable RAM Swap

4. Add hostname-to-IP mappings to /etc/hosts

5. Download & Configure Zookeeper on the machine

6. Launch Zookeeper on the machine to test

7. Setup Zookeeper as a service on the machine


sudo apt-get update && \
sudo apt-get -y install wget ca-certificates zip net-tools vim nano tar netcat
sudo apt-get -y install openjdk-8-jdk
java -version
sudo sysctl vm.swappiness=1
echo 'vm.swappiness=1' | sudo tee --append /etc/sysctl.conf
cat /etc/hosts
echo "172.31.9.1 kafka1
172.31.9.1 zookeeper1
172.31.19.230 kafka2
172.31.19.230 zookeeper2
172.31.35.20 kafka3
172.31.35.20 zookeeper3" | sudo tee --append /etc/hosts
ping kafka1
ping kafka2
wget https://archive.apache.org/dist/kafka/0.10.2.1/kafka_2.12-0.10.2.1.tgz
tar -xvzf kafka_2.12-0.10.2.1.tgz
rm kafka_2.12-0.10.2.1.tgz
mv kafka_2.12-0.10.2.1 kafka
cd kafka/
cat config/zookeeper.properties
bin/zookeeper-server-start.sh config/zookeeper.properties
# Testing Zookeeper install
# Start Zookeeper in the background
bin/zookeeper-server-start.sh -daemon config/zookeeper.properties
bin/zookeeper-shell.sh localhost:2181
ls /
# demonstrate the use of a 4 letter word
echo "ruok" | nc localhost 2181 ; echo


sudo nano /etc/init.d/zookeeper
#!/bin/sh
#
# zookeeper Start/Stop zookeeper
#
# chkconfig: - 99 10
# description: Standard script to start and stop zookeeper

DAEMON_PATH=/home/ubuntu/kafka/bin
DAEMON_NAME=zookeeper

PATH=$PATH:$DAEMON_PATH

# See how we were called.
case "$1" in
  start)
        # Start daemon.
        pid=`ps ax | grep -i 'org.apache.zookeeper' | grep -v grep | awk '{print $1}'`
        if [ -n "$pid" ]
          then
            echo "Zookeeper is already running"
        else
            echo "Starting $DAEMON_NAME"
            $DAEMON_PATH/zookeeper-server-start.sh -daemon /home/ubuntu/kafka/config/zookeeper.properties
        fi
        ;;
  stop)
        echo "Shutting down $DAEMON_NAME"
        $DAEMON_PATH/zookeeper-server-stop.sh
        ;;
  restart)
        $0 stop
        sleep 2
        $0 start
        ;;
  status)
        pid=`ps ax | grep -i 'org.apache.zookeeper' | grep -v grep | awk '{print $1}'`
        if [ -n "$pid" ]
          then
            echo "Zookeeper is Running as PID: $pid"
        else
            echo "Zookeeper is not Running"
        fi
        ;;
  *)
        echo "Usage: $0 {start|stop|restart|status}"
        exit 1
esac

exit 0
sudo chmod +x /etc/init.d/zookeeper
sudo chown root:root /etc/init.d/zookeeper
sudo update-rc.d zookeeper defaults
sudo service zookeeper stop
nc -vz localhost 2181
sudo service zookeeper start
sudo service zookeeper status
nc -vz localhost 2181
echo "ruok" | nc localhost 2181 ; echo
cat logs/zookeeper.out

Friday, August 16, 2024

How to SSH into our machine

1. Windows: Install PuTTY

https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html

2. Mac/Linux: You have OpenSSH
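
For example, on Mac/Linux (the key file name and IP below are placeholders):

chmod 400 ~/Downloads/kafka-course.pem
ssh -i ~/Downloads/kafka-course.pem ubuntu@<public-ip-of-the-instance>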






Hands On: AWS Setup

1. Create an AWS Account
2. Setup network security to allow Zookeeper ports (2181, 2888, 3888)
3. Setup network security to allow my IP only
4. Create 3 EC2 machines, Ubuntu image, t2.medium (4 GB RAM)
5. Reserve 3 private IPs for our machines










Zookeeper configuration

1. Zookeeper configuration can be very tricky to optimize and really depends on how your Kafka cluster is formed, as well as your network environment


2. We are going to set the most common settings for Zookeeper and discuss some more advanced settings

# the location to store the in-memory database snapshots and, unless specified otherwise, the transaction log of updates to the database.
dataDir=/data/zookeeper
# the port at which the clients will connect
clientPort=2181
# disable the per-ip limit on the number of connections since this is a non-production config
maxClientCnxns=0
# the basic time unit in milliseconds used by ZooKeeper. It is used to do heartbeats and the minimum session timeout will be twice the tickTime.
tickTime=2000
# The number of ticks that the initial synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# zoo servers
# these hostnames such as `zookeeper1` come from the /etc/hosts file
server.1=zookeeper1:2888:3888
server.2=zookeeper2:2888:3888
server.3=zookeeper3:2888:3888

Zookeeper Architecture: Quorum sizing

1. Zookeeper needs a strict majority of servers up and running to form a quorum when votes happen

2. Therefore Zookeeper quorums have 1, 3, 5, 7, 9, ... (2N+1) servers

3. This allows for 0, 1, 2, 3, 4, ... N servers to go down (Ex: with 5 servers, a strict majority is 3, so up to 2 servers can be down)

















Role of Zookeeper in Kafka

1. Broker registration, with a heartbeat mechanism to keep the list current

2. Maintaining a list of topics, alongside:

    - Their configuration (partitions, replication factor, additional configurations..)

    - The list of ISRs (in sync replicas) for partitions

3. Performing leader elections in case some brokers go down

4. Storing the Kafka cluster id (randomly generated at the first startup of the cluster)

5. Storing ACLs (Access Control Lists) if security is enabled

    - Topics

    - Consumer Groups

    - Users

6. Quotas configuration if enabled

7. (deprecated) Used by the old consumer API to store consumer offsets
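
Much of this state can be inspected directly with zookeeper-shell (a sketch, assuming the /kafka chroot used in this course):

bin/zookeeper-shell.sh zookeeper1:2181
# registered brokers (ephemeral znodes)
ls /kafka/brokers/ids
# topics and their partition assignments
ls /kafka/brokers/topics
# which broker is currently the acting controller
get /kafka/controller
# the randomly generated cluster id
get /kafka/cluster/id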



Wednesday, August 14, 2024

What is Zookeeper

 1. Zookeeper provides multiple features for distributed applications:

    - Distributed configuration management

    - Self election / consensus building

    - Coordination and locks.

    - Key value store

2. Zookeeper is used in many distributed systems, such as Hadoop, Kafka, etc...

3. It's an Apache Project that's proven to be very stable and hasn't had a major release in many years

4. 3.4.x is the stable channel

    3.5.x has been in development for many years, and it is still in beta


5. Zookeeper internal data structure is like a tree:

    - Each node is called a zNode

    - Each zNode has a path

    - zNodes can be persistent or ephemeral

    - Each zNode can store data

    - zNodes cannot be renamed

    - Each zNode can be WATCHed for changes



Note on IPs and DNS (mostly AWS)

 1. Your Zookeeper & Kafka must know their hostname and/or IP

2. These are not supposed to change over time, even after a reboot, otherwise your setup will be broken.

3. Options:

    - Use an Elastic Public IP == constant public IP

        you will be able to access your cluster from the outside (e.g. laptop)

    - Use a secondary ENI == constant private IP (this course)

        you will not be able to access your cluster from the outside

        you will only be able to access the cluster from within your network

    - Use DNS names (private or public) == no need to keep fixed IP

        public means you can access instances from outside

        private means you can't access instances from outside





Kafka Cluster Architecture

 

1. We will co-locate Zookeeper and Kafka for cost-saving purposes (not recommended for production deployments)


2. Knowledge is still applicable to production deployments








Zookeeper Quorum Architecture

 


1. High level Information about Zookeeper:

    - Distributed Key value store

    - Has voting mechanisms

    - Used by many big data tools


2. A functional, up-and-running Zookeeper quorum is absolutely necessary to run Apache Kafka.