
Friday, August 30, 2024

Hands On: AWS Setup

1. Setup network security to allow Kafka ports (9092)

2. Create and Attach EBS volumes to EC2 Instances

(to have a separate drive for Kafka operations)

3. Format the newly attached EBS volumes as XFS

(recommended file system for Kafka as per documentation - requires less tuning)

4. Make sure the volume stays mounted after a reboot

5. Apply on all machines

#!/bin/bash

# execute commands as root
sudo su

# Attach the EBS volume in the console, then
# view available disks
lsblk

# we verify the disk is empty - should return "data"
file -s /dev/xvdf

# Note on Kafka: it's better to format volumes as xfs:
# https://kafka.apache.org/documentation/#filesystems
# Install the XFS userspace tools
apt-get update && apt-get install -y xfsprogs

# format the whole device as XFS (no partition table needed for a data volume)
mkfs.xfs -f /dev/xvdf

# create the kafka data directory (-p also creates /data if missing)
mkdir -p /data/kafka
# mount volume
mount -t xfs /dev/xvdf /data/kafka
# add permissions to kafka directory
chown -R ubuntu:ubuntu /data/kafka
# check it's working
df -h /data/kafka

# EBS Automount On Reboot
cp /etc/fstab /etc/fstab.bak # backup
echo '/dev/xvdf /data/kafka xfs defaults 0 0' >> /etc/fstab

# reboot to verify the automount works
reboot

# after the instance is back up, confirm the volume re-mounted
df -h /data/kafka
sudo service zookeeper start
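Note that the `echo ... >> /etc/fstab` line in the script appends a duplicate entry every time the script is re-run. A minimal idempotent variant is sketched below; it runs against a throwaway temp file rather than the real /etc/fstab so it is safe to try anywhere:

```shell
# demo against a throwaway copy of fstab, not the real /etc/fstab
FSTAB=$(mktemp)
echo '/dev/xvda1 / ext4 defaults 0 0' > "$FSTAB"

LINE='/dev/xvdf /data/kafka xfs defaults 0 0'
# append only if the exact entry is not already present
grep -qF "$LINE" "$FSTAB" || echo "$LINE" >> "$FSTAB"
grep -qF "$LINE" "$FSTAB" || echo "$LINE" >> "$FSTAB"   # re-run: no duplicate

grep -c '/data/kafka' "$FSTAB"   # prints 1, not 2
rm -f "$FSTAB"
```

The same `grep -qF ... || echo ... >>` guard works verbatim against /etc/fstab when run as root.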





Kafka Configuration

1. Configuring Kafka in production is an ART.

2. It requires a really good understanding of:

    - Operating Systems Architecture

    - Server Architecture

    - Distributed Computing

    - CPU operations

    - Network performance

    - Disk I/O

    - RAM and Heap size

    - Page cache

    - Kafka and Zookeeper


3. There are over 140 configuration parameters available for Kafka

    - Only a few parameters are needed to get started

    - Importance is classified between mandatory, high, medium and low.

    - I have deployed Kafka where more than 40 parameters are in use


4. You will never get the optimal configuration right for your needs the first time

5. Configuring Kafka is an iterative process: its behavior changes over time based on usage and monitoring, and so should your configuration

############################# Server Basics #############################

# The id of the broker. This must be set to a unique integer for each broker.
broker.id=1
# change your.host.name by your machine's IP or hostname
advertised.listeners=PLAINTEXT://kafka1:9092

# Switch to enable topic deletion or not, default value is false
delete.topic.enable=true

############################# Log Basics #############################

# A comma separated list of directories under which to store log files
log.dirs=/data/kafka

# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=8
# we will have 3 brokers so the default replication factor should be 2 or 3
default.replication.factor=3
# number of ISR to have in order to minimize data loss
min.insync.replicas=2

############################# Log Retention Policy #############################

# The minimum age of a log file to be eligible for deletion due to age
# this will delete data after a week
log.retention.hours=168

# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824

# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000

############################# Zookeeper #############################

# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=zookeeper1:2181,zookeeper2:2181,zookeeper3:2181/kafka

# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000


############################## Other ##################################
# I recommend you set this to false in production.
# We'll keep it as true for the course
auto.create.topics.enable=true
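Before starting a broker with a file like the one above, a quick sanity check that the mandatory keys are set can save a failed start. A minimal sketch, run here against a temporary stand-in file (the real path would be something like config/server.properties):

```shell
# write a minimal stand-in properties file to a temp path for the demo
CONF=$(mktemp)
cat > "$CONF" <<'EOF'
broker.id=1
log.dirs=/data/kafka
zookeeper.connect=zookeeper1:2181,zookeeper2:2181,zookeeper3:2181/kafka
EOF

# report whether each mandatory key is present
for key in broker.id log.dirs zookeeper.connect; do
  if grep -q "^${key}=" "$CONF"; then
    echo "$key: OK"
  else
    echo "$key: MISSING"
  fi
done
rm -f "$CONF"
```

Point `CONF` at your real server.properties to use this as a pre-start check on each broker.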



Kafka Cluster Size Discussions: 100 Brokers

 


1. In the case of 100 brokers:

    - Your cluster is fully distributed and can handle tremendous volumes, even using commodity hardware.

    - Zookeeper may be under pressure from the many open connections, so you may need to scale up the Zookeeper instances

    - Cluster management is a full-time job (make sure no broker misbehaves)

    - A replication factor of 4 or more is not recommended, as it would add network traffic between the brokers. Leave it at 3

    - Scale horizontally only when a bottleneck is reached (network, disk I/O, CPU, RAM)

Tuesday, August 27, 2024

Kafka Cluster Size Discussions: 3 Brokers

 


1. In the case of 3 brokers:

    - N-1 brokers can be down, if N is your default topic replication factor.

    - Ex: If N = 3, then two brokers can be down

    - Producer and consumer requests are spread out between your different machines

    - Data is spread out between brokers, which means less disk space is used per broker.

    - You have a true cluster, with fault tolerance



Kafka Cluster Size Discussions: 1 Broker

 


1. In the case of only 1 broker:

    - If the broker is restarted, the Kafka cluster is down

    - The maximum replication factor for topics is 1.

    - All producer and consumer requests go to the same unique broker

    - You can only scale vertically (by increasing the instance size and restarting)


2. It's extremely high risk, and only useful for development purposes


Tuesday, August 20, 2024

Kafka Basics

1. Brokers hold topic partitions

2. Brokers receive and serve data

3. Brokers are the unit of parallelism of a Kafka cluster

4. Brokers are the essence of the "distributed" aspect of Kafka


Management tools for Zookeeper

1. You can build your own using the 4 Letter Words

2. Or use one of the following:

    - Netflix Exhibitor (heavily recommended, but tedious to set up):

    https://github.com/soabase/exhibitor

    - Zookeeper UI (web):

    https://github.com/DeemOpen/zkui

    - Zookeeper GUI (desktop) - Windows binaries available:

    https://github.com/echoma/zkui

    - ZK-web (not updated since July 2016):

    https://github.com/qiuziafei/zk-web

    - ZooNavigator (a promising new project):

    https://github.com/elkozmon/zoonavigator
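The "4 Letter Words" mentioned above are plain-text commands (ruok, stat, srvr, mntr, ...) sent to ZooKeeper's client port; a healthy node answers `ruok` with `imok`. Against a live server you would run something like `echo ruok | nc zookeeper1 2181` (hostname and port are assumptions here). The sketch below mocks the exchange with a shell function so it runs standalone:

```shell
# zk_mock stands in for a healthy ZooKeeper node answering the ruok probe;
# against a real server you would pipe through nc to the client port instead
zk_mock() {
  read -r cmd
  [ "$cmd" = "ruok" ] && printf 'imok\n'
}

echo ruok | zk_mock   # prints: imok
```

Building a monitoring check is then just comparing the reply to `imok` and alerting on anything else.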


zoonavigator-docker-compose.yml

version: '2'

services:
  # https://github.com/elkozmon/zoonavigator
  zoonavigator:
    image: elkozmon/zoonavigator:latest
    container_name: zoonavigator
    network_mode: host
    environment:
      HTTP_PORT: 8001
    restart: always

# Make sure port 8001 is opened on the instance security group

# copy the zookeeper/zoonavigator-docker-compose.yml file onto the instance
nano zoonavigator-docker-compose.yml

# run it
docker-compose -f zoonavigator-docker-compose.yml up -d

# check the container is running
docker ps

# check the UI responds
curl localhost:8001