
Sunday, September 22, 2024

Advanced Kafka Configuration: Important parameters to be aware of

1. auto.create.topics.enable=true => set to false in production

2. background.threads=10 => increase if you have a good CPU

3. delete.topic.enable=false => your choice

4. log.flush.interval.messages => don't ever change. Let your OS do it

5. log.retention.hours=168 => Set in regards to your requirements

6. message.max.bytes=1000012 => increase if you need more than 1MB

7. min.insync.replicas=1 => set to 2 if you want to be extra safe

8. num.io.threads=8 => increase if your disk I/O is a bottleneck

9. num.network.threads=3 => ++if your network is a bottleneck

10. num.recovery.threads.per.data.dir=1 => set to number of disks

11. num.replica.fetchers=1 => increase if your replicas are lagging

12. offsets.retention.minutes=1440 => after 24 hours you lose offsets

13. unclean.leader.election.enable=true => false if you don't want data loss

14. zookeeper.session.timeout.ms=6000 => increase if you timeout often

15. broker.rack=null => set to your availability zone in AWS

16. default.replication.factor=1 => set to 2 or 3 in a kafka cluster

17. num.partitions=1 => set from 3 to 6 in your cluster

18. quota.producer.default=10485760 => sets the default producer quota to 10 MB/s

19. quota.consumer.default=10485760 => sets the default consumer quota to 10 MB/s
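
A rough sketch of how several of these could be applied as production-leaning overrides in server.properties, assuming a 3-broker cluster in AWS; the values (including the broker.rack availability zone) are illustrative, not definitive:

# append production-leaning overrides to server.properties (illustrative values)
cat >> config/server.properties <<'EOF'
auto.create.topics.enable=false
default.replication.factor=3
min.insync.replicas=2
num.partitions=3
unclean.leader.election.enable=false
broker.rack=us-east-1a
EOF

# a rolling restart of every broker is required for these static settings to take effect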


Saturday, September 21, 2024

How to change a Kafka configuration

1. Change the configuration in the server.properties file on every broker

2. Proceed to a rolling restart of all the brokers

#!/bin/bash

# let's apply a new setting to our server.properties
pwd
# make sure you're in the /home/ubuntu/kafka directory
cat config/server.properties
echo "unclean.leader.election.enable=false" >> config/server.properties
cat config/server.properties

# look at the logs - what was the value before?
cat logs/server.log | grep unclean.leader
# stop the broker
sudo service kafka stop
# restart the broker
sudo service kafka start
# look at the logs - what is the value after?
cat logs/server.log | grep unclean.leader

# operate on the three brokers
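
A minimal sketch of that rolling restart, assuming the three brokers are reachable over SSH as kafka1, kafka2 and kafka3 (hypothetical hostnames) and each already has the updated server.properties:

#!/bin/bash
# rolling restart: one broker at a time so the cluster stays available
for host in kafka1 kafka2 kafka3; do
  echo "restarting Kafka on $host"
  ssh ubuntu@"$host" "sudo service kafka stop && sudo service kafka start"
  # wait before moving on so the broker can rejoin and its replicas can catch up
  sleep 60
done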

Running Kafka on AWS in Production

1. Separate your instances between different availability zones


2. Use st1 EBS volumes for the best price / performance ratio


3. Use r4.xlarge or m4.2xlarge if you're using EBS (these instances are EBS-optimized). You can use something smaller, but performance may degrade


4. Set up DNS names for your brokers / fixed IPs so that your clients aren't affected if you recycle your EC2 instances
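
For point 4, a sketch of what a stable, DNS-based advertised listener could look like in server.properties; the DNS name below is hypothetical:

# advertise a fixed DNS name instead of the EC2 instance's changing IP/hostname
echo "advertised.listeners=PLAINTEXT://kafka1.mycompany.internal:9092" >> config/server.properties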

Factors impacting Kafka performance: Other

1. Make sure you have enough file handles opened on your servers, as Kafka opens 3 file descriptors for each topic-partition-segment that lives on the broker.


2. Make sure you use Java 8


3. You may want to tune the GC implementation (see the resources, and the sketch after this list)


4. Set Kafka quotas in order to prevent unexpected spikes in usage
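
For points 3 and 4, a sketch under the assumption of a recent Kafka distribution (flag names vary between versions):

# GC tuning: G1 with a short pause target, similar to the defaults Kafka ships with
export KAFKA_JVM_PERFORMANCE_OPTS="-server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35"

# default client quotas of 10 MB/s (older versions use --zookeeper instead of --bootstrap-server)
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter \
  --add-config 'producer_byte_rate=10485760,consumer_byte_rate=10485760' \
  --entity-type clients --entity-default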


Factors impacting Kafka performance: Operating System (OS)

1. Use Linux or Solaris; running production Kafka clusters on Windows is not recommended.


2. Increase the file descriptor limits (at least 100,000 as a starting point; a sketch follows this list)

https://unix.stackexchange.com/questions/8945/how-can-i-increase-open-files-limit-for-all-processes/8949#8949


3. Make sure only Kafka is running on your Operating System. Anything else will just slow the machine down.
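
A sketch of raising the limit to the suggested 100,000, assuming the broker runs as a dedicated kafka user on a PAM-based setup (systemd-managed services use LimitNOFILE in the unit file instead):

# check the current open-files limit
ulimit -n

# persist a higher limit for the kafka user
echo "kafka soft nofile 100000" | sudo tee -a /etc/security/limits.conf
echo "kafka hard nofile 100000" | sudo tee -a /etc/security/limits.conf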

Factors impacting Kafka performance: CPU

1. CPU is usually not a performance bottleneck in Kafka because Kafka does not parse any messages, but it can become one in some situations


2. If you have SSL enabled, Kafka has to encrypt and decrypt every payload, which adds load on the CPU


3. Compression can be CPU bound if you force Kafka to do it. Instead, if you send compressed data, make sure your producers and consumers are the ones doing the compression work (that's the default setting anyway)


4. Make sure you monitor Garbage Collection over time to ensure the pauses are not too long
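
For point 4, one way to eyeball GC pause times, assuming the broker was started with the standard scripts so GC logging goes to the logs directory (the exact file name depends on your version and setup):

# look for recent GC pauses in the broker's GC log
grep -i "pause" logs/kafkaServer-gc.log | tail -20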

Friday, September 13, 2024

Factors impacting Kafka performance: RAM

1. Kafka has amazing performance thanks to the page cache which utilizes your RAM


2. Understanding RAM in Kafka means understanding two parts:

    - The Java HEAP from the Kafka process

    - The rest of the RAM used by the OS page cache


3. Let's understand how both of those should be sized


4. Overall, your Kafka production machines should have at least 8GB of RAM dedicated to them (the more the better; it's common to have 16GB or 32GB per broker)

* Java Heap


5. When you launch Kafka, you specify Kafka Heap Options (KAFKA_HEAP_OPTS environment variable)


6. I recommend assigning a MAX amount (-Xmx) of 4GB to the Kafka heap to get started:


7. export KAFKA_HEAP_OPTS="-Xmx4g"


8. Don't set -Xms (starting heap size):

    - Let the heap grow over time

    - Monitor the heap over time to see if you need to increase -Xmx (a monitoring sketch follows below)


9. Kafka should keep a low heap usage over time, and heap should increase only if you have more partitions in your broker
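
A small sketch of points 6-9 in practice, assuming the JDK tools (jstat) are installed on the broker machine:

# start with a 4GB max heap and no -Xms, as recommended above
export KAFKA_HEAP_OPTS="-Xmx4g"

# watch heap and GC behaviour over time: sample the broker JVM every 5 seconds
KAFKA_PID=$(pgrep -f kafka.Kafka)
jstat -gcutil "$KAFKA_PID" 5000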


* OS Page Cache


10. The remaining RAM will be used automatically for the Linux OS Page Cache.


11. This is used to buffer data to disk, and this is what gives Kafka its amazing performance


12. You don't have to specify anything!


13. Any unused memory will automatically be leveraged by the Linux operating system and assigned to the page cache


14. Note: Make sure swapping is disabled for Kafka entirely

vm.swappiness=0 or vm.swappiness=1 (default is 60 on Linux)
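
A sketch of applying that on a Linux broker:

# check the current value (60 by default on most distributions)
sysctl vm.swappiness

# apply now and persist across reboots
sudo sysctl -w vm.swappiness=1
echo "vm.swappiness=1" | sudo tee -a /etc/sysctl.conf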