
Saturday, September 21, 2024

Factors impacting Kafka performance: Other

1. Make sure you have enough file handles open on your servers, as Kafka opens 3 file descriptors for each topic-partition-segment that lives on the broker.


2. Make sure you use Java 8


3. You may want to tune the GC implementation (see the resources)
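
A possible starting point, mirroring the G1 settings suggested in the Kafka documentation's production JVM notes (tune them based on your own GC monitoring):

export KAFKA_JVM_PERFORMANCE_OPTS="-server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35"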


4. Set Kafka quotas in order to prevent unexpected spikes in usage
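
A minimal sketch using the kafka-configs.sh tool that ships with Kafka (the client id "clientA" and the byte rates are example values):

# cap a client at 1 MB/s produce and 2 MB/s consume
bin/kafka-configs.sh --zookeeper zookeeper1:2181/kafka --alter --add-config 'producer_byte_rate=1048576,consumer_byte_rate=2097152' --entity-type clients --entity-name clientA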


Factors impacting Kafka performance: Operating System (OS)

1. Use Linux or Solaris; running production Kafka clusters on Windows is not recommended.


2. Increase the file descriptor limits (at least 100,000 as a starting point)

https://unix.stackexchange.com/questions/8945/how-can-i-increase-open-files-limit-for-all-processes/8949#8949
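
A sketch for checking and raising the limit (the kafka.Kafka pgrep pattern and the kafka user are assumptions about your setup):

# check the current limit and how many descriptors the broker holds
ulimit -n
ls /proc/$(pgrep -f kafka.Kafka)/fd | wc -l

# raise it persistently, e.g. in /etc/security/limits.conf:
# kafka  soft  nofile  100000
# kafka  hard  nofile  100000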


3. Make sure only Kafka is running on your Operating System. Anything else will just slow the machine down.

Factors impacting Kafka performance: CPU

1. CPU is usually not a performance bottleneck in Kafka because Kafka does not parse any messages, but it can become one in some situations


2. If you have SSL enabled, Kafka has to encrypt and decrypt every payload, which adds load on the CPU


3. Compression can be CPU bound if you force Kafka to do it. Instead, if you send compressed data, make sure your producers and consumers are the ones doing the compression work (that's the default setting anyway)
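
Producer-side compression is a single producer property; for example with the perf-test producer used later in this document (snappy is just one of the supported codecs, and the broker default compression.type=producer keeps data as the producer compressed it):

bin/kafka-producer-perf-test.sh --topic fourth_topic --num-records 10000 --throughput 10 --payload-file file10KB.txt --payload-delimiter A --producer-props acks=1 compression.type=snappy bootstrap.servers=kafka1:9092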


4. Make sure you monitor Garbage Collection over time to ensure the pauses are not too long
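
A simple way to do that (jstat ships with the JDK; the pgrep pattern assumes the broker's main class is kafka.Kafka):

# prints GC utilization and cumulative pause time (the GCT column) every second
jstat -gcutil $(pgrep -f kafka.Kafka) 1000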

Friday, September 13, 2024

Factors impacting Kafka performance: RAM

1. Kafka has amazing performance thanks to the page cache which utilizes your RAM


2. Understanding RAM in Kafka means understanding two parts:

    - The Java HEAP from the Kafka process

    - The rest of the RAM used by the OS page cache


3. Let's understand how both of those should be sized


4. Overall, your Kafka production machines should have at least 8 GB of RAM (the more the better; it's common to have 16 GB or 32 GB per broker)

* Java Heap


5. When you launch Kafka, you specify the Kafka heap options (KAFKA_HEAP_OPTS environment variable)


6. I recommend assigning a MAX amount (-Xmx) of 4 GB to the Kafka heap to get started:


7. export KAFKA_HEAP_OPTS="-Xmx4g"


8. Don't set -Xms (starting heap size):

    - Let the heap grow over time

    - Monitor the heap over time to see if you need to increase -Xmx


9. Kafka should keep a low heap usage over time, and the heap should grow only if you have more partitions on your broker
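
One way to inspect the broker's actual heap usage on Java 8 (jmap ships with the JDK; same kafka.Kafka pgrep assumption as earlier):

# prints heap configuration and current usage per generation
jmap -heap $(pgrep -f kafka.Kafka)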


* OS Page Cache


10. The remaining RAM will be used automatically for the Linux OS Page Cache.


11. This is used to buffer data to the disk, and it is what gives Kafka its amazing performance


12. You don't have to specify anything!


13. Any unused memory will automatically be leveraged by the Linux operating system and assigned to the page cache


14. Note: Make sure swapping is disabled for Kafka entirely

vm.swappiness=0 or vm.swappiness=1 (the default is 60 on Linux)
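
A sketch of applying this (needs root; the sysctl.conf line makes it persist across reboots), plus free -h to see how much RAM the page cache is getting:

sysctl -w vm.swappiness=1
echo 'vm.swappiness=1' >> /etc/sysctl.conf

# the buff/cache column shows the memory being used as page cache
free -h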

Factors impacting Kafka performance: Network

1. Latency is key in Kafka

    - Ensure your Kafka instances and your Zookeeper instances are geographically close!

    - Do not put one broker in Europe and the other broker in the US

    - Having two brokers live on the same rack is good for performance, but a big risk if the rack goes down.


2. Network bandwidth is key in Kafka

    - Network will be your bottleneck.

    - Make sure you have enough bandwidth to handle many connections and TCP requests.

    - Make sure your network is high performance


3. Monitor network usage to understand when it becomes a bottleneck
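
One simple option (sar is part of the sysstat package, which may need to be installed):

# per-interface throughput every second; rxkB/s and txkB/s approaching the NIC's capacity means the network is the bottleneck
sar -n DEV 1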

Thursday, September 12, 2024

Factors impacting Kafka performance: Disk

1. Reads are done sequentially (not randomly), so make sure you choose a disk type that suits that access pattern

2. Format your drives as XFS (easiest, no tuning required)

3. If read/write throughput is your bottleneck

    - it is possible to mount multiple disks in parallel for Kafka

    - The config is log.dirs=/disk1/kafka-logs,/disk2/kafka-logs,/disk3/kafka-logs,...

4. Kafka performance is constant with regard to the amount of data stored in Kafka.

    - Make sure you expire data fast enough (default is one week)

    - Make sure you monitor disk performance
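
A minimal server.properties sketch for item 4 (168 hours is the broker's one-week default), plus one way to watch disk performance (iostat is part of the sysstat package):

# server.properties
log.retention.hours=168

# per-disk utilization and latency, refreshed every second
iostat -x 1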

Hands On: Demonstrating Kafka resiliency

1. We will create a topic with a replication factor of 3

2. We will start producing data consistently to a topic

3. We will read that data from a topic

4. We will kill one Kafka instance

5. We will kill another Kafka instance

6. We will kill the last Kafka instance



#!/bin/bash

# create a topic with replication factor of 3
bin/kafka-topics.sh --zookeeper zookeeper1:2181,zookeeper2:2181,zookeeper3:2181/kafka --create --topic fourth_topic --replication-factor 3 --partitions 3

# generate 10KB of random data
base64 /dev/urandom | head -c 10000 | egrep -ao "\w" | tr -d '\n' > file10KB.txt

# in a new shell: start a continuous random producer
bin/kafka-producer-perf-test.sh --topic fourth_topic --num-records 10000 --throughput 10 --payload-file file10KB.txt --producer-props acks=1 bootstrap.servers=kafka1:9092,kafka2:9092,kafka3:9092 --payload-delimiter A

# in a new shell: start a consumer
bin/kafka-console-consumer.sh --bootstrap-server kafka1:9092,kafka2:9092,kafka3:9092 --topic fourth_topic

# kill one kafka server - all should be fine
# kill another kafka server - all should still be fine
# kill the last server

# start back the servers one by one


# ERRORS:
# PRODUCER:
# org.apache.kafka.common.errors.TimeoutException: Expiring 137 record(s) for fourth_topic-0: 30024 ms has passed since batch creation plus linger time
# [2017-05-25 10:24:23,784] WARN Error while fetching metadata with correlation id 1086 : {fourth_topic=INVALID_REPLICATION_FACTOR} (org.apache.kafka.clients.NetworkClient)
# org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
# [2017-05-25 10:24:23,850] WARN Received unknown topic or partition error in produce request on partition fourth_topic-0. The topic/partition may not exist or the user may not have Describe access to it (org.apache.kafka.clients.producer.internals.Sender)
# [2017-05-25 10:24:23,914] WARN Error while fetching metadata with correlation id 1092 : {fourth_topic=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
# org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.

# CONSUMER:
# [2017-05-25 10:24:23,798] WARN Error while fetching metadata with correlation id 3431 : {fourth_topic=INVALID_REPLICATION_FACTOR} (org.apache.kafka.clients.NetworkClient)
# [2017-05-25 10:24:24,081] WARN Auto-commit of offsets {fourth_topic-0=OffsetAndMetadata{offset=3948, metadata=''}} failed for group console-consumer-25246: Offset commit failed with a retriable exception. You should retry committing offsets. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
# [2017-05-25 10:24:24,231] WARN Received unknown topic or partition error in fetch for partition fourth_topic-0. The topic/partition may not exist or the user may not have Describe access to it (org.apache.kafka.clients.consumer.internals.Fetcher)