1. Here's the problem: the producer can introduce duplicate messages in Kafka due to network errors
- In Kafka >= 0.11, you can define an "idempotent producer", which won't introduce duplicates on network errors
Friday, July 5, 2024
Idempotent Producer
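As a minimal sketch, an idempotent producer could be configured as shown below. The keys enable.idempotence, acks, and bootstrap.servers are standard producer settings; the class name, the method name, and the localhost:9092 address are assumptions made for illustration only.

```java
import java.util.Properties;

// Hypothetical helper: builds producer settings with idempotence turned on.
class IdempotentProducerConfig {
    static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Ask the producer to avoid writing duplicates when a send is retried.
        props.setProperty("enable.idempotence", "true");
        // Idempotence requires acknowledgement from all in-sync replicas.
        props.setProperty("acks", "all");
        return props;
    }

    public static void main(String[] args) {
        // Assumed local broker address, for demonstration only.
        Properties props = build("localhost:9092");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

In a real application these properties, together with key and value serializers, would be handed to the producer client; that part is left out here because it needs the Kafka client library.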
Producer retries
1. In case of transient failures, developers are expected to handle exceptions; otherwise the data will be lost.
2. Example of transient failure:
- NotEnoughReplicasException
3. There is a "retries" setting
- defaults to 0 in Kafka <= 2.0; since Kafka 2.1 the default is 2147483647 (Integer.MAX_VALUE)
- You can increase it to a high number, e.g. Integer.MAX_VALUE
- In case of retries, by default there is a chance that messages will be sent out of order (if a batch has failed to be sent and is retried).
- If you rely on key-based ordering, that can be an issue.
- For this, you can set the setting which controls how many produce requests can be made in parallel: max.in.flight.requests.per.connection
- Default: 5
- Set it to 1 if you need to ensure ordering (may impact throughput)
- In Kafka >= 1.0.0, there's a better solution: the idempotent producer!
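The retry settings above can be sketched as a small configuration builder. The keys retries and max.in.flight.requests.per.connection are the settings named in these notes; the class name, method name, and the strictOrdering flag are hypothetical and exist only for this example.

```java
import java.util.Properties;

// Hypothetical helper: retry settings with an optional strict-ordering mode.
class OrderedRetryConfig {
    static Properties build(boolean strictOrdering) {
        Properties props = new Properties();
        // Retry transient failures as many times as the client allows.
        props.setProperty("retries", String.valueOf(Integer.MAX_VALUE));
        // One in-flight request keeps ordering across retries (lower throughput);
        // 5 is the default noted above.
        props.setProperty("max.in.flight.requests.per.connection",
                strictOrdering ? "1" : "5");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build(true);
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Calling build(true) trades throughput for ordering; build(false) keeps the default parallelism and accepts that a retried batch may land after a later one.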
Producers Acks Deep Dive acks = all (replicas acks)
1. Leader + Replicas ack requested
2. Added latency and safety
3. No data loss if enough replicas
- Necessary setting if you don't want to lose data
- acks=all must be used in conjunction with min.insync.replicas.
- min.insync.replicas can be set at the broker or topic level (override).
- min.insync.replicas=2 implies that at least 2 brokers that are ISR (including the leader) must respond that they have the data.
- That means if you use replication.factor=3, min.insync.replicas=2, acks=all, you can only tolerate 1 broker going down; otherwise the producer will receive an exception on send.
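The arithmetic behind the last bullet can be sketched as follows. This is a simplified model that assumes every replica was in sync before any broker failed; the class and method names are hypothetical.

```java
// Hypothetical helper: how many brokers can fail before acks=all sends are rejected.
class AcksAllDurability {
    static int tolerableBrokerFailures(int replicationFactor, int minInsyncReplicas) {
        if (minInsyncReplicas < 1 || minInsyncReplicas > replicationFactor) {
            throw new IllegalArgumentException(
                    "min.insync.replicas must be between 1 and replication.factor");
        }
        // Writes keep succeeding while at least minInsyncReplicas replicas remain.
        return replicationFactor - minInsyncReplicas;
    }

    public static void main(String[] args) {
        // replication.factor=3, min.insync.replicas=2
        System.out.println(tolerableBrokerFailures(3, 2)); // prints 1
    }
}
```

With replication.factor=3 and min.insync.replicas=3, the same calculation gives 0, meaning a single broker failure would already block acks=all writes.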
Producers Acks Deep Dive acks = 1 (leader acks)
1. Leader response is requested, but replication is not guaranteed (it happens in the background)
2. If an ack is not received, the producer may retry
3. If the leader broker goes offline but replicas haven't replicated the data yet, we have data loss.
Producers Acks Deep Dive acks = 0 (no acks)
1. No response is requested
2. If the broker goes offline or an exception happens, we won't know and will lose data
3. Useful for data where it's okay to potentially lose messages.
Saturday, June 22, 2024
Kafka Command Line Interface 101
kafka-topics
Start Kafka
1. Download Kafka from https://kafka.apache.org/
2. Create the ZooKeeper data directory:
mkdir C:\kafka_2.12-3.7.0\data\zookeeper
3. Add the bin directory to PATH
4. Edit C:\kafka_2.12-3.7.0\config\zookeeper.properties:
dataDir=C:/kafka_2.12-3.7.0/data/zookeeper
5. From C:\kafka_2.12-3.7.0, run zookeeper-server-start.bat with config\zookeeper.properties:
C:\kafka_2.12-3.7.0>bin\windows\zookeeper-server-start.bat config\zookeeper.properties
[2024-06-23 08:53:35,580] INFO binding to port 0.0.0.0/0.0.0.0:2181 (org.apache.zookeeper.server.NIOServerCnxnFactory)
6. Edit C:\kafka_2.12-3.7.0\config\server.properties:
log.dirs=C:/kafka_2.12-3.7.0/data/kafka
7. Start the Kafka broker:
C:\kafka_2.12-3.7.0>.\bin\windows\kafka-server-start.bat config\server.properties
[2024-06-23 09:00:46,925] INFO [KafkaServer id=0] started (kafka.server.KafkaServer)