
Friday, July 5, 2024

Idempotent Producer

1. Here's the problem: the producer can introduce duplicate messages in Kafka due to network errors, e.g. when the broker commits a message but the ack is lost and the producer retries the send.

- In Kafka >= 0.11, you can define an "idempotent producer", which won't introduce duplicates on network errors


- Idempotent producers are great for guaranteeing a stable and safe pipeline!
- They come with:
    - retries = Integer.MAX_VALUE (2^31-1 = 2147483647)
    - max.in.flight.requests.per.connection = 1 (Kafka >= 0.11 & < 1.1) or
    - max.in.flight.requests.per.connection = 5 (Kafka >= 1.1 - higher performance)
    - acks = all

- Just set:
    - producerProps.put("enable.idempotence", true);
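
As a minimal sketch, the settings above can be collected in a plain `java.util.Properties` object. The class name `IdempotentProducerConfig`, the string serializers, and the broker address `localhost:9092` are illustrative assumptions, not part of the original notes. Passing the result to `new KafkaProducer<>(props)` would additionally require the kafka-clients library on the classpath, which is not shown here.

```java
import java.util.Properties;

final class IdempotentProducerConfig {

    // Builds producer settings with idempotence switched on.
    // bootstrapServers is whatever broker list your cluster uses (assumed here).
    static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        // Serializer choice is an assumption for illustration only.
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // The one switch from the notes above.
        props.put("enable.idempotence", "true");
        // Listed above as part of the idempotent setup; stated explicitly for readability.
        props.put("acks", "all");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("localhost:9092");
        props.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Keeping the configuration in one small builder makes it easy to see at a glance which durability settings a producer runs with.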












Producer retries

1. In case of transient failures, developers are expected to handle exceptions; otherwise the data will be lost.

2. Example of transient failure:

    - NotEnoughReplicasException

3. There is a "retries" setting

    - defaults to 0 in Kafka <= 2.0, and to 2147483647 (Integer.MAX_VALUE) in Kafka >= 2.1

    - You can increase it to a high number, e.g. Integer.MAX_VALUE


- In case of retries, by default there is a chance that messages will be sent out of order (if a batch has failed to be sent).

- If you rely on key-based ordering, that can be an issue.

- For this, you can set the setting which controls how many produce requests can be made in parallel: max.in.flight.requests.per.connection

    - Default: 5 

    - Set it to 1 if you need to ensure ordering (may impact throughput)

- In Kafka >= 1.0.0, there's a better solution: the idempotent producer!
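
To make the reordering risk concrete, here is a toy simulation, not real Kafka client code. It assumes a simplified model in which up to `maxInFlight` batches are sent together and a batch that fails once is put back at the front of the queue for the next round. The class `InFlightReorderingDemo`, the method `deliver`, and the batch names are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class InFlightReorderingDemo {

    // Toy model: returns the order in which batches end up in the partition log.
    // Batches named in failOnce fail on their first attempt and succeed on retry.
    static List<String> deliver(List<String> batches, Set<String> failOnce, int maxInFlight) {
        if (maxInFlight < 1) {
            throw new IllegalArgumentException("maxInFlight must be at least 1");
        }
        Deque<String> pending = new ArrayDeque<>(batches);
        Set<String> alreadyFailed = new HashSet<>();
        List<String> log = new ArrayList<>();
        while (!pending.isEmpty()) {
            // Send up to maxInFlight batches "in parallel".
            List<String> inFlight = new ArrayList<>();
            while (inFlight.size() < maxInFlight && !pending.isEmpty()) {
                inFlight.add(pending.pollFirst());
            }
            List<String> toRetry = new ArrayList<>();
            for (String batch : inFlight) {
                if (failOnce.contains(batch) && alreadyFailed.add(batch)) {
                    toRetry.add(batch); // first attempt failed, retry later
                } else {
                    log.add(batch);     // written to the log
                }
            }
            // Failed batches go back to the front of the queue, in their original order.
            for (int i = toRetry.size() - 1; i >= 0; i--) {
                pending.addFirst(toRetry.get(i));
            }
        }
        return log;
    }

    public static void main(String[] args) {
        List<String> batches = List.of("A", "B", "C");
        Set<String> failOnce = Set.of("A");
        System.out.println("max in flight 5: " + deliver(batches, failOnce, 5));
        System.out.println("max in flight 1: " + deliver(batches, failOnce, 1));
    }
}
```

In this model, with 5 requests in flight batch A lands after B and C, while with 1 request in flight the original order is preserved at the cost of sending one batch at a time.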


Producer Acks Deep Dive: acks = all (replica acks)

1. Leader + Replicas ack requested

2. Added latency and safety

3. No data loss if enough replicas


- Necessary setting if you don't want to lose data

- acks=all must be used in conjunction with min.insync.replicas.

- min.insync.replicas can be set at the broker or topic level (override).

- min.insync.replicas=2 implies that at least 2 brokers in the ISR (including the leader) must respond that they have the data.

- That means if you use replication.factor=3, min.insync.replicas=2, acks=all, you can only tolerate 1 broker going down; otherwise the producer will receive an exception on send.
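
The arithmetic behind that rule can be sketched in a few lines. The class `MinInsyncReplicasMath` and its method names are hypothetical helpers for illustration and are not part of any Kafka API.

```java
final class MinInsyncReplicasMath {

    // Number of brokers that may be down while acks=all writes are still accepted.
    static int tolerableBrokerFailures(int replicationFactor, int minInsyncReplicas) {
        if (minInsyncReplicas < 1 || minInsyncReplicas > replicationFactor) {
            throw new IllegalArgumentException(
                "min.insync.replicas must be between 1 and replication.factor");
        }
        return replicationFactor - minInsyncReplicas;
    }

    // With acks=all, a write is accepted only while enough replicas are in sync.
    static boolean writeAccepted(int inSyncReplicas, int minInsyncReplicas) {
        return inSyncReplicas >= minInsyncReplicas;
    }

    public static void main(String[] args) {
        int replicationFactor = 3;
        int minInsyncReplicas = 2;
        System.out.println("tolerable broker failures: "
            + tolerableBrokerFailures(replicationFactor, minInsyncReplicas));
        // One broker down: 2 replicas still in sync.
        System.out.println("write accepted with 2 in sync: " + writeAccepted(2, minInsyncReplicas));
        // Two brokers down: only the leader is left.
        System.out.println("write accepted with 1 in sync: " + writeAccepted(1, minInsyncReplicas));
    }
}
```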








Producer Acks Deep Dive: acks = 1 (leader acks)

1. Leader response is requested, but replication is not guaranteed (it happens in the background)

2. If an ack is not received, the producer may retry

3. If the leader broker goes offline but replicas haven't replicated the data yet, the data is lost.






Producer Acks Deep Dive: acks = 0 (no acks)

1. No response is requested

2. If the broker goes offline or an exception happens, we won't know and will lose data

3. Useful for data where it's okay to potentially lose messages:
    - Metrics collection
    - Log collection
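
The three acks levels can be compared side by side with a small model of how many replica acknowledgements stand behind a successful send. This is a sketch only: `AcksModel` and `acknowledgementsAwaited` are hypothetical names, and the function mirrors the descriptions in these notes rather than calling any Kafka code. The value "-1" is treated as an alias for "all".

```java
final class AcksModel {

    // How many replicas must have the record before the producer treats the send as done.
    static int acknowledgementsAwaited(String acks, int inSyncReplicas) {
        switch (acks) {
            case "0":
                return 0;               // fire and forget: no response requested
            case "1":
                return 1;               // leader only; replication happens in the background
            case "all":
            case "-1":
                return inSyncReplicas;  // leader plus every other in-sync replica
            default:
                throw new IllegalArgumentException("unknown acks value: " + acks);
        }
    }

    public static void main(String[] args) {
        int inSyncReplicas = 3; // assumed ISR size for the demonstration
        for (String acks : new String[] {"0", "1", "all"}) {
            System.out.println("acks=" + acks + " waits for "
                + acknowledgementsAwaited(acks, inSyncReplicas) + " replica(s)");
        }
    }
}
```

The fewer acknowledgements a producer waits for, the lower the latency and the higher the chance of losing a message when a broker fails.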





Saturday, June 22, 2024

Kafka Command Line Interface 101

 kafka-topics


C:\kafka_2.12-3.7.0>.\bin\windows\kafka-topics.bat --bootstrap-server localhost:9092,localhost:9093,localhost:9094 --create --topic mytopic --partitions 2 --replication-factor 1
Created topic mytopic.





Start Kafka

Download Kafka from https://kafka.apache.org/


mkdir C:\kafka_2.12-3.7.0\data\zookeeper


Set PATH: add the bin directory



Edit C:\kafka_2.12-3.7.0\config\zookeeper.properties:

dataDir=C:/kafka_2.12-3.7.0/data/zookeeper


Start ZooKeeper from C:\kafka_2.12-3.7.0:


C:\kafka_2.12-3.7.0>bin\windows\zookeeper-server-start.bat config\zookeeper.properties

[2024-06-23 08:53:35,580] INFO binding to port 0.0.0.0/0.0.0.0:2181 (org.apache.zookeeper.server.NIOServerCnxnFactory)



Edit C:\kafka_2.12-3.7.0\config\server.properties:

log.dirs=C:/kafka_2.12-3.7.0/data/kafka


C:\kafka_2.12-3.7.0>.\bin\windows\kafka-server-start.bat config\server.properties


[2024-06-23 09:00:46,925] INFO [KafkaServer id=0] started (kafka.server.KafkaServer)
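
Once both processes report that they have started, a quick way to confirm they are reachable is to try opening a TCP connection to each port. The sketch below assumes ZooKeeper on port 2181 (from the log line above) and the broker on port 9092 (the address used with kafka-topics). The class `PortCheck` and the one-second timeout are illustrative choices.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class PortCheck {

    // Returns true if something accepts TCP connections on host:port within the timeout.
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or host unreachable
        }
    }

    public static void main(String[] args) {
        System.out.println("ZooKeeper (2181) listening: " + isListening("localhost", 2181, 1000));
        System.out.println("Kafka (9092) listening: " + isListening("localhost", 9092, 1000));
    }
}
```

An open port only shows that a process is listening there. It does not prove the broker is healthy, so treat this as a first check rather than a full health probe.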