
Friday, July 12, 2024

Kafka Multi Cluster + Replication

1. Kafka can only operate well in a single region

2. Therefore, it is very common for enterprises to have Kafka clusters across the world, with some level of replication between them


3. A replication application at its core is just a consumer + a producer
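
A minimal sketch of that idea in Java, assuming a hypothetical source cluster at source:9092, a target cluster at target:9092 and a topic my-topic; real replication tools add offset tracking, error handling and much more:

    import org.apache.kafka.clients.consumer.*;
    import org.apache.kafka.clients.producer.*;
    import org.apache.kafka.common.serialization.*;
    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;

    public class NaiveReplicator {
        public static void main(String[] args) {
            // Consumer reads from the source cluster (hypothetical address)
            Properties c = new Properties();
            c.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "source:9092");
            c.put(ConsumerConfig.GROUP_ID_CONFIG, "replicator");
            c.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
            c.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());

            // Producer writes the same records to the target cluster (hypothetical address)
            Properties p = new Properties();
            p.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "target:9092");
            p.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
            p.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());

            try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(c);
                 KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(p)) {
                consumer.subscribe(List.of("my-topic"));  // hypothetical topic
                while (true) {
                    for (ConsumerRecord<byte[], byte[]> r : consumer.poll(Duration.ofMillis(500))) {
                        // the record gets a NEW offset on the target cluster
                        producer.send(new ProducerRecord<>(r.topic(), r.key(), r.value()));
                    }
                }
            }
        }
    }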

4. There are different tools to perform it:

    - MirrorMaker - open-source tool that ships with Kafka

    - Netflix uses Flink - they wrote their own application

    - Uber uses uReplicator - addresses performance and operations issues with MirrorMaker

    - Comcast has their own open-source Kafka Connect Source

    - Confluent has their own Kafka Connect Source (paid)

5. Overall, try these and see if one works for your use case before writing your own


6. There are two designs for cluster replication:


7. Active => Active:

    - You have a global application

    - You have a global dataset


8. Active => Passive:

    - You want to have an aggregation cluster (for example for analytics)

    - You want to create some form of disaster recovery strategy (it's hard)

    - Cloud Migration (from on-premise cluster to Cloud cluster)


9. Replication doesn't preserve offsets, just data! The same record will have a different offset on the target cluster.











State of the art of Kafka Security

1. Kafka Security is fairly new (0.10.0)

2. Kafka Security improves over time and becomes more flexible / easier to set up as time goes on.

3. Currently, it is hard to set up Kafka Security.

4. The best support for Kafka Security in applications is with Java

Putting it all together

1. You can mix

    - Encryption

    - Authentication

    - Authorisation


2. This allows your Kafka clients to:

    - Communicate securely with Kafka

    - Authenticate against Kafka

    - Be authorised by Kafka to read / write to topics
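
As a rough sketch of what this mix looks like from the client side, here is a hypothetical configuration assuming a broker listener on port 9093 with SASL_SSL and SCRAM-SHA-512 enabled, plus made-up credentials and truststore path; authorisation needs no client settings because the broker enforces ACLs:

    import java.util.Properties;

    public class SecureClientConfig {
        // Client properties mixing encryption (SSL) and authentication (SASL/SCRAM)
        static Properties secureProps() {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker:9093");   // hypothetical SASL_SSL listener
            props.put("security.protocol", "SASL_SSL");      // TLS encryption + SASL authentication
            // Encryption: trust the broker's certificate
            props.put("ssl.truststore.location", "/path/to/client.truststore.jks");  // hypothetical
            props.put("ssl.truststore.password", "changeit");                        // hypothetical
            // Authentication: SCRAM username / password
            props.put("sasl.mechanism", "SCRAM-SHA-512");
            props.put("sasl.jaas.config",
                "org.apache.kafka.common.security.scram.ScramLoginModule required "
                + "username=\"alice\" password=\"alice-secret\";");                  // hypothetical
            // Authorisation: nothing to configure here; the broker checks ACLs for "User:alice"
            return props;
        }
    }

The same properties can then be passed to a KafkaProducer or KafkaConsumer constructor together with the usual serializer / deserializer settings.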



Authentication in Kafka

1. Authentication in Kafka ensures that only clients that can prove their identity can connect to our Kafka cluster

2. This is a similar concept to a login (username / password)

3. Authentication in Kafka can take a few forms

4. SSL Authentication: clients authenticate to Kafka using SSL certificates (see the sketch after the SASL list below)

5. SASL Authentication:
    - PLAIN: clients authenticate using username / password (weak - easy to setup)
    - Kerberos: such as Microsoft Active Directory (strong - hard to setup)
    - SCRAM: username / password (strong - medium to setup)
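
For the SSL certificate option above, the client also presents its own keystore in addition to a truststore; a minimal sketch with hypothetical paths and passwords, assuming the broker exposes an SSL listener that requires client authentication:

    import java.util.Properties;

    public class SslAuthConfig {
        // Client authenticates with its own SSL certificate (mutual TLS)
        static Properties sslAuthProps() {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker:9094");   // hypothetical SSL listener
            props.put("security.protocol", "SSL");
            // Truststore: verify the broker's certificate
            props.put("ssl.truststore.location", "/path/to/client.truststore.jks");  // hypothetical
            props.put("ssl.truststore.password", "changeit");                        // hypothetical
            // Keystore: the client's own certificate + private key
            props.put("ssl.keystore.location", "/path/to/client.keystore.jks");      // hypothetical
            props.put("ssl.keystore.password", "changeit");                          // hypothetical
            props.put("ssl.key.password", "changeit");                               // hypothetical
            return props;
        }
    }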

6. Once a client is authenticated, Kafka knows its identity

7. It still needs to be combined with authorisation, so that Kafka knows that
    - "User alice can view topic finance"
    - "User bob cannot view topic trucks"

8. ACLs (Access Control Lists) have to be maintained by administrators to onboard new users
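
As an illustration of such a rule, here is a minimal sketch creating the "alice can read finance" ACL with the Java Admin client, assuming a broker on localhost:9092 with an authorizer enabled (both assumptions); the same is commonly done with the kafka-acls command-line tool:

    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.common.acl.*;
    import org.apache.kafka.common.resource.*;
    import java.util.List;
    import java.util.Properties;

    public class CreateAcl {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed broker
            try (Admin admin = Admin.create(props)) {
                // "User alice can view topic finance"
                AclBinding allowRead = new AclBinding(
                    new ResourcePattern(ResourceType.TOPIC, "finance", PatternType.LITERAL),
                    new AccessControlEntry("User:alice", "*", AclOperation.READ, AclPermissionType.ALLOW));
                admin.createAcls(List.of(allowRead)).all().get();  // wait until the broker applies it
            }
        }
    }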




Encryption in Kafka

1. Encryption in Kafka ensures that the data exchanged between clients and brokers stays secret to routers along the way

2. This is a similar concept to an https website




The need for encryption, authentication & authorisation in Kafka

1. Currently, any client can access your Kafka cluster (authentication)

2. The clients can publish / consume any topic data (authorisation)

3. All the data being sent is fully visible on the network (encryption)


- Someone could intercept data being sent

- Someone could publish bad data / steal data

- Someone could delete topics


- All these reasons push for more security and an authentication model

Kafka Monitoring and Operations

1. Kafka exposes metrics through JMX.

2. These metrics are highly important for monitoring Kafka, and ensuring the systems are behaving correctly under load.

3. Common places to host the Kafka metrics:

    - ELK (ElasticSearch + Kibana)

    - Datadog

    - NewRelic

    - Confluent Control Centre

    - Prometheus

    - Many others...!


4. Some of the most important metrics are:


5. Under Replicated Partitions: the number of partitions that have problems with the ISR (in-sync replicas). May indicate a high load on the system


6. Request Handlers: utilization of threads for IO, network, etc... overall utilization of an Apache Kafka broker.


7. Request timing: how long it takes to reply to requests. Lower is better, as latency will be improved.
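
As a sketch of how these metrics can be read over JMX from plain Java, assuming a broker whose JMX port was opened on broker:9999 (a hypothetical address), here is a query for the UnderReplicatedPartitions gauge:

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class UnderReplicatedCheck {
        public static void main(String[] args) throws Exception {
            // Connect to the broker's JMX port (hypothetical host:port)
            JMXServiceURL url =
                new JMXServiceURL("service:jmx:rmi:///jndi/rmi://broker:9999/jmxrmi");
            try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                MBeanServerConnection mbeans = connector.getMBeanServerConnection();
                ObjectName metric = new ObjectName(
                    "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions");
                Object value = mbeans.getAttribute(metric, "Value");
                System.out.println("UnderReplicatedPartitions = " + value);  // should normally be 0
            }
        }
    }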


8. Overall have a look at the documentation here:

- https://kafka.apache.org/documentation/#monitoring

- https://docs.confluent.io/current/kafka/monitoring.html

- https://www.datadoghq.com/blog/monitoring-kafka-performance-metrics/


9. Kafka Operations team must be able to perform the following tasks:

    - Rolling restart of Brokers

    - Updating Configurations

    - Rebalancing Partitions

    - Increasing the replication factor (both shown in the reassignment sketch after this list)

    - Adding a Broker

    - Replacing a Broker

    - Removing a Broker

    - Upgrading a Kafka Cluster with zero downtime
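
As a sketch of what rebalancing partitions or increasing the replication factor looks like programmatically, assuming Kafka 2.4+ (where the Admin API supports partition reassignments), a broker on localhost:9092, and a hypothetical topic my-topic whose partition 0 should live on brokers 1, 2 and 3; the kafka-reassign-partitions tool performs the same operation:

    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewPartitionReassignment;
    import org.apache.kafka.common.TopicPartition;
    import java.util.List;
    import java.util.Map;
    import java.util.Optional;
    import java.util.Properties;

    public class ReassignPartition {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed broker
            try (Admin admin = Admin.create(props)) {
                // Move my-topic partition 0 onto brokers 1, 2, 3.
                // Listing MORE replicas than before is how the replication factor is increased.
                TopicPartition tp = new TopicPartition("my-topic", 0);  // hypothetical topic
                NewPartitionReassignment target = new NewPartitionReassignment(List.of(1, 2, 3));
                admin.alterPartitionReassignments(Map.of(tp, Optional.of(target))).all().get();
            }
        }
    }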