1. Encryption in Kafka ensures that the data exchanged between clients and brokers cannot be read by routers (or any other intermediary) along the way
2. This is a similar concept to an HTTPS website
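As an illustration, a minimal Java consumer configured for SSL encryption might look like the sketch below (the broker address, the port 9093, and the truststore path and password are placeholder assumptions):

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SslClientExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9093");   // SSL listener; host and port are assumptions
        props.put("security.protocol", "SSL");            // traffic to/from the broker is encrypted
        props.put("ssl.truststore.location", "/path/to/kafka.client.truststore.jks");
        props.put("ssl.truststore.password", "changeit"); // placeholder password
        props.put("group.id", "ssl-demo");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // the consumer now talks to the broker over TLS, like a browser talking to an HTTPS site
        }
    }
}
```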
1. Currently, any client can access your Kafka cluster (authentication)
2. The clients can publish / consume any topic data (authorisation)
3. All the data being sent is fully visible on the network (encryption)
- Someone could intercept data being sent
- Someone could publish bad data / steal data
- Someone could delete topics
- All these risks push for more security and an authentication model
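On the authentication side, clients typically present credentials over an encrypted channel. A hedged sketch of a SASL_SSL client configuration follows (the PLAIN mechanism, the listener port, and the credentials are assumptions; production clusters often prefer SCRAM or mutual TLS):

```java
import java.util.Properties;

public class SaslClientConfig {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9094");  // SASL_SSL listener port (assumption)
        props.put("security.protocol", "SASL_SSL");      // authenticate AND encrypt
        props.put("sasl.mechanism", "PLAIN");            // simplest mechanism; SCRAM and Kerberos also exist
        props.put("sasl.jaas.config",
            "org.apache.kafka.common.security.plain.PlainLoginModule required "
            + "username=\"alice\" password=\"alice-secret\";");
        System.out.println(props);                       // pass these to a producer/consumer as usual
    }
}
```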
1. Kafka exposes metrics through JMX.
2. These metrics are highly important for monitoring Kafka, and ensuring the systems are behaving correctly under load.
3. Common places to host the Kafka metrics:
- ELK (ElasticSearch + Kibana)
- Datadog
- NewRelic
- Confluent Control Center
- Prometheus
- Many others...!
4. Some of the most important metrics are:
5. Under Replicated Partitions: the number of partitions that have problems with their ISR (in-sync replicas). May indicate a high load on the system
6. Request Handlers: utilization of the threads for IO, network, etc... a good proxy for the overall utilization of an Apache Kafka broker.
7. Request timing: how long it takes to reply to requests. Lower is better, as it means lower latency.
8. Overall have a look at the documentation here:
- https://kafka.apache.org/documentation/#monitoring
- https://docs.confluent.io/current/kafka/monitoring.html
- https://www.datadoghq.com/blog/monitoring-kafka-performance-metrics/
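Because these metrics are exposed over JMX, they can also be read programmatically. Below is a minimal sketch, assuming a broker started with JMX enabled on localhost:9999, that reads the documented under-replicated-partitions gauge:

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class UnderReplicatedCheck {
    public static void main(String[] args) throws Exception {
        // Assumes the broker was started with JMX enabled, e.g. JMX_PORT=9999
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbean = connector.getMBeanServerConnection();
            // Kafka MBean for the under-replicated-partitions gauge (see the docs above)
            ObjectName name = new ObjectName(
                    "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions");
            Object value = mbean.getAttribute(name, "Value");
            System.out.println("UnderReplicatedPartitions = " + value); // non-zero may indicate trouble
        }
    }
}
```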
9. Kafka Operations team must be able to perform the following tasks:
- Rolling restart of Brokers
- Updating Configurations
- Rebalancing Partitions
- Increasing replication factor (see the sketch after this list)
- Adding a Broker
- Replacing a Broker
- Removing a Broker
- Upgrading a Kafka Cluster with zero downtime
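As one example from the list above, increasing the replication factor can be scripted with the AdminClient's partition-reassignment API (Kafka 2.4+). A sketch, where the topic name and target broker IDs are assumptions; the classic alternative is the kafka-reassign-partitions.sh tool:

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Optional;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewPartitionReassignment;
import org.apache.kafka.common.TopicPartition;

public class IncreaseReplicationFactor {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption
        try (Admin admin = Admin.create(props)) {
            // Place partition 0 of "my-topic" on brokers 1, 2 and 3,
            // raising its replication factor to 3 (topic and broker IDs are assumptions)
            Map<TopicPartition, Optional<NewPartitionReassignment>> reassignment = Map.of(
                    new TopicPartition("my-topic", 0),
                    Optional.of(new NewPartitionReassignment(Arrays.asList(1, 2, 3))));
            admin.alterPartitionReassignments(reassignment).all().get(); // blocks until accepted
        }
    }
}
```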
1. It's not easy to set up a cluster
2. You want to isolate each Zookeeper & Broker on separate servers
3. Monitoring needs to be implemented
4. Operations have to be mastered
5. You need a really good Kafka Admin
- Alternative: many different "Kafka as a Service" offerings on the web
- No operational burdens (updates, monitoring, setup, etc...)
1. You want multiple brokers in different data centers (racks) to distribute your load. You also want a cluster of at least 3 Zookeeper servers
2. In AWS:
1. Four Common Kafka Use Cases:
- Source => Kafka : Producer API, or Kafka Connect Source
- Kafka => Kafka : Consumer API + Producer API, or Kafka Streams
- Kafka => Sink : Consumer API, or Kafka Connect Sink
- Kafka => App : Consumer API
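As a concrete example of the first use case (Source => Kafka via the Producer API), a minimal producer sketch, where the broker address and topic name are placeholder assumptions:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SourceToKafka {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // "my-topic" is a placeholder topic name
            producer.send(new ProducerRecord<>("my-topic", "key", "hello kafka"));
        }
    }
}
```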
- Simplify and improve getting data in and out of Kafka
- Simplify transforming data within Kafka without relying on external libs
- Programmers always want to import data from the same sources:
Databases, JDBC, Couchbase, GoldenGate, SAP HANA, Blockchain, Cassandra, DynamoDB, FTP, IoT, MongoDB, MQTT, RethinkDB, Salesforce, Solr, SQS, Twitter, etc...
- Programmers always want to store data in the same sinks:
S3, ElasticSearch, HDFS, JDBC, SAP HANA, DocumentDB, Cassandra, DynamoDB, HBase, MongoDB, Redis, Solr, Splunk, Twitter
- It is tough to achieve Fault Tolerance, Idempotence, Distribution, Ordering
- Other programmers may already have done a very good job!
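This is exactly what Kafka Connect packages up as reusable connectors. As an illustration, a source connector can be registered against a Connect worker's REST API (default port 8083). A sketch using the FileStreamSource connector that ships with Kafka, where the worker URL, file path, and topic name are assumptions:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterFileSource {
    public static void main(String[] args) throws Exception {
        // FileStreamSource ships with Kafka; worker URL, file path and topic are assumptions
        String config = """
                {"name": "file-source-demo",
                 "config": {
                   "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
                   "file": "/tmp/input.txt",
                   "topic": "file-lines"}}""";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors")) // Connect REST default port
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(config))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

Once registered, the worker handles fault tolerance, offsets, and distribution, which is precisely the hard part listed above.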