
Friday, June 21, 2024

Topics, partitions and offsets

 - Topics: a particular stream of data

    1) Similar to a table in a database (without all the constraints)

    2) You can have as many topics as you want

    3) A topic is identified by its name

- Topics are split into partitions

    1) Each partition is ordered

    2) Each message within a partition gets an incremental id, called an offset

- An offset only has meaning within a specific partition.

    - E.g. offset 3 in partition 0 doesn't represent the same data as offset 3 in partition 1

- Order is guaranteed only within a partition (not across partitions)

- Data is kept only for a limited time (default is one week)

- Once the data is written to a partition, it can't be changed (immutability)

- Data is assigned randomly to a partition unless a key is provided (more on this later)
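
The points above can be sketched with a toy in-memory model. This is only an illustration, not Kafka's implementation: the `Topic` class is hypothetical, keyless messages are spread round-robin here (instead of randomly) so the output is repeatable, and `zlib.crc32` is a stand-in for the hash a real Kafka client uses.

```python
import zlib


class Topic:
    """Toy model of a topic: a fixed list of append-only partitions."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]
        self._next = 0  # round-robin cursor for messages without a key

    def append(self, value, key=None):
        """Append a message and return (partition, offset)."""
        if key is None:
            # No key: spread messages over the partitions.
            partition = self._next % len(self.partitions)
            self._next += 1
        else:
            # Same key -> same partition (crc32 is an assumed stand-in hash).
            partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        log = self.partitions[partition]
        log.append(value)           # messages are only ever appended
        return partition, len(log) - 1  # the offset is the position in that partition


topic = Topic("demo", num_partitions=2)
print(topic.append("a"))  # (0, 0)
print(topic.append("b"))  # (1, 0)
print(topic.append("c"))  # (0, 1)
```

Offset 0 shows up in both partitions and refers to different messages, which is why an offset is only meaningful together with its partition.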



Thursday, June 20, 2024

Topic example: trucks_gps

 


- Say you have a fleet of trucks; each truck reports its GPS position to Kafka.

- You can have a topic trucks_gps that contains the position of all trucks.

- Each truck will send a message to Kafka every 20 seconds. Each message will contain the truck ID and the truck position (latitude and longitude).

- We choose to create that topic with 10 partitions (an arbitrary number)
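
A minimal sketch of what one such message could look like, assuming JSON as the value format and the truck ID as the message key. The field names, the helper functions and the `zlib.crc32` hash are all assumptions made for illustration; the notes do not specify a format, and a real Kafka client uses its own partitioner.

```python
import json
import zlib

NUM_PARTITIONS = 10  # the arbitrary number chosen for trucks_gps


def build_message(truck_id, latitude, longitude):
    """Return (key, value) as bytes for one GPS report (assumed JSON layout)."""
    key = truck_id.encode("utf-8")
    value = json.dumps(
        {"truck_id": truck_id, "latitude": latitude, "longitude": longitude}
    ).encode("utf-8")
    return key, value


def choose_partition(key, num_partitions=NUM_PARTITIONS):
    """Map a key to a partition; crc32 is a stand-in hash, not Kafka's."""
    return zlib.crc32(key) % num_partitions


key, value = build_message("truck-42", 37.5665, 126.9780)
print(choose_partition(key), value.decode("utf-8"))
```

Because the partition depends only on the key, every report from the same truck lands in the same partition, so that truck's positions stay in order.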

Tuesday, June 4, 2024

For example

 - Netflix uses Kafka to apply recommendations in real-time while you're watching TV shows

- Uber uses Kafka to gather user, taxi and trip data in real-time to compute and forecast demand, and compute surge pricing in real-time

- LinkedIn uses Kafka to prevent spam and to collect user interactions to make better connection recommendations in real time.

- Remember that Kafka is only used as a transportation mechanism!


Apache Kafka: Use cases

 - Messaging System

- Activity Tracking

- Gather metrics from many different locations

- Application log gathering

- Stream processing (with the Kafka Streams API or Spark, for example)

- De-coupling of system dependencies

- Integration with Spark, Flink, Storm, Hadoop, and many other Big Data technologies

Why Apache Kafka

 - Created by LinkedIn, now an open-source project mainly maintained by Confluent

- Distributed, resilient architecture, fault tolerant

- Horizontal scalability:

    1) Can scale to 100s of brokers

    2) Can scale to millions of messages per second

- High performance (latency of less than 10ms) - real time

- Used by 2000+ firms, 35% of the Fortune 500

Problems organisations face with the previous architecture (before Kafka)

 - If you have 4 source systems, and 6 target systems, you need to write 24 integrations!


- Each integration comes with difficulties around

    1) Protocol - how the data is transported (TCP, HTTP, REST, FTP, JDBC...)

    2) Data format - how the data is parsed (Binary, CSV, JSON, Avro...)

    3) Data schema & evolution - how the data is shaped and may change


- Each source system will have an increased load from the connections
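
The count of 24 above is simply 4 x 6. As a rough comparison, the sketch below also shows the count under the assumption that every system connects once to a central broker such as Kafka; that second figure is an illustration of the decoupling idea, not a number from these notes.

```python
def point_to_point(sources, targets):
    """Every source integrates directly with every target."""
    return sources * targets


def via_broker(sources, targets):
    """Assumption: each system needs only one connection, to the broker."""
    return sources + targets


print(point_to_point(4, 6))  # 24
print(via_broker(4, 6))      # 10
```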

Saturday, April 27, 2024

JSON WEB TOKEN STRUCTURE

 A JSON Web Token consists of three parts separated by dots (.):

aaaaaaaa.bbbbbbbb.cccccccc

- Header: (a)

- Payload: (b)

- Signature: (c)


1) JWT HEADER

- A JWT header usually consists of two parts:

  "alg" The algorithm used for signing

  "typ" The type of token

{

    "alg": "HS256",

    "typ": "JWT"

}

- The JWT header is then encoded using Base64Url (the URL-safe variant of Base64, without padding) to create the first part of the JWT (a)
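
As a small sketch of this step, the header above can be encoded with Python's standard library. The helper name `b64url` is made up for this example; it uses the URL-safe Base64 alphabet and strips the `=` padding, which is how JWT parts are written.

```python
import base64
import json


def b64url(data: bytes) -> str:
    """URL-safe Base64 without trailing '=' padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


header = {"alg": "HS256", "typ": "JWT"}
# Compact separators so the JSON has no extra whitespace.
header_json = json.dumps(header, separators=(",", ":")).encode("utf-8")
encoded_header = b64url(header_json)
print(encoded_header)  # eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9
```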


2) JWT PAYLOAD

- A JWT payload holds the data. The payload's data consists of claims, and there are three types of claims:

  Registered

  Public

  Private

{

   "sub": "1334567890",

   "name": "gildong hong",

   "last_name": "gildong",

   "first_name": "hong",

   "email": "abc@zyx.com",

   "admin": true

}


- The JWT payload is then encoded using Base64Url to create the second part of the JWT (b)
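
Note that this step is encoding, not encryption: anyone holding the token can decode the payload and read the claims. A minimal round-trip sketch, with hypothetical helper names and a shortened version of the payload above:

```python
import base64
import json


def b64url_encode(data: bytes) -> str:
    """URL-safe Base64 without trailing '=' padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Restore the stripped padding, then decode."""
    padding = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)


payload = {"sub": "1334567890", "name": "gildong hong", "admin": True}
encoded_payload = b64url_encode(json.dumps(payload).encode("utf-8"))

# No secret is needed to read the claims back.
decoded_payload = json.loads(b64url_decode(encoded_payload))
print(decoded_payload["name"])  # gildong hong
```

For that reason, sensitive data such as passwords should not be placed in the payload.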


3) JWT SIGNATURE

- A JWT signature is created by applying the algorithm named in the header to the encoded header and the encoded payload (joined by a dot), together with a secret.

HMACSHA256(

    base64UrlEncode(header) + "." +

    base64UrlEncode(payload),

    secret)


- The secret can be anything, but is saved somewhere on the server that the client does not have access to


- The signature is the third and final part of a JWT(c)
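
Putting the three parts together, here is a minimal sketch of signing and verifying an HS256 token with only the Python standard library. The function names and the example secret are made up for illustration; in a real project a maintained JWT library would normally be used instead of hand-rolled code.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """URL-safe Base64 without trailing '=' padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def hs256_signature(signing_input: str, secret: bytes) -> str:
    """HMAC-SHA256 over 'encoded_header.encoded_payload', Base64Url-encoded."""
    digest = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return b64url(digest)


def sign_hs256(header: dict, payload: dict, secret: bytes) -> str:
    """Build the full token: a.b.c"""
    part_a = b64url(json.dumps(header, separators=(",", ":")).encode("utf-8"))
    part_b = b64url(json.dumps(payload, separators=(",", ":")).encode("utf-8"))
    signing_input = part_a + "." + part_b
    return signing_input + "." + hs256_signature(signing_input, secret)


def verify_hs256(token: str, secret: bytes) -> bool:
    """Recompute the signature and compare it in constant time."""
    signing_input, _, signature = token.rpartition(".")
    expected = hs256_signature(signing_input, secret)
    return hmac.compare_digest(signature, expected)


secret = b"example-secret-kept-on-the-server"  # hypothetical value
token = sign_hs256({"alg": "HS256", "typ": "JWT"}, {"sub": "1334567890", "admin": True}, secret)
print(token.count("."))                      # 2
print(verify_hs256(token, secret))           # True
print(verify_hs256(token, b"wrong-secret"))  # False
```

Since the client never sees the secret, it cannot produce a valid signature for a modified header or payload.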