For each of these cloud providers, you'll probably notice that we have a common set of commands: creating a Kubernetes cluster, installing Kubeflow, and starting the application. While we can use scripts to automate this process, it would be desirable, as with our code, to have a way to version control and persist different infrastructure configurations, giving us a reproducible recipe for creating the set of resources we need to run Kubeflow. It would also potentially help us move between cloud providers without completely rewriting our installation logic.
The template language Terraform (https://www.terraform.io/) was created by HashiCorp as a tool for managing infrastructure as code (IaC). In the same way that Kubernetes has an API to update resources on a cluster, Terraform allows us to abstract interactions with different underlying cloud providers through a template language, a command-line utility, and core components written in Go (Figure 2.7). Terraform can be extended using user-written plugins.
Figure 2.7: Terraform architecture. Terraform Core communicates over RPC with plugins (providers and provisioners), which use client libraries to call upstream APIs.
Let's look at one example of installing Kubeflow using Terraform instructions on AWS, located at https://github.com/aws-samples/amazon-eks-machine-learning-with-terraform-and-kubeflow. Once you have established the required AWS resources and installed Terraform on an EC2 instance, the aws-eks-cluster-and-nodegroup.tf Terraform file is used to create the Kubeflow cluster using the command:
terraform apply
In this file are a few key components. One is variables that specify aspects of the deployment:
variable "efs_throughput_mode" {
  description = "EFS performance mode"
  default     = "bursting"
  type        = string
}
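As a rough sketch of how such a variable is consumed, a resource can read it through the var. prefix. The resource label example and the creation_token value here are hypothetical and are not taken from the sample repository; throughput_mode is an argument of the AWS provider's aws_efs_file_system resource.

```hcl
# Illustrative only: the resource label and creation_token are hypothetical.
resource "aws_efs_file_system" "example" {
  creation_token = "example-efs"

  # Reads the value declared in the variable block above.
  throughput_mode = var.efs_throughput_mode
}
```

The default can be overridden without editing the file, either with the -var flag on the command line or through a .tfvars file.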
Another is a specification of which cloud provider we are using:
provider "aws" {
  region                  = var.region
  shared_credentials_file = var.credentials
  profile                 = var.profile
}
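Because provider plugins are versioned independently of Terraform itself, a provider configuration like this is often accompanied by a version constraint and by declarations for the variables it reads. The following is a minimal sketch under assumptions: the version constraint, descriptions, and defaults are illustrative and are not taken from the sample repository, which defines its own values.

```hcl
# Assumed pin: keeps `terraform init` from selecting a newer major provider release.
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.0"
    }
  }
}

# Hypothetical declarations for the variables the provider block references.
variable "region" {
  description = "AWS region to deploy into"
  type        = string
  default     = "us-west-2"
}

variable "credentials" {
  description = "Path to the shared AWS credentials file"
  type        = string
}

variable "profile" {
  description = "Named profile within the credentials file"
  type        = string
  default     = "default"
}
```

Pinning matters here because newer major versions of the AWS provider deprecate the shared_credentials_file argument in favor of shared_credentials_files, so an unpinned configuration may behave differently over time.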
And another is resources such as the EKS cluster:
resource "aws_eks_cluster" "eks_cluster" {
  name     = var.cluster_name
  role_arn = aws_iam_role.cluster_role.arn
  version  = var.k8s_version

  vpc_config {
    security_group_ids = [aws_security_group.cluster_sg.id]
    subnet_ids         = flatten([aws_subnet.subnet.*.id])
  }

  depends_on = [
    aws_iam_role_policy_attachment.cluster_AmazonEKSClusterPolicy,
    aws_iam_role_policy_attachment.cluster_AmazonEKSServicePolicy,
  ]

  provisioner "local-exec" {
    command = "aws --region ${var.region} eks update-kubeconfig --name ${aws_eks_cluster.eks_cluster.name}"
  }

  provisioner "local-exec" {
    when    = destroy
    command = "kubectl config unset current-context"
  }
}
Every time we run the terraform apply command, it walks through this file to determine what resources to create, which underlying AWS services to call to create them, and with which configuration they should be provisioned. This provides a clean way to orchestrate complex installations such as Kubeflow in a versioned, extensible template language.
Now that we have successfully installed Kubeflow either locally or on a managed Kubernetes control plane in the cloud, let us take a look at what tools are available on the platform.
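To make the ordering concrete, here is a small standalone sketch (not part of the sample repository; all resource labels and names are hypothetical) of the two ways Terraform learns that one object must exist before another: implicitly, when a block references an attribute of another resource, and explicitly, through depends_on.

```hcl
# Hypothetical role that the EKS service is allowed to assume.
resource "aws_iam_role" "example_role" {
  name = "example-eks-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "eks.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}

# Implicit dependency: referencing example_role.name tells Terraform
# to create the role before this attachment.
resource "aws_iam_role_policy_attachment" "example_attachment" {
  role       = aws_iam_role.example_role.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
}

# Explicit dependency: the value below only references the role, so
# depends_on additionally makes the output wait for the attachment.
output "role_arn" {
  value      = aws_iam_role.example_role.arn
  depends_on = [aws_iam_role_policy_attachment.example_attachment]
}
```

Running terraform plan against a configuration like this prints the proposed actions without applying them, which is a convenient way to inspect what Terraform intends to create.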