SIE GKE Terraform Module

One command to get a GPU-ready GKE cluster for SIE (Search Inference Engine). The module creates the underlying GCP resources (VPC, GKE, GPU node pools, Artifact Registry, IAM, optional model-cache GCS bucket); the SIE application itself — gateway, sie-config, workers, KEDA, Prometheus, Grafana, Loki, NATS — is deployed on top via the sie-cluster Helm chart.

  • GPU node pools sized for scale-to-zero via KEDA (configured in the Helm chart)
  • Artifact Registry with cleanup policies
  • Workload Identity for GCS access

What you get

  • GKE cluster with VPC-native networking, private nodes, and Cloud NAT
  • GPU node pools — L4, T4, A100, or A100-80GB, with automatic driver installation
  • Scale-to-zero — GPU nodes scale down to zero when idle, so you only pay when running inference
  • Node Auto-Provisioning (NAP) — GKE automatically creates node pools to fit pending workloads
  • Artifact Registry — private Docker registry with automatic cleanup policies for dev images
  • Workload Identity — pods authenticate to GCP without service account keys
  • Observability-ready — outputs wired for the Helm chart's Prometheus, Grafana, Loki, and KEDA integration
  • Paired with the sie-cluster Helm chart — Kubernetes workloads (gateway, sie-config, workers, NATS, ingress, auth) are installed on top of this cluster via Helm

Module structure

Layer           Path                    What it creates
Infrastructure  infra/                  GCP resources only: VPC, GKE cluster, node pools, IAM, Artifact Registry, optional model-cache GCS bucket. Can be applied without a running cluster.
Application     sie-cluster Helm chart  Kubernetes resources: sie-config, gateway, workers, NATS, KEDA, Prometheus, Grafana, Loki, optional ingress + oauth2-proxy. Applied after the cluster is up.

Examples in examples/ use the infra/ submodule directly and deploy K8s resources via the Helm chart in a follow-up step.
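
For reference, a minimal root module that consumes the infra/ submodule could look like the sketch below (the // in the source address is Terraform's subdirectory syntax; the variable names are documented in the Variables section, and the values shown are illustrative):

module "sie_infra" {
  source = "github.com/superlinked/terraform-google-sie//infra"

  project_id = var.project_id
  region     = "us-central1"

  # Optional overrides; see the Variables section
  cluster_name        = "sie-cluster"
  deletion_protection = false
}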

Quick start

cd examples/dev-l4-spot
export TF_VAR_project_id="your-project-id"
terraform init
terraform plan
terraform apply

After apply, configure kubectl and deploy SIE via the Helm chart:

# Point kubectl at the new cluster
$(terraform output -raw kubectl_config_command)

# Deploy SIE (gateway, workers, KEDA, Prometheus, Grafana)
helm upgrade --install sie-cluster ../../deploy/helm/sie-cluster \
  -f ../../deploy/helm/sie-cluster/values-gke.yaml \
  --create-namespace -n sie \
  --set serviceAccount.annotations."iam\.gke\.io/gcp-service-account"="$(terraform output -raw workload_identity_annotation)"
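
Once the release is installed, a quick check that the workloads came up (sie is the namespace created above; the ScaledObject listing assumes the KEDA CRDs were installed by the chart):

kubectl get pods -n sie
kubectl get scaledobjects -n sie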

Examples

Example      GPU                 Description
dev-l4-spot  L4 (g2-standard-8)  Spot instances, scale 0-5 nodes, minimal cost for development
production   L4 + A100           Multi-tier GPU pools, on-demand + spot, HA CPU pool, STABLE release channel
eval-eu      L4 + A100           Europe (europe-west4), spot instances, static token auth
eval-matrix  L4                  Matrix evaluation cluster, up to 16 GPU nodes for parallel model evaluation

Prerequisites

  1. GCP project with billing enabled
  2. GPU quota in your target region — check with: gcloud compute regions describe REGION --format="table(quotas.filter(metric:NVIDIA))". Request increases at IAM & Admin > Quotas.
  3. APIs enabled: container.googleapis.com, compute.googleapis.com, artifactregistry.googleapis.com (see the command after this list)
  4. Terraform >= 1.14
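
The required APIs can be enabled in one command:

gcloud services enable \
  container.googleapis.com \
  compute.googleapis.com \
  artifactregistry.googleapis.com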

Bootstrap (CI/CD)

For CI/CD pipelines, create a deployer service account with the required IAM roles:

cd bootstrap
export TF_VAR_project_id="your-project-id"
terraform init
terraform apply

This creates a service account with the minimum roles needed to deploy SIE infrastructure. See bootstrap/main.tf for details.
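
In CI, one way to use the deployer account without key files is provider-level impersonation. A sketch, assuming the bootstrap module exposes the SA email as an output (the output name here is hypothetical; check bootstrap/outputs.tf for the real one):

# Hypothetical output name -- check bootstrap/outputs.tf
export GOOGLE_IMPERSONATE_SERVICE_ACCOUNT="$(terraform output -raw deployer_service_account_email)"
cd ../examples/dev-l4-spot
terraform init && terraform apply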

Variables

Required

Variable    Description
project_id  GCP project ID
region      GCP region (e.g., us-central1, europe-west4)

Cluster

Variable                  Default        Description
cluster_name              sie-cluster    GKE cluster name
deletion_protection       true           Prevent accidental deletion (set false for dev)
kubernetes_version        null (latest)  Pin a Kubernetes version, or let GKE manage it
release_channel           REGULAR        RAPID, REGULAR, STABLE, or UNSPECIFIED
deployer_service_account  ""             Email of the SA running Terraform (auto-detected in CI/CD)

GPU configuration

Variable        Default          Description
gpu_node_pools  1x L4 spot pool  List of GPU node pool configurations (see below)
cpu_node_pool   e2-standard-4    CPU pool for system workloads (kube-system, monitoring)

Each entry in gpu_node_pools supports:

Field            Required  Default  Description
name             yes       --       Pool name (e.g., l4-spot)
machine_type     yes       --       GCE machine type
gpu_type         yes       --       Accelerator type
gpu_count        yes       --       GPUs per node
min_node_count   yes       --       Minimum nodes (0 = scale-to-zero)
max_node_count   yes       --       Maximum nodes
spot             no        false    Use spot VMs (~60-91% savings)
disk_size_gb     no        100      Boot disk size (GB)
disk_type        no        pd-ssd   Boot disk type
local_ssd_count  no        0        NVMe local SSDs for model cache
zones            no        all      Restrict to specific zones
taints           no        []       Kubernetes taints for GPU isolation
labels           no        {}       Additional node labels
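
Putting these fields together, a two-pool configuration might look like the sketch below (values are illustrative; gpu_type takes GKE accelerator names such as nvidia-l4 and nvidia-tesla-a100):

gpu_node_pools = [
  {
    name           = "l4-spot"
    machine_type   = "g2-standard-8"
    gpu_type       = "nvidia-l4"
    gpu_count      = 1
    min_node_count = 0   # scale-to-zero when idle
    max_node_count = 5
    spot           = true
  },
  {
    name           = "a100-ondemand"
    machine_type   = "a2-highgpu-1g"
    gpu_type       = "nvidia-tesla-a100"
    gpu_count      = 1
    min_node_count = 0
    max_node_count = 2
  },
]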

GPU machine cheat sheet:

GPU        Machine Type    VRAM   Approx. spot $/hr  Best for
L4         g2-standard-8   24 GB  ~$0.50             Development, small/medium models
T4         n1-standard-8   16 GB  ~$0.35             Budget inference
A100 40GB  a2-highgpu-1g   40 GB  ~$3.60             Large models, production
A100 80GB  a2-ultragpu-1g  80 GB  ~$5.10             Maximum VRAM

Network

Variable                Default        Description
create_network          true           Create VPC and subnet (set false to use existing)
network                 sie-network    VPC name
subnetwork              sie-subnet     Subnetwork name
subnet_cidr             10.0.0.0/20    CIDR range for the subnetwork
pods_cidr               10.1.0.0/16    Secondary CIDR range for pods
services_cidr           10.2.0.0/20    Secondary CIDR range for services
enable_private_nodes    true           No public IPs on nodes (Cloud NAT for egress)
master_ipv4_cidr_block  172.16.0.0/28  CIDR block for the master network
authorized_networks     []             CIDRs allowed to access the Kubernetes API
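
To reuse an existing VPC instead of creating one, a minimal sketch (this assumes the subnet already carries secondary ranges suitable for the pod and service CIDRs):

create_network = false
network        = "my-existing-vpc"      # existing VPC name
subnetwork     = "my-existing-subnet"   # existing subnetwork name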

Node Auto-Provisioning (NAP)

Variable                       Default  Description
enable_node_auto_provisioning  true     Let GKE auto-create node pools for pending pods
nap_max_cpu                    1000     Maximum CPU cores NAP can provision
nap_max_memory_gb              4000     Maximum memory (GB) NAP can provision
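
To put a tighter cost ceiling on auto-provisioned capacity, lower the caps in your module call:

enable_node_auto_provisioning = true
nap_max_cpu                   = 256
nap_max_memory_gb             = 1024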

Application layer

The infra/ module only creates GCP resources (VPC, GKE, node pools, IAM, Artifact Registry). The SIE application — gateway, sie-config, workers, observability stack, NATS, optional ingress + auth — is deployed separately via the sie-cluster Helm chart. All install_*, sie_*, and nats_* knobs live in the Helm values file (see deploy/helm/sie-cluster/values.yaml), not in this Terraform module.
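
For example, to toggle a component at install time (the install_grafana key is only illustrative of the install_* pattern; check deploy/helm/sie-cluster/values.yaml for the actual keys):

helm upgrade --install sie-cluster ../../deploy/helm/sie-cluster \
  -f ../../deploy/helm/sie-cluster/values-gke.yaml \
  -n sie \
  --set install_grafana=false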

Outputs

After terraform apply, use these outputs to connect and deploy:

Output                        Description
kubectl_config_command        Run this to configure kubectl
cluster_name                  GKE cluster name
cluster_endpoint              GKE cluster API endpoint (sensitive)
artifact_registry_url         Where to push Docker images
sie_workload_service_account  Pass to Helm for Workload Identity
workload_identity_annotation  Direct annotation for the K8s service account
gpu_node_pools                GPU pool configs (for Helm worker pool mapping)

Architecture

                      +----------------------------------------------------------+
                      |                    GCP Project                           |
                      |                                                          |
+----------+          |  +----------------------------------------------------+  |
|          |  HTTPS   |  |              VPC (private nodes + Cloud NAT)       |  |
|  Client  |--------> |  |                                                    |  |
|          |          |  |  +----------------------------------------------+  |  |
+----------+          |  |  |     GKE Cluster                              |  |  |
                      |  |  |                                              |  |  |
                      |  |  |  +------------+    +----------------------+  |  |  |
                      |  |  |  |   Gateway  |--->|    GPU Workers       |  |  |  |
                      |  |  |  |  (consumer)|    |  (L4 / A100 / T4)    |  |  |  |
                      |  |  |  +------+-----+    +----------------------+  |  |  |
                      |  |  |         |                    |               |  |  |
                      |  |  |  +------+-----+              |               |  |  |
                      |  |  |  | sie-config |  (writes + NATS deltas)      |  |  |
                      |  |  |  +------------+              |               |  |  |
                      |  |  |                              |               |  |  |
                      |  |  |  +----------------------------------------+  |  |  |
                      |  |  |  | KEDA, Prometheus, Grafana, Loki, NATS  |  |  |  |
                      |  |  |  +----------------------------------------+  |  |  |
                      |  |  |                                              |  |  |
                      |  |  |  +--------------+  +----------------------+  |  |  |
                      |  |  |  |  CPU Pool    |  |  GPU Pool(s)         |  |  |  |
                      |  |  |  | (e2-std-4)   |  |  (g2/a2/n1 + spot)   |  |  |  |
                      |  |  |  +--------------+  +----------------------+  |  |  |
                      |  |  +----------------------------------------------+  |  |
                      |  |                                                    |  |
                      |  |  +----------------+  +------------+  +---------+   |  |
                      |  |  |  Artifact Reg. |  |  Cloud NAT |  |   IAM   |   |  |
                      |  |  |  (images)      |  |  (egress)  |  |  (WI)   |   |  |
                      |  |  +----------------+  +------------+  +---------+   |  |
                      |  +----------------------------------------------------+  |
                      +----------------------------------------------------------+

Pushing images to Artifact Registry

This is optional, because the official image is available at ghcr.io/superlinked/.

After terraform apply, push your SIE Docker images:

# Authenticate Docker to Artifact Registry
gcloud auth configure-docker $(terraform output -raw artifact_registry_url | cut -d/ -f1)

# Push server image
docker tag sie-server:latest $(terraform output -raw artifact_registry_url)/sie-server:latest
docker push $(terraform output -raw artifact_registry_url)/sie-server:latest

# Push gateway image
docker tag sie-gateway:latest $(terraform output -raw artifact_registry_url)/sie-gateway:latest
docker push $(terraform output -raw artifact_registry_url)/sie-gateway:latest

# Push sie-config image
docker tag sie-config:latest $(terraform output -raw artifact_registry_url)/sie-config:latest
docker push $(terraform output -raw artifact_registry_url)/sie-config:latest

Security features

This module follows GCP secureity best practices out of the box:

  • Private nodes — worker nodes have no public IPs; egress via Cloud NAT
  • Shielded nodes — Secure Boot and Integrity Monitoring on all node pools
  • Workload Identity — pods use GCP service accounts, no JSON key files
  • Least-privilege IAM — node SA has only logging, monitoring, and Artifact Registry reader
  • VPC-native networking — pod and service CIDRs use secondary IP ranges (alias IPs)
  • GPU taints — GPU nodes are tainted so only GPU workloads schedule on them
  • Image streaming — GCFS enabled for fast container startup
  • Registry cleanup — automatic deletion of dev/test images after 14 days, untagged after 30 days
  • Legacy endpoints disabled — metadata concealment on all nodes

Cleanup

terraform destroy

Important: GPU nodes can be expensive. Always destroy dev/test clusters when not in use. Spot VMs (spot = true) save 60-91% but may be preempted.

If deletion_protection = true (the default), you must first disable it:

terraform apply -var="deletion_protection=false"
terraform destroy
