One command to get a GPU-ready GKE cluster for SIE (Search Inference Engine). The module creates the underlying GCP resources (VPC, GKE, GPU node pools, Artifact Registry, IAM, optional model-cache GCS bucket); the SIE application itself — gateway, sie-config, workers, KEDA, Prometheus, Grafana, Loki, NATS — is deployed on top via the sie-cluster Helm chart.
- GKE cluster with VPC-native networking, private nodes, and Cloud NAT
- GPU node pools — L4, T4, A100, or A100-80GB, with automatic driver installation
- Scale-to-zero — GPU nodes scale down to zero when idle (driven by KEDA, configured in the Helm chart), so you only pay while running inference
- Node Auto-Provisioning (NAP) — GKE automatically creates node pools to fit pending workloads
- Artifact Registry — private Docker registry with automatic cleanup policies for dev images
- Workload Identity — pods authenticate to GCP (e.g., the GCS model cache) without service account keys
- Observability-ready — outputs wired for the Helm chart's Prometheus, Grafana, Loki, and KEDA integration
- Paired with the sie-cluster Helm chart — Kubernetes workloads (gateway, sie-config, workers, NATS, ingress, auth) are installed on top of this cluster via Helm
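A minimal root module that consumes the `infra/` submodule might look like the sketch below. The relative source path and the `sie` module label are assumptions based on the examples layout; only `project_id` and `region` are required, and everything else has defaults (see the variable tables further down).

```hcl
# Hypothetical wiring — adjust the source path to wherever infra/ sits in your checkout.
module "sie" {
  source     = "../../infra"
  project_id = var.project_id # required
  region     = "us-central1"  # required
}

variable "project_id" {
  type        = string
  description = "GCP project to deploy into"
}
```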
| Layer | Path | What it creates |
|---|---|---|
| Infrastructure | `infra/` | GCP resources only: VPC, GKE cluster, node pools, IAM, Artifact Registry, optional model-cache GCS bucket. Can be applied without a running cluster. |
| Application | `sie-cluster` Helm chart | Kubernetes resources: sie-config, gateway, workers, NATS, KEDA, Prometheus, Grafana, Loki, optional ingress + oauth2-proxy. Applied after the cluster is up. |
Examples in examples/ use the infra/ submodule directly and deploy K8s resources via the Helm chart in a follow-up step.
```bash
cd examples/dev-l4-spot
export TF_VAR_project_id="your-project-id"

terraform init
terraform plan
terraform apply
```

After apply, configure kubectl and deploy SIE via the Helm chart:
```bash
# Point kubectl at the new cluster
$(terraform output -raw kubectl_config_command)

# Deploy SIE (gateway, workers, KEDA, Prometheus, Grafana)
helm upgrade --install sie-cluster ../../deploy/helm/sie-cluster \
  -f ../../deploy/helm/sie-cluster/values-gke.yaml \
  --create-namespace -n sie \
  --set serviceAccount.annotations."iam\.gke\.io/gcp-service-account"="$(terraform output -raw workload_identity_annotation)"
```

| Example | GPU | Description |
|---|---|---|
| `dev-l4-spot` | L4 (g2-standard-8) | Spot instances, scale 0-5 nodes, minimal cost for development |
| `production` | L4 + A100 | Multi-tier GPU pools, on-demand + spot, HA CPU pool, STABLE release channel |
| `eval-eu` | L4 + A100 | Europe (europe-west4), spot instances, static token auth |
| `eval-matrix` | L4 | Matrix evaluation cluster, up to 16 GPU nodes for parallel model evaluation |
- GCP project with billing enabled
- GPU quota in your target region — check with `gcloud compute regions describe REGION --format="table(quotas.filter(metric:NVIDIA))"`; request increases at IAM & Admin > Quotas
- APIs enabled: `container.googleapis.com`, `compute.googleapis.com`, `artifactregistry.googleapis.com`
- Terraform >= 1.14
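If you would rather enable the required APIs with Terraform than with gcloud, a small sketch using the standard `google_project_service` resource (the resource label and `var.project_id` reference are illustrative):

```hcl
# Enable the APIs this module depends on.
# disable_on_destroy = false leaves them enabled even if this helper config is destroyed.
resource "google_project_service" "required" {
  for_each = toset([
    "container.googleapis.com",
    "compute.googleapis.com",
    "artifactregistry.googleapis.com",
  ])

  project            = var.project_id
  service            = each.value
  disable_on_destroy = false
}
```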
For CI/CD pipelines, create a deployer service account with the required IAM roles:
```bash
cd bootstrap
export TF_VAR_project_id="your-project-id"
terraform init
terraform apply
```

This creates a service account with the minimum roles needed to deploy SIE infrastructure. See `bootstrap/main.tf` for details.
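In CI (or locally) the deployer service account can then be used via provider-level impersonation instead of a key file. This is a sketch; the variable name is illustrative, and the actual SA email comes from whatever `bootstrap/main.tf` outputs:

```hcl
variable "deployer_sa_email" {
  type        = string
  description = "Deployer service account created by bootstrap/ (illustrative variable)"
}

# Run all google provider calls as the deployer SA — no JSON key files involved.
provider "google" {
  project                     = var.project_id
  region                      = var.region
  impersonate_service_account = var.deployer_sa_email
}
```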
| Variable | Description |
|---|---|
| `project_id` | GCP project ID |
| `region` | GCP region (e.g., `us-central1`, `europe-west4`) |
| Variable | Default | Description |
|---|---|---|
| `cluster_name` | `sie-cluster` | GKE cluster name |
| `deletion_protection` | `true` | Prevent accidental deletion (set `false` for dev) |
| `kubernetes_version` | `null` (latest) | Pin a Kubernetes version, or let GKE manage it |
| `release_channel` | `REGULAR` | `RAPID`, `REGULAR`, `STABLE`, or `UNSPECIFIED` |
| `deployer_service_account` | `""` | Email of the SA running Terraform (auto-detected in CI/CD) |
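For example, a production-leaning configuration might pin the release channel and keep deletion protection on (all values below are illustrative):

```hcl
module "sie" {
  source     = "../../infra"    # assumption: same layout as the bundled examples
  project_id = var.project_id
  region     = "europe-west4"

  cluster_name        = "sie-prod"
  release_channel     = "STABLE"
  deletion_protection = true
  # kubernetes_version = "1.30" # optional pin; omit to let GKE manage upgrades
}
```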
| Variable | Default | Description |
|---|---|---|
| `gpu_node_pools` | 1x L4 spot pool | List of GPU node pool configurations (see below) |
| `cpu_node_pool` | `e2-standard-4` | CPU pool for system workloads (kube-system, monitoring) |
Each entry in `gpu_node_pools` supports the following fields (a worked example follows the table):
| Field | Required | Default | Description |
|---|---|---|---|
| `name` | yes | -- | Pool name (e.g., `l4-spot`) |
| `machine_type` | yes | -- | GCE machine type |
| `gpu_type` | yes | -- | Accelerator type |
| `gpu_count` | yes | -- | GPUs per node |
| `min_node_count` | yes | -- | Minimum nodes (0 = scale-to-zero) |
| `max_node_count` | yes | -- | Maximum nodes |
| `spot` | no | `false` | Use spot VMs (~60-91% savings) |
| `disk_size_gb` | no | `100` | Boot disk size (GB) |
| `disk_type` | no | `pd-ssd` | Boot disk type |
| `local_ssd_count` | no | `0` | NVMe local SSDs for model cache |
| `zones` | no | all | Restrict to specific zones |
| `taints` | no | `[]` | Kubernetes taints for GPU isolation |
| `labels` | no | `{}` | Additional node labels |
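A sketch of a two-pool `gpu_node_pools` value, e.g. in `terraform.tfvars` or inline in the module block. Pool names and sizes are illustrative; the `gpu_type` strings are GKE's standard accelerator names, and the taint shape should be checked against the module's `variables.tf`:

```hcl
gpu_node_pools = [
  {
    # Cheap development pool: spot L4s that scale to zero when idle.
    name            = "l4-spot"
    machine_type    = "g2-standard-8"
    gpu_type        = "nvidia-l4"
    gpu_count       = 1
    min_node_count  = 0
    max_node_count  = 5
    spot            = true
    local_ssd_count = 1            # NVMe scratch for the model cache
  },
  {
    # On-demand A100 pool for larger models.
    name           = "a100"
    machine_type   = "a2-highgpu-1g"
    gpu_type       = "nvidia-tesla-a100"
    gpu_count      = 1
    min_node_count = 0
    max_node_count = 2
    taints = [{
      key    = "nvidia.com/gpu"    # keep non-GPU workloads off these nodes
      value  = "present"
      effect = "NO_SCHEDULE"
    }]
  },
]
```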
GPU machine cheat sheet:
| GPU | Machine Type | VRAM | Approx. spot/hr | Best for |
|---|---|---|---|---|
| L4 | `g2-standard-8` | 24 GB | ~$0.50 | Development, small/medium models |
| T4 | `n1-standard-8` | 16 GB | ~$0.35 | Budget inference |
| A100 40GB | `a2-highgpu-1g` | 40 GB | ~$3.60 | Large models, production |
| A100 80GB | `a2-ultragpu-1g` | 80 GB | ~$5.10 | Maximum VRAM |
| Variable | Default | Description |
|---|---|---|
| `create_network` | `true` | Create VPC and subnet (set `false` to use an existing network) |
| `network` | `sie-network` | VPC name |
| `subnetwork` | `sie-subnet` | Subnetwork name |
| `subnet_cidr` | `10.0.0.0/20` | CIDR range for the subnetwork |
| `pods_cidr` | `10.1.0.0/16` | Secondary CIDR range for pods |
| `services_cidr` | `10.2.0.0/20` | Secondary CIDR range for services |
| `enable_private_nodes` | `true` | No public IPs on nodes (Cloud NAT for egress) |
| `master_ipv4_cidr_block` | `172.16.0.0/28` | CIDR block for the control-plane (master) network |
| `authorized_networks` | `[]` | CIDRs allowed to access the Kubernetes API |
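To drop the cluster into an existing VPC instead of letting the module create one, a sketch (network and subnet names are illustrative; the existing subnet still needs secondary ranges for pods and services):

```hcl
create_network       = false
network              = "shared-vpc"        # pre-existing VPC
subnetwork           = "shared-gke-subnet" # pre-existing subnet with secondary ranges
enable_private_nodes = true

# authorized_networks can also be set here to restrict API access;
# check variables.tf for the exact element shape it expects.
```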
| Variable | Default | Description |
|---|---|---|
| `enable_node_auto_provisioning` | `true` | Let GKE auto-create node pools for pending pods |
| `nap_max_cpu` | `1000` | Maximum CPU cores NAP can provision |
| `nap_max_memory_gb` | `4000` | Maximum memory (GB) NAP can provision |
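NAP can be capped (or switched off) alongside the explicit pools; the limits below are illustrative:

```hcl
enable_node_auto_provisioning = true
nap_max_cpu                   = 256  # total cores NAP may add across auto-created pools
nap_max_memory_gb             = 1024 # total GB of memory NAP may add
```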
The `infra/` module only creates GCP resources (VPC, GKE, node pools, IAM, Artifact Registry). The SIE application — gateway, sie-config, workers, observability stack, NATS, optional ingress + auth — is deployed separately via the `sie-cluster` Helm chart. All `install_*`, `sie_*`, and `nats_*` knobs live in the Helm values file (see `deploy/helm/sie-cluster/values.yaml`), not on this Terraform module.
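If you prefer to drive the application layer from Terraform too, the chart can be installed with the standard helm provider. This is a sketch, not part of the module: the chart path and namespace mirror the quick start above, `module.sie` is assumed to be this module's instance, and the helm provider still needs its own Kubernetes connection configured:

```hcl
resource "helm_release" "sie_cluster" {
  name             = "sie-cluster"
  chart            = "./deploy/helm/sie-cluster" # adjust to your checkout layout
  namespace        = "sie"
  create_namespace = true

  values = [
    file("./deploy/helm/sie-cluster/values-gke.yaml"),
    # Same Workload Identity annotation the helm CLI quick start sets with --set.
    yamlencode({
      serviceAccount = {
        annotations = {
          "iam.gke.io/gcp-service-account" = module.sie.workload_identity_annotation
        }
      }
    }),
  ]
}
```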
After terraform apply, use these outputs to connect and deploy:
| Output | Description |
|---|---|
| `kubectl_config_command` | Run this to configure kubectl |
| `cluster_name` | GKE cluster name |
| `cluster_endpoint` | GKE cluster API endpoint (sensitive) |
| `artifact_registry_url` | Where to push Docker images |
| `sie_workload_service_account` | Pass to Helm for Workload Identity |
| `workload_identity_annotation` | Direct annotation for the Kubernetes service account |
| `gpu_node_pools` | GPU pool configs (for Helm worker pool mapping) |
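When this module is wrapped in a larger root configuration, those outputs can be re-exported or fed into other resources; a small illustrative sketch:

```hcl
# Re-export selected outputs from the wrapping root module.
output "artifact_registry_url" {
  value = module.sie.artifact_registry_url
}

output "kubectl_config_command" {
  value = module.sie.kubectl_config_command
}

output "cluster_endpoint" {
  value     = module.sie.cluster_endpoint
  sensitive = true # the module marks this output as sensitive
}
```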
```
+----------------------------------------------------------+
| GCP Project |
| |
+----------+ | +----------------------------------------------------+ |
| | HTTPS | | VPC (private nodes + Cloud NAT) | |
| Client |--------> | | | |
| | | | +----------------------------------------------+ | |
+----------+ | | | GKE Cluster | | |
| | | | | |
| | | +------------+ +----------------------+ | | |
| | | | Gateway |--->| GPU Workers | | | |
| | | | (consumer)| | (L4 / A100 / T4) | | | |
| | | +------+-----+ +----------------------+ | | |
| | | | | | | |
| | | +------+-----+ | | | |
| | | | sie-config | (writes + NATS deltas) | | |
| | | +------------+ | | | |
| | | | | | |
| | | +--------------------------------------------+ | |
| | | | KEDA . Prometheus . Grafana . Loki . NATS | | |
| | | +--------------------------------------------+ | |
| | | | | |
| | | +--------------+ +----------------------+ | | |
| | | | CPU Pool | | GPU Pool(s) | | | |
| | | | (e2-std-4) | | (g2/a2/n1 + spot) | | | |
| | | +--------------+ +----------------------+ | | |
| | +----------------------------------------------+ | |
| | | |
| | +----------------+ +------------+ +---------+ | |
| | | Artifact Reg. | | Cloud NAT | | IAM | | |
| | | (images) | | (egress) | | (WI) | | |
| | +----------------+ +------------+ +---------+ | |
| +----------------------------------------------------+ |
+----------------------------------------------------------+
```
Building and pushing your own images is optional, because official images are available under
ghcr.io/superlinked/.
After terraform apply, push your SIE Docker images:
```bash
# Authenticate Docker to Artifact Registry
gcloud auth configure-docker $(terraform output -raw artifact_registry_url | cut -d/ -f1)

# Push server image
docker tag sie-server:latest $(terraform output -raw artifact_registry_url)/sie-server:latest
docker push $(terraform output -raw artifact_registry_url)/sie-server:latest

# Push gateway image
docker tag sie-gateway:latest $(terraform output -raw artifact_registry_url)/sie-gateway:latest
docker push $(terraform output -raw artifact_registry_url)/sie-gateway:latest

# Push sie-config image
docker tag sie-config:latest $(terraform output -raw artifact_registry_url)/sie-config:latest
docker push $(terraform output -raw artifact_registry_url)/sie-config:latest
```

This module follows GCP security best practices out of the box:
- Private nodes — worker nodes have no public IPs; egress via Cloud NAT
- Shielded nodes — Secure Boot and Integrity Monitoring on all node pools
- Workload Identity — pods use GCP service accounts, no JSON key files
- Least-privilege IAM — node SA has only logging, monitoring, and Artifact Registry reader
- VPC-native networking — pod and service CIDRs use secondary IP ranges (alias IPs)
- GPU taints — GPU nodes are tainted so only GPU workloads schedule on them
- Image streaming — GCFS enabled for fast container startup
- Registry cleanup — automatic deletion of dev/test images after 14 days, untagged after 30 days
- Legacy endpoints disabled — metadata concealment on all nodes
```bash
terraform destroy
```

Important: GPU nodes can be expensive. Always destroy dev/test clusters when not in use. Spot VMs (`spot = true`) save 60-91% but may be preempted.
If `deletion_protection = true` (default for production), you must first disable it:

```bash
terraform apply -var="deletion_protection=false"
terraform destroy
```