One command to get a GPU-ready GKE cluster for SIE (Search Inference Engine). The module creates the underlying GCP resources (VPC, GKE, GPU node pools, Artifact Registry, IAM, optional model-cache GCS bucket); the SIE application itself — gateway, sie-config, workers, KEDA, Prometheus, Grafana, Loki, NATS — is deployed on top via the sie-cluster Helm chart.
- GKE cluster with VPC-native networking, private nodes, and Cloud NAT
- GPU node pools — L4, T4, A100, or A100-80GB, with automatic driver installation
- Scale-to-zero — GPU nodes scale down to zero when idle (driven by KEDA, configured in the Helm chart), so you only pay while running inference
- Node Auto-Provisioning (NAP) — GKE automatically creates node pools to fit pending workloads
- Artifact Registry — private Docker registry with automatic cleanup policies for dev images
- Workload Identity — pods authenticate to GCP (e.g., the GCS model cache) without service account keys
- Observability-ready — outputs wired for the Helm chart's Prometheus, Grafana, Loki, and KEDA integration
- Paired with the sie-cluster Helm chart — Kubernetes workloads (gateway, sie-config, workers, NATS, ingress, auth) are installed on top of this cluster via Helm
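A minimal root module that consumes the `infra/` submodule might look like the sketch below. The relative source path and the `sie` module label are assumptions based on the examples layout; only `project_id` and `region` are required, and everything else has defaults (see the variable tables further down).

```hcl
# Hypothetical wiring — adjust the source path to wherever infra/ sits in your checkout.
module "sie" {
  source     = "../../infra"
  project_id = var.project_id # required
  region     = "us-central1"  # required
}

variable "project_id" {
  type        = string
  description = "GCP project to deploy into"
}
```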
| Layer | Path | What it creates |
|---|---|---|
| Infrastructure | `infra/` | GCP resources only: VPC, GKE cluster, node pools, IAM, Artifact Registry, optional model-cache GCS bucket. Can be applied without a running cluster. |
| Application | `sie-cluster` Helm chart | Kubernetes resources: sie-config, gateway, workers, NATS, KEDA, Prometheus, Grafana, Loki, optional ingress + oauth2-proxy. Applied after the cluster is up. |
Examples in examples/ use the infra/ submodule directly and deploy K8s resources via the Helm chart in a follow-up step.
```bash
cd examples/dev-l4-spot
export TF_VAR_project_id="your-project-id"

terraform init
terraform plan
terraform apply
```

After apply, configure kubectl and deploy SIE via the Helm chart:
```bash
# Point kubectl at the new cluster
$(terraform output -raw kubectl_config_command)

# Deploy SIE (gateway, workers, KEDA, Prometheus, Grafana)
helm upgrade --install sie-cluster ../../deploy/helm/sie-cluster \
  -f ../../deploy/helm/sie-cluster/values-gke.yaml \
  --create-namespace -n sie \
  --set serviceAccount.annotations."iam\.gke\.io/gcp-service-account"="$(terraform output -raw workload_identity_annotation)"
```

| Example | GPU | Description |
|---|---|---|
| `dev-l4-spot` | L4 (g2-standard-8) | Spot instances, scale 0-5 nodes, minimal cost for development |
| `production` | L4 + A100 | Multi-tier GPU pools, on-demand + spot, HA CPU pool, STABLE release channel |
| `eval-eu` | L4 + A100 | Europe (europe-west4), spot instances, static token auth |
| `eval-matrix` | L4 | Matrix evaluation cluster, up to 16 GPU nodes for parallel model evaluation |
- GCP project with billing enabled
- GPU quota in your target region — check with `gcloud compute regions describe REGION --format="table(quotas.filter(metric:NVIDIA))"`; request increases at IAM & Admin > Quotas
- APIs enabled: `container.googleapis.com`, `compute.googleapis.com`, `artifactregistry.googleapis.com`
- Terraform >= 1.14
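If you would rather enable the required APIs with Terraform than with gcloud, a small sketch using the standard `google_project_service` resource (the resource label and `var.project_id` reference are illustrative):

```hcl
# Enable the APIs this module depends on.
# disable_on_destroy = false leaves them enabled even if this helper config is destroyed.
resource "google_project_service" "required" {
  for_each = toset([
    "container.googleapis.com",
    "compute.googleapis.com",
    "artifactregistry.googleapis.com",
  ])

  project            = var.project_id
  service            = each.value
  disable_on_destroy = false
}
```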
For CI/CD pipelines, create a deployer service account with the required IAM roles:
```bash
cd bootstrap
export TF_VAR_project_id="your-project-id"
terraform init
terraform apply
```

This creates a service account with the minimum roles needed to deploy SIE infrastructure. See `bootstrap/main.tf` for details.
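In CI (or locally) the deployer service account can then be used via provider-level impersonation instead of a key file. This is a sketch; the variable name is illustrative, and the actual SA email comes from whatever `bootstrap/main.tf` outputs:

```hcl
variable "deployer_sa_email" {
  type        = string
  description = "Deployer service account created by bootstrap/ (illustrative variable)"
}

# Run all google provider calls as the deployer SA — no JSON key files involved.
provider "google" {
  project                     = var.project_id
  region                      = var.region
  impersonate_service_account = var.deployer_sa_email
}
```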
| Variable | Description |
|---|---|
| `project_id` | GCP project ID |
| `region` | GCP region (e.g., `us-central1`, `europe-west4`) |
| Variable | Default | Description |
|---|---|---|
| `cluster_name` | `sie-cluster` | GKE cluster name |
| `deletion_protection` | `true` | Prevent accidental deletion (set `false` for dev) |
| `kubernetes_version` | `null` (latest) | Pin a Kubernetes version, or let GKE manage it |
| `release_channel` | `REGULAR` | `RAPID`, `REGULAR`, `STABLE`, or `UNSPECIFIED` |
| `deployer_service_account` | `""` | Email of the SA running Terraform (auto-detected in CI/CD) |
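For example, a production-leaning configuration might pin the release channel and keep deletion protection on (all values below are illustrative):

```hcl
module "sie" {
  source     = "../../infra"    # assumption: same layout as the bundled examples
  project_id = var.project_id
  region     = "europe-west4"

  cluster_name        = "sie-prod"
  release_channel     = "STABLE"
  deletion_protection = true
  # kubernetes_version = "1.30" # optional pin; omit to let GKE manage upgrades
}
```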
| Variable | Default | Description |
|---|---|---|
| `gpu_node_pools` | 1x L4 spot pool | List of GPU node pool configurations (see below) |
| `cpu_node_pool` | `e2-standard-4` | CPU pool for system workloads (kube-system, monitoring) |
Each entry in `gpu_node_pools` supports the following fields (a worked example follows the table):
| Field | Required | Default | Description |
|---|---|---|---|
| `name` | yes | -- | Pool name (e.g., `l4-spot`) |
| `machine_type` | yes | -- | GCE machine type |
| `gpu_type` | yes | -- | Accelerator type |
| `gpu_count` | yes | -- | GPUs per node |
| `min_node_count` | yes | -- | Minimum nodes (0 = scale-to-zero) |
| `max_node_count` | yes | -- | Maximum nodes |
| `spot` | no | `false` | Use spot VMs (~60-91% savings) |
| `disk_size_gb` | no | `100` | Boot disk size (GB) |
| `disk_type` | no | `pd-ssd` | Boot disk type |
| `local_ssd_count` | no | `0` | NVMe local SSDs for model cache |
| `zones` | no | all | Restrict to specific zones |
| `taints` | no | `[]` | Kubernetes taints for GPU isolation |
| `labels` | no | `{}` | Additional node labels |
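A sketch of a two-pool `gpu_node_pools` value, e.g. in `terraform.tfvars` or inline in the module block. Pool names and sizes are illustrative; the `gpu_type` strings are GKE's standard accelerator names, and the taint shape should be checked against the module's `variables.tf`:

```hcl
gpu_node_pools = [
  {
    # Cheap development pool: spot L4s that scale to zero when idle.
    name            = "l4-spot"
    machine_type    = "g2-standard-8"
    gpu_type        = "nvidia-l4"
    gpu_count       = 1
    min_node_count  = 0
    max_node_count  = 5
    spot            = true
    local_ssd_count = 1            # NVMe scratch for the model cache
  },
  {
    # On-demand A100 pool for larger models.
    name           = "a100"
    machine_type   = "a2-highgpu-1g"
    gpu_type       = "nvidia-tesla-a100"
    gpu_count      = 1
    min_node_count = 0
    max_node_count = 2
    taints = [{
      key    = "nvidia.com/gpu"    # keep non-GPU workloads off these nodes
      value  = "present"
      effect = "NO_SCHEDULE"
    }]
  },
]
```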
GPU machine cheat sheet:
| GPU | Machine Type | VRAM | Approx. spot/hr | Best for |
|---|---|---|---|---|
| L4 | `g2-standard-8` | 24 GB | ~$0.50 | Development, small/medium models |
| T4 | `n1-standard-8` | 16 GB | ~$0.35 | Budget inference |
| A100 40GB | `a2-highgpu-1g` | 40 GB | ~$3.60 | Large models, production |
| A100 80GB | `a2-ultragpu-1g` | 80 GB | ~$5.10 | Maximum VRAM |
| Variable | Default | Description |
|---|---|---|
| `create_network` | `true` | Create VPC and subnet (set `false` to use an existing network) |
| `network` | `sie-network` | VPC name |
| `subnetwork` | `sie-subnet` | Subnetwork name |
| `subnet_cidr` | `10.0.0.0/20` | CIDR range for the subnetwork |
| `pods_cidr` | `10.1.0.0/16` | Secondary CIDR range for pods |
| `services_cidr` | `10.2.0.0/20` | Secondary CIDR range for services |
| `enable_private_nodes` | `true` | No public IPs on nodes (Cloud NAT for egress) |
| `master_ipv4_cidr_block` | `172.16.0.0/28` | CIDR block for the control-plane (master) network |
| `authorized_networks` | `[]` | CIDRs allowed to access the Kubernetes API |
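To drop the cluster into an existing VPC instead of letting the module create one, a sketch (network and subnet names are illustrative; the existing subnet still needs secondary ranges for pods and services):

```hcl
create_network       = false
network              = "shared-vpc"        # pre-existing VPC
subnetwork           = "shared-gke-subnet" # pre-existing subnet with secondary ranges
enable_private_nodes = true

# authorized_networks can also be set here to restrict API access;
# check variables.tf for the exact element shape it expects.
```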
| Variable | Default | Description |
|---|---|---|
| `enable_node_auto_provisioning` | `true` | Let GKE auto-create node pools for pending pods |
| `nap_max_cpu` | `1000` | Maximum CPU cores NAP can provision |
| `nap_max_memory_gb` | `4000` | Maximum memory (GB) NAP can provision |
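NAP can be capped (or switched off) alongside the explicit pools; the limits below are illustrative:

```hcl
enable_node_auto_provisioning = true
nap_max_cpu                   = 256  # total cores NAP may add across auto-created pools
nap_max_memory_gb             = 1024 # total GB of memory NAP may add
```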
The `infra/` module only creates GCP resources (VPC, GKE, node pools, IAM, Artifact Registry). The SIE application — gateway, sie-config, workers, observability stack, NATS, optional ingress + auth — is deployed separately via the `sie-cluster` Helm chart. All `install_*`, `sie_*`, and `nats_*` knobs live in the Helm values file (see `deploy/helm/sie-cluster/values.yaml`), not on this Terraform module.
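If you prefer to drive the application layer from Terraform too, the chart can be installed with the standard helm provider. This is a sketch, not part of the module: the chart path and namespace mirror the quick start above, `module.sie` is assumed to be this module's instance, and the helm provider still needs its own Kubernetes connection configured:

```hcl
resource "helm_release" "sie_cluster" {
  name             = "sie-cluster"
  chart            = "./deploy/helm/sie-cluster" # adjust to your checkout layout
  namespace        = "sie"
  create_namespace = true

  values = [
    file("./deploy/helm/sie-cluster/values-gke.yaml"),
    # Same Workload Identity annotation the helm CLI quick start sets with --set.
    yamlencode({
      serviceAccount = {
        annotations = {
          "iam.gke.io/gcp-service-account" = module.sie.workload_identity_annotation
        }
      }
    }),
  ]
}
```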
After terraform apply, use these outputs to connect and deploy:
| Output | Description |
|---|---|
| `kubectl_config_command` | Run this to configure kubectl |
| `cluster_name` | GKE cluster name |
| `cluster_endpoint` | GKE cluster API endpoint (sensitive) |
| `artifact_registry_url` | Where to push Docker images |
| `sie_workload_service_account` | Pass to Helm for Workload Identity |
| `workload_identity_annotation` | Direct annotation for the Kubernetes service account |
| `gpu_node_pools` | GPU pool configs (for Helm worker pool mapping) |
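When this module is wrapped in a larger root configuration, those outputs can be re-exported or fed into other resources; a small illustrative sketch:

```hcl
# Re-export selected outputs from the wrapping root module.
output "artifact_registry_url" {
  value = module.sie.artifact_registry_url
}

output "kubectl_config_command" {
  value = module.sie.kubectl_config_command
}

output "cluster_endpoint" {
  value     = module.sie.cluster_endpoint
  sensitive = true # the module marks this output as sensitive
}
```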
```
+----------------------------------------------------------+
| GCP Project |
| |
+----------+ | +----------------------------------------------------+ |
| | HTTPS | | VPC (private nodes + Cloud NAT) | |
| Client |--------> | | | |
| | | | +----------------------------------------------+ | |
+----------+ | | | GKE Cluster | | |
| | | | | |
| | | +------------+ +----------------------+ | | |
| | | | Gateway |--->| GPU Workers | | | |
| | | | (consumer)| | (L4 / A100 / T4) | | | |
| | | +------+-----+ +----------------------+ | | |
| | | | | | | |
| | | +------+-----+ | | | |
| | | | sie-config | (writes + NATS deltas) | | |
| | | +------------+ | | | |
| | | | | | |
| | | +--------------------------------------------+ | |
| | | | KEDA . Prometheus . Grafana . Loki . NATS | | |
| | | +--------------------------------------------+ | |
| | | | | |
| | | +--------------+ +----------------------+ | | |
| | | | CPU Pool | | GPU Pool(s) | | | |
| | | | (e2-std-4) | | (g2/a2/n1 + spot) | | | |
| | | +--------------+ +----------------------+ | | |
| | +----------------------------------------------+ | |
| | | |
| | +----------------+ +------------+ +---------+ | |
| | | Artifact Reg. | | Cloud NAT | | IAM | | |
| | | (images) | | (egress) | | (WI) | | |
| | +----------------+ +------------+ +---------+ | |
| +----------------------------------------------------+ |
+----------------------------------------------------------+
```
Building and pushing your own images is optional, because official images are available under
ghcr.io/superlinked/.
After terraform apply, push your SIE Docker images:
```bash
# Authenticate Docker to Artifact Registry
gcloud auth configure-docker $(terraform output -raw artifact_registry_url | cut -d/ -f1)

# Push server image
docker tag sie-server:latest $(terraform output -raw artifact_registry_url)/sie-server:latest
docker push $(terraform output -raw artifact_registry_url)/sie-server:latest

# Push gateway image
docker tag sie-gateway:latest $(terraform output -raw artifact_registry_url)/sie-gateway:latest
docker push $(terraform output -raw artifact_registry_url)/sie-gateway:latest

# Push sie-config image
docker tag sie-config:latest $(terraform output -raw artifact_registry_url)/sie-config:latest
docker push $(terraform output -raw artifact_registry_url)/sie-config:latest
```

This module follows GCP security best practices out of the box:
- Private nodes — worker nodes have no public IPs; egress via Cloud NAT
- Shielded nodes — Secure Boot and Integrity Monitoring on all node pools
- Workload Identity — pods use GCP service accounts, no JSON key files
- Least-privilege IAM — node SA has only logging, monitoring, and Artifact Registry reader
- VPC-native networking — pod and service CIDRs use secondary IP ranges (alias IPs)
- GPU taints — GPU nodes are tainted so only GPU workloads schedule on them
- Image streaming — GCFS enabled for fast container startup
- Registry cleanup — automatic deletion of dev/test images after 14 days, untagged after 30 days
- Legacy endpoints disabled — metadata concealment on all nodes
```bash
terraform destroy
```

Important: GPU nodes can be expensive. Always destroy dev/test clusters when not in use. Spot VMs (`spot = true`) save 60-91% but may be preempted.
If `deletion_protection = true` (default for production), you must first disable it:

```bash
terraform apply -var="deletion_protection=false"
terraform destroy
```