Prerequisites

  • You have a Kubernetes cluster running (AWS EKS or GCP GKE).
  • You have a provider connected to Costory with your billing data ingested.
  • You have labels / tags applied (or plan to apply them) to your pod deployments.
Azure AKS support is coming soon.

Output

  • Granular cost visibility at the Kubernetes label level (team, app, environment, etc.) without any third-party agent.
  • Ability to track waste ratio per node pool and identify over-provisioned clusters.
  • Foundation for cost allocation and chargeback across teams.

How waste is defined in Kubernetes

In Kubernetes, your pods request a certain amount of CPU, memory, and GPU. The scheduler then decides on which node(s) to run your pods. Two concepts matter here:
  • Requests: the amount of resources reserved by the pod. The scheduler uses this to pick the right node(s).
  • Limits: the maximum resources a pod can consume. Exceeding the memory limit gets the container killed by the kernel (OOM killer), while CPU usage above the limit is throttled.
Waste is the gap between available capacity and what pods have claimed:
waste = available_resources − requests
Waste does not take pod limits into account. It is based only on requests, because the scheduler places pods according to requests, not limits.

Example

Consider a cluster with two nodes and three pods:
                     CPU     Memory
Node 1 (available)   2 CPU   4 GB
Node 2 (available)   4 CPU   4 GB
Total available      6 CPU   8 GB

                     CPU request   Memory request
Pod 1                1 CPU         2 GB
Pod 2                1 CPU         2 GB
Pod 3                1 CPU         1 GB
Total requested      3 CPU         5 GB
This gives us: waste CPU = 6 − 3 = 3 CPU and waste memory = 8 − 5 = 3 GB.
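
The same arithmetic in a minimal Python sketch. The node and pod figures are the ones from the tables above; in a real cluster they would come from the Kubernetes API rather than hard-coded values:

```python
# Minimal sketch of the waste formula, using the example cluster above.
nodes = [
    {"name": "node-1", "cpu": 2, "memory_gb": 4},
    {"name": "node-2", "cpu": 4, "memory_gb": 4},
]
pods = [
    {"name": "pod-1", "cpu_request": 1, "memory_request_gb": 2},
    {"name": "pod-2", "cpu_request": 1, "memory_request_gb": 2},
    {"name": "pod-3", "cpu_request": 1, "memory_request_gb": 1},
]

available_cpu = sum(n["cpu"] for n in nodes)                # 6 CPU
available_mem = sum(n["memory_gb"] for n in nodes)          # 8 GB
requested_cpu = sum(p["cpu_request"] for p in pods)         # 3 CPU
requested_mem = sum(p["memory_request_gb"] for p in pods)   # 5 GB

# waste = available_resources - requests
waste_cpu = available_cpu - requested_cpu   # 3 CPU
waste_mem = available_mem - requested_mem   # 3 GB
print(f"waste: {waste_cpu} CPU, {waste_mem} GB memory")
```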

Converting waste to cost

Since you pay per machine (not per core or per GB), we need a ratio to split the cost between CPU and memory. We use a 9:1 ratio in favor of CPU — this reflects that, on most AWS and GCP instance families, CPU is roughly 9× more expensive per unit than memory. You can verify this by comparing on-demand pricing for common instance types. From there, it’s useful to compute waste at two levels:
  • Cluster level: get an overall view of waste.
  • Node pool level: if you rely on affinity rules, this helps identify which node pools are underutilized.
Some providers compute waste from actual usage (possible in Costory via AWS Managed Prometheus). The downside is that per-namespace costs become unstable and harder to compare over time, since you pay for the nodes you schedule — not for what pods actually consume.
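
A minimal sketch of the 9:1 split, reusing the example cluster above. The node_hourly_cost figure is purely illustrative, and this is a simplification rather than Costory's exact pricing logic:

```python
# Sketch: convert waste to cost with a 9:1 CPU:memory split.
node_hourly_cost = 0.40          # USD per hour for the nodes (illustrative assumption)
cpu_weight, mem_weight = 9, 1    # 9:1 split in favor of CPU

cpu_share = node_hourly_cost * cpu_weight / (cpu_weight + mem_weight)
mem_share = node_hourly_cost * mem_weight / (cpu_weight + mem_weight)

# Price per unit of capacity (6 CPU and 8 GB available in the example)
cpu_cost_per_core = cpu_share / 6
mem_cost_per_gb = mem_share / 8

# Wasted cost = unit price x wasted units (3 CPU and 3 GB wasted in the example)
wasted_cost = 3 * cpu_cost_per_core + 3 * mem_cost_per_gb
print(f"wasted cost: ${wasted_cost:.3f}/hour")
```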

Steps

1

Set up label visibility

Follow the EKS / ECS cluster cost visibility guide to enable split cost allocation and add labels to your pod deployments.
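If you want to check which pods are still missing the labels you plan to allocate costs by, a quick audit like the sketch below can help. It assumes the official kubernetes Python client and a local kubeconfig; the required label keys are examples, not a fixed list:

```python
# Sketch: list pods missing the labels used for cost allocation.
from kubernetes import client, config

REQUIRED_LABELS = {"team", "app", "environment"}  # example keys

config.load_kube_config()
v1 = client.CoreV1Api()

for pod in v1.list_pod_for_all_namespaces().items:
    labels = pod.metadata.labels or {}
    missing = REQUIRED_LABELS - labels.keys()
    if missing:
        print(f"{pod.metadata.namespace}/{pod.metadata.name} is missing: {sorted(missing)}")
```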
2

Break down costs by label

Once your pods have labels (team, app, environment, etc.), use Costory’s Feature Engineering to merge these labels with your billing data. This lets you split cluster costs per label for cost allocation and chargeback. It also lets you fix the automatic renaming performed by GCP (k8s_label_<label_name>).
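As a rough illustration of what that renaming fix looks like, the sketch below strips the GCP k8s_label_ prefix from billing rows before grouping costs by label. The DataFrame columns and values are made up for the example; within Costory this happens through Feature Engineering rather than your own code:

```python
# Sketch: undo the GCP renaming (k8s_label_<label_name>) before grouping costs.
import pandas as pd

billing = pd.DataFrame([
    {"k8s_label_team": "payments", "k8s_label_environment": "prod", "cost": 12.5},
    {"k8s_label_team": "search",   "k8s_label_environment": "prod", "cost": 7.3},
    {"k8s_label_team": "payments", "k8s_label_environment": "dev",  "cost": 2.1},
])

# Strip the prefix so label names match the ones applied to your deployments.
billing = billing.rename(columns=lambda c: c.removeprefix("k8s_label_"))

print(billing.groupby(["team", "environment"])["cost"].sum())
```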
3

Monitor the waste over time

Track your waste ratio over time and get notified when it exceeds acceptable thresholds. In Costory, create an alert using the Kubernetes Waste template.
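Behind such an alert is a simple waste-ratio check, sketched below. The 30% threshold is an example value, not a Costory default:

```python
# Sketch of a waste-ratio threshold check.
def waste_ratio(available: float, requested: float) -> float:
    """Share of paid-for capacity that no pod has requested."""
    return (available - requested) / available

THRESHOLD = 0.30  # example threshold, tune to your own tolerance

ratio = waste_ratio(available=6, requested=3)  # CPU figures from the example: 0.5
if ratio > THRESHOLD:
    print(f"CPU waste ratio {ratio:.0%} exceeds {THRESHOLD:.0%}")
```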
4

Reallocate waste to teams

Waste reallocation is currently available for EKS only.
Costory supports two reallocation strategies:
  • Namespace reallocation (default): calculates the per-hour usage requested by each namespace and reallocates waste proportionally. Best for most clusters with standard scheduling.
  • Node-based reallocation: calculates per-hour resource requests per pod on each individual node and reallocates waste accordingly. A pod alone on a node takes that node’s full cost for the period it is alone. Best for clusters relying heavily on affinity rules (node pools, zones, etc.) where a single pod may trigger a new node. Trade-off: more variance in waste over time.
By default, namespace reallocation is enabled. Reach out to us to enable node-based reallocation.
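
To make the default strategy concrete, here is a sketch of proportional namespace reallocation. The request figures and waste cost are illustrative, and this is a simplification of the per-hour computation Costory performs:

```python
# Sketch: split waste cost across namespaces in proportion to their requests.
requests_by_namespace = {"payments": 2.0, "search": 1.0}  # CPU requested in the hour (illustrative)
waste_cost = 3.0                                          # USD of wasted capacity in that hour (illustrative)

total_requested = sum(requests_by_namespace.values())
reallocated = {
    ns: waste_cost * req / total_requested
    for ns, req in requests_by_namespace.items()
}
print(reallocated)  # {'payments': 2.0, 'search': 1.0}
```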

What’s next?

  • Set up a weekly Slack report to share waste metrics with your team.
  • Add your Kubernetes labels to the digest tree for automated cost-change notifications.
  • Explore your data in the Cost Explorer to identify faulty deployments over-requesting resources.
Last modified on February 17, 2026