When running dbt on BigQuery, it’s difficult to know which models drive the most cost. By default, BigQuery billing exports show project-level and dataset-level spend, but offer no model-level granularity. This guide shows you how to use BigQuery labels and dbt macros to break down processing and storage costs by model, team, package, or any custom dimension.

Prerequisites

Output

  • Cost visibility broken down by dbt model, team, package, or any custom label.
  • Ability to identify which models drive auto-scaling or high processing costs.
  • Foundation for cost allocation and optimization.

Steps

1

Label your dbt operations in BigQuery

Configure automated labels in your dbt_project.yml so that every query is tagged with dbt-native metadata:
query-comment:
  comment: "{{ bq_labels(node, var) }}"
  job-label: True
Then create a bq_labels macro to populate the labels automatically. These are the fields you can use:

Object | Property | Description
Node | node.original_file_path | The original file path of the node
Node | node.name | The name of the node
Node | node.resource_type | The resource type of the node (e.g., model, test, seed, snapshot)
Node | node.package_name | The dbt package name
Node | node.database | For BigQuery, the GCP project you write to
Node | node.schema | The dbt schema (BigQuery dataset) of the node
env_var macro | env_var('username', 'cloud_run_task') | Any environment variable available at runtime
var macro | var('customer', 'test') | Any dbt variable available at runtime
target | target.get('profile_name') | The profile name of the target (useful if you split ingestion / full refresh / backfill jobs per target)
target | target.get('target_name') | The target name of the target
{% macro bq_labels(node, var) %}
    {%- set comment_dict = {} -%}
    {%- do comment_dict.update(
        app='background_tasks',
        customer=var('customer', 'test'),
        repository="analytics",
        processed_by=env_var('username','cloud_run_task'),
        profile_name=target.get('profile_name'),
        target_name=target.get('target_name')
    ) -%}
    {%- if node is not none -%}
      {%- do comment_dict.update(
        file=node.original_file_path,
        node_id=node.unique_id,
        node_name=node.name,
        resource_type=node.resource_type,
        package_name=node.package_name,
        database=node.database,
        schema=node.schema
      ) -%}
    {%- endif -%}
    {% do return(tojson(comment_dict)) %}
{% endmacro %}
This macro adds labels to every query executed during a dbt run.
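One caveat: BigQuery label keys and values may only contain lowercase letters, digits, underscores, and hyphens, with a 63-character limit, so values like node.original_file_path are invalid as-is. Recent dbt-bigquery versions sanitize query-comment labels automatically; if you need to do it yourself, a minimal Python sketch (the file path is a made-up example):

```python
import re

def sanitize_label_value(value: str, max_len: int = 63) -> str:
    """Coerce a string into a valid BigQuery label value: lowercase
    letters, digits, underscores, and hyphens only, at most 63 chars."""
    value = re.sub(r"[^a-z0-9_-]", "_", value.lower())
    return value[:max_len]

# A hypothetical model file path -- invalid as-is because of '/' and '.':
print(sanitize_label_value("models/Staging/stg_Orders.sql"))
# -> models_staging_stg_orders_sql
```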
2

Label your storage resources

In BigQuery, storage costs are visible at the dataset level (via resource_id / resource_name in the billing export), but not per table. To break storage costs down further, apply labels at the table level in your dbt model config:
{{
  config(
    tags=["ingestion"],
    materialized='table',
    labels={'maintained_by': 'team_A'}
  )
}}
You can also set labels at the dbt_project.yml level. Note that when both define the same key, the model-level label takes precedence over the project-level one.
3

Visualize your labeled job costs without Costory

Use BigQuery’s INFORMATION_SCHEMA to analyze job costs by label (replace label_name with your label key, and region-eu with your dataset region):
SELECT
    ARRAY((SELECT value FROM UNNEST(labels) WHERE key = "label_name"))[SAFE_OFFSET(0)] AS label,
    SUM(total_slot_ms) AS total_slot_ms,
    SUM(total_bytes_billed) AS total_bytes_billed
FROM `<your_project>`.`region-eu`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE EXTRACT(DATE FROM creation_time) = CURRENT_DATE()
    AND ARRAY((SELECT value FROM UNNEST(labels) WHERE key = "label_name"))[SAFE_OFFSET(0)] IS NOT NULL
GROUP BY 1
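To turn total_bytes_billed into an approximate dollar amount under on-demand pricing, divide by 1 TiB and multiply by your region’s rate. A sketch assuming the us on-demand rate of $6.25 per TiB (check your region’s current pricing):

```python
def on_demand_cost_usd(total_bytes_billed: int, price_per_tib: float = 6.25) -> float:
    """Approximate on-demand cost in USD for a given bytes-billed total."""
    return round(total_bytes_billed / 2**40 * price_per_tib, 4)  # 1 TiB = 2^40 bytes

print(on_demand_cost_usd(5 * 2**40))  # 5 TiB billed -> 31.25
```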
To query storage-level labels on tables:
SELECT
    *,
    ARRAY((SELECT AS STRUCT json[0] AS key, json[1] AS value
           FROM UNNEST(ARRAY(
               (SELECT AS STRUCT JSON_VALUE_ARRAY(CONCAT('[', elements, ']')) AS json
                FROM UNNEST(regexp_extract_all(option_value, r'STRUCT\(("[^"]+", "[^"]+")\)')) elements)
           )))) AS labels
FROM
    <your_project>.INFORMATION_SCHEMA.TABLE_OPTIONS
WHERE option_name = 'labels'
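The STRUCT-string parsing above is convoluted in SQL; if you post-process results outside BigQuery, the same extraction is a one-liner. A sketch where the sample string mirrors the format TABLE_OPTIONS uses for labels (the key/value pairs are invented):

```python
import re

def parse_label_option(option_value: str) -> dict:
    """Parse a TABLE_OPTIONS 'labels' option_value string,
    e.g. '[STRUCT("key", "value"), ...]', into a dict."""
    return dict(re.findall(r'STRUCT\("([^"]+)", "([^"]+)"\)', option_value))

print(parse_label_option('[STRUCT("maintained_by", "team_a"), STRUCT("app", "ingestion")]'))
# -> {'maintained_by': 'team_a', 'app': 'ingestion'}
```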
4

Visualize your labeled job costs with Costory

  1. Ingest the labels into Costory using the feature engineering UI. (Screenshot: Costory feature engineering screen showing how to ingest BigQuery labels from dbt models.)
  2. If some of your BigQuery labels should be merged into an existing label, use the merge feature to combine two labels into one (e.g., k8s_label_app and app).
  3. Create a virtual dimension to allocate the cost, or explore costs directly in the Cost Explorer. (Screenshot: Costory Cost Explorer showing BigQuery costs broken down by dbt model, team, and package labels.)

Best practices

Labels

  • Ingest node_name for fine-grained cost investigation to understand which exact model is driving cost.
  • Use package_name or schema to map each schema to the correct team or owner. You can formalize this mapping with virtual dimensions.
  • Use target_name to split daily jobs, full refreshes, and backfills.

Quotas

Alerts

  • Set an alert close to the daily average on ingestion costs. Daily ingestion jobs should be very stable.
  • For full refresh jobs, evaluate alerts over a larger period to avoid false positives from spikes.
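The “close to the daily average” rule can be sketched as a simple threshold check, assuming you already pull daily costs from the JOBS_BY_PROJECT query above (the 30% tolerance is an arbitrary starting point to tune):

```python
def should_alert(today_cost: float, history: list[float], tolerance: float = 0.3) -> bool:
    """Alert when today's cost exceeds the trailing average by more than
    `tolerance` (30% by default) -- suited to stable daily ingestion jobs."""
    baseline = sum(history) / len(history)
    return today_cost > baseline * (1 + tolerance)

print(should_alert(14.0, [10.0, 10.5, 9.8, 10.2]))  # True: ~38% above baseline
```

For full refresh jobs, feed in a longer history window instead, per the bullet above.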

Storage costs

  • Evaluate storage costs regularly. Tables left unmodified for 90 consecutive days move to cheaper long-term storage, but if you never read the data, set an expiration policy instead of paying to keep it.
  • Since 2023, you can choose the storage billing model per dataset:
    • Logical storage: you pay based on the logical size (before compression).
    • Physical storage: you pay based on the physical size (after compression).
You can compare both costs using the following SQL query. Run it once per region, adjusting the region-us qualifier:
DECLARE active_logical_gib_price FLOAT64 DEFAULT 0.02;
DECLARE long_term_logical_gib_price FLOAT64 DEFAULT 0.01;
DECLARE active_physical_gib_price FLOAT64 DEFAULT 0.04;
DECLARE long_term_physical_gib_price FLOAT64 DEFAULT 0.02;
WITH storage_sizes AS (
SELECT
  table_schema AS dataset_name,
  SUM(active_logical_bytes) / POWER(1024, 3) AS active_logical_gib,
  SUM(long_term_logical_bytes) / POWER(1024, 3) AS long_term_logical_gib,
  SUM(active_physical_bytes) / POWER(1024, 3) AS active_physical_gib,
  SUM(long_term_physical_bytes) / POWER(1024, 3) AS long_term_physical_gib,
  SUM(time_travel_physical_bytes) / POWER(1024, 3) AS time_travel_physical_gib,
  SUM(fail_safe_physical_bytes) / POWER(1024, 3) AS fail_safe_physical_gib
FROM
  `region-us`.INFORMATION_SCHEMA.TABLE_STORAGE_BY_PROJECT
WHERE table_type = 'BASE TABLE'
GROUP BY 1
)
SELECT
  dataset_name,
  ROUND((active_logical_gib * active_logical_gib_price) + (long_term_logical_gib * long_term_logical_gib_price), 2) AS monthly_cost_logical,
  ROUND(
    (active_physical_gib * active_physical_gib_price) +
    (long_term_physical_gib * long_term_physical_gib_price) +
    ((time_travel_physical_gib + fail_safe_physical_gib) * active_physical_gib_price), 2
  ) AS monthly_cost_physical,
  ROUND(SAFE_DIVIDE(active_logical_gib + long_term_logical_gib, active_physical_gib + long_term_physical_gib), 2) AS compression_ratio
FROM
  storage_sizes
ORDER BY
  monthly_cost_logical DESC;
Tabular data with frequent values usually compresses well, so the physical cost is typically lower than the logical cost. If you find a dataset that would be significantly cheaper under the physical model:
ALTER SCHEMA my_dataset SET OPTIONS(storage_billing_model = 'physical');
Changes take about 24 hours to affect billing. Once you switch, you must wait 14 days before you can switch back.
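As a rule of thumb from the prices in the query above: since the active physical rate ($0.04/GiB) is twice the logical rate ($0.02/GiB), physical billing wins only when your compression ratio exceeds roughly 2x, plus whatever time travel and fail-safe bytes add. A sketch of the break-even check (us prices assumed; the example sizes are invented):

```python
# Active-storage prices per GiB-month from the comparison query (us region).
ACTIVE_LOGICAL_PRICE = 0.02
ACTIVE_PHYSICAL_PRICE = 0.04

def physical_is_cheaper(logical_gib: float, physical_gib: float,
                        overhead_gib: float = 0.0) -> bool:
    """True when physical billing beats logical billing for active storage.
    `overhead_gib` is time travel + fail-safe bytes, billed only under
    the physical model."""
    return (physical_gib + overhead_gib) * ACTIVE_PHYSICAL_PRICE \
        < logical_gib * ACTIVE_LOGICAL_PRICE

# 1000 GiB logical compressing to 300 GiB physical (~3.3x ratio):
print(physical_is_cheaper(1000, 300))  # True: $12/month vs $20/month
```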

Next steps

FAQ

How does dbt support BigQuery labels?
dbt supports job-level labels via query-comment in dbt_project.yml and table-level labels via the labels property in model config(). Job labels appear in INFORMATION_SCHEMA.JOBS_BY_PROJECT, while table labels appear in INFORMATION_SCHEMA.TABLE_OPTIONS.
What is the difference between logical and physical storage costs?
Logical storage costs are based on the uncompressed (logical) size of your data; physical storage costs are based on the compressed size actually stored, plus time travel and fail-safe bytes. You can compare both with the SQL query above. Tabular data with frequent values usually compresses well, so the physical cost is often lower than the logical cost.
How can I cap BigQuery on-demand usage per user?
Use GCP’s custom quotas feature with QueryUsagePerUserPerDay to limit bytes billed per user or service account per day.
What happens if the same label key is set in both dbt_project.yml and a model config()?
Labels can be set at both levels; when both define the same key, the model-level config() value takes precedence over the project-level one.
How do I allocate costs when using slot reservations?
With slot reservations, you need to reattribute the flat reservation cost to individual queries. Today you can do this with Virtual Dimensions; a native connector is under development to do it automatically.
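Until then, the reattribution can be sketched as a proportional split: divide the flat reservation cost across labels by the total_slot_ms each one consumed, taken from the JOBS_BY_PROJECT query above (the label names and figures here are hypothetical):

```python
def allocate_slot_cost(reservation_cost: float,
                       slot_ms_by_label: dict[str, int]) -> dict[str, float]:
    """Split a flat reservation cost across labels in proportion to
    the slot-milliseconds each label's queries consumed."""
    total = sum(slot_ms_by_label.values())
    return {label: round(reservation_cost * ms / total, 2)
            for label, ms in slot_ms_by_label.items()}

print(allocate_slot_cost(100.0, {"team_a": 750_000, "team_b": 250_000}))
# -> {'team_a': 75.0, 'team_b': 25.0}
```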
Last modified on March 18, 2026