> ## Documentation Index
> Fetch the complete documentation index at: https://docs.costory.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Increase visibility on your BigQuery costs with dbt

> Track and allocate BigQuery processing and storage costs at the dbt model level using labels, then visualize them in Costory.

When running dbt on BigQuery, it's difficult to know which models drive the most cost. By default, BigQuery billing exports show project-level and dataset-level spend, but offer no model-level granularity. This guide shows you how to use BigQuery labels and dbt macros to break down processing and storage costs by model, team, package, or any custom dimension.

## Prerequisites

* You are using **[dbt](https://www.getdbt.com/)** to orchestrate your BigQuery workloads.
* Your GCP billing export is [connected to Costory](/setup/billing#gcp).

## Output

* Cost visibility broken down by dbt model, team, package, or any custom label.
* Ability to identify which models drive auto-scaling or high processing costs.
* Foundation for cost allocation and optimization.

## Steps

<Steps>
  <Step title="Label your dbt operations in BigQuery">
    Configure automated labels in your `dbt_project.yml` so every query is tagged with dbt native metadata:

    ```yaml theme={null}
    query-comment:
      comment: "{{ bq_labels(node, var) }}"
      job-label: True
    ```

    Then create a macro `bq_labels` to populate the labels automatically. Here is all the information you can use:

    | Object         | Property                                    | Description                                                                                                  |
    | -------------- | ------------------------------------------- | ------------------------------------------------------------------------------------------------------------ |
    | Node           | node.original\_file\_path                   | The original file path of the node                                                                           |
    | Node           | node.name                                   | The name of the node                                                                                         |
    | Node           | node.resource\_type                         | The resource type of the node (`model`, `test`, `seed`, `snapshot`, `table`)                                 |
    | Node           | node.package\_name                          | The dbt package name                                                                                         |
    | Node           | node.database                               | For BigQuery, this is the project name of the project on which you write                                     |
    | Node           | node.schema                                 | The dbt schema of the node                                                                                   |
    | env\_var macro | ex: env\_var('username','cloud\_run\_task') | Any env var available at runtime                                                                             |
    | var macro      | ex: var('customer', 'test')                 | Any dbt variable available at runtime                                                                        |
    | target         | target.get('profile\_name')                 | The profile name of the target (useful if you split per target your ingested / full refresh / backfill jobs) |
    | target         | target.get('target\_name')                  | The target name of the target                                                                                |

    ```sql theme={null}
    {% macro bq_labels(node, var) %}
        {%- set comment_dict = {} -%}
        {%- do comment_dict.update(
            app='background_tasks',
            customer=var('customer', 'test'),
            repository="analytics",
            processed_by=env_var('username','cloud_run_task'),
            profile_name=target.get('profile_name'),
            target_name=target.get('target_name')
        ) -%}
        {%- if node is not none -%}
          {%- do comment_dict.update(
            file=node.original_file_path,
            node_id=node.unique_id,
            node_name=node.name,
            resource_type=node.resource_type,
            package_name=node.package_name,
            database=node.database,
            schema=node.schema
          ) -%}
        {%- endif -%}
        {% do return(tojson(comment_dict)) %}
    {% endmacro %}
    ```

    This macro will label every query run during the dbt run.
  </Step>

  <Step title="Label your storage resources">
    In BigQuery, storage costs are visible at the dataset level using `resource_id` / `resource_name`, but they are not visible at the billing level per table. To add labels at the table level, apply them in your dbt model config:

    ```sql theme={null}
    {{
      config(
        tags=["ingestion"],
        materialized='table',
        labels={'maintained_by': 'team_A'}
      )
    }}
    ```

    You can also set labels at the `dbt_project.yml` level. Note that project-level labels will override model-level labels if both are set.
  </Step>

  <Step title="Visualize your labeled job costs without Costory">
    Use BigQuery's `INFORMATION_SCHEMA` to analyze costs by label:

    ```sql theme={null}
    SELECT
        ARRAY((SELECT value FROM UNNEST(labels) WHERE key = "label_name"))[SAFE_OFFSET(0)] AS label,
        SUM(total_slot_ms) AS total_slot_ms,
        SUM(total_bytes_billed) AS total_bytes_billed
    FROM `<your_project>`.`region-eu`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
    WHERE EXTRACT(DATE FROM creation_time) = CURRENT_DATE()
        AND ARRAY((SELECT value FROM UNNEST(labels) WHERE key = "label_name"))[SAFE_OFFSET(0)] IS NOT NULL
    GROUP BY 1
    ```

    To query storage-level labels on tables:

    ```sql theme={null}
    SELECT
        *,
        ARRAY((SELECT AS STRUCT json[0] AS key, json[1] AS value
               FROM UNNEST(ARRAY(
                   (SELECT AS STRUCT JSON_VALUE_ARRAY(CONCAT('[', elements, ']')) AS json
                    FROM UNNEST(regexp_extract_all(option_value, r'STRUCT\("[^"]+", "[^"]+"\)')) elements)
               )))) AS labels
    FROM
        <your_project>.INFORMATION_SCHEMA.TABLE_OPTIONS
    WHERE option_name = 'labels'
    ```
  </Step>

  <Step title="Visualize your labeled job costs with Costory">
    1. Ingest the labels into Costory using the feature engineering UI:

           <img src="https://mintcdn.com/costory/TpKCQUAZIFXf5bss/use-cases/dbt_bigquery_visibility/0.png?fit=max&auto=format&n=TpKCQUAZIFXf5bss&q=85&s=03d20c622b10a665eb1bbcedb282bfb9" alt="Costory feature engineering screen showing how to ingest BigQuery labels from dbt models" width="2480" height="1660" data-path="use-cases/dbt_bigquery_visibility/0.png" />

    2. If part of your BigQuery labels should be merged within an existing label, use the merge feature to combine two labels into one (e.g., `k8s_label_app` and `app`).

    3. [Create a virtual dimension](/features/tagging/dimensions#virtual-dimensions) to allocate the cost, or explore costs directly in [Explorer](/features/cost-explorer):

           <img src="https://mintcdn.com/costory/TpKCQUAZIFXf5bss/use-cases/dbt_bigquery_visibility/1.png?fit=max&auto=format&n=TpKCQUAZIFXf5bss&q=85&s=54b4d1e9f1d157fcc0ff21759a0551a8" alt="Costory Explorer showing BigQuery costs broken down by dbt model, team, and package labels" width="1699" height="930" data-path="use-cases/dbt_bigquery_visibility/1.png" />
  </Step>
</Steps>

## Best practices

### Labels

* Ingest `node_name` for fine-grained cost investigation to understand which exact model is driving cost.
* Use `package_name` or `schema` to map each schema to the correct team or owner. You can formalize this mapping with [virtual dimensions](/features/tagging/dimensions#virtual-dimensions).
* Use `target_name` to split daily jobs, full refreshes, and backfills.

### Quotas

* Set a quota on GCP for query bytes billed per service account or per user using [`QueryUsagePerUserPerDay`](https://docs.cloud.google.com/bigquery/docs/custom-quotas).

### Alerts

* Set an alert close to the daily average on ingestion costs. Daily ingestion jobs should be very stable.
* For full refresh jobs, evaluate alerts over a larger period to avoid false positives from spikes.

### Storage costs

* Evaluate storage costs frequently. After 90 days the storage price decreases, but if you never read the data you should rely on an expiration policy instead.
* Since 2023, you can choose per dataset the [storage billing model](https://docs.cloud.google.com/bigquery/docs/updating-datasets#update_storage_billing_models):
  * **Logical storage**: you pay based on the logical size (before compression).
  * **Physical storage**: you pay based on the physical size (after compression).

You can compare both costs using the following SQL query. Run it per region and change the `region-us` suffix:

```sql theme={null}
DECLARE active_logical_gib_price FLOAT64 DEFAULT 0.02;
DECLARE long_term_logical_gib_price FLOAT64 DEFAULT 0.01;
DECLARE active_physical_gib_price FLOAT64 DEFAULT 0.04;
DECLARE long_term_physical_gib_price FLOAT64 DEFAULT 0.02;
WITH storage_sizes AS (
SELECT
  table_schema AS dataset_name,
  SUM(active_logical_bytes) / POWER(1024, 3) AS active_logical_gib,
  SUM(long_term_logical_bytes) / POWER(1024, 3) AS long_term_logical_gib,
  SUM(active_physical_bytes) / POWER(1024, 3) AS active_physical_gib,
  SUM(long_term_physical_bytes) / POWER(1024, 3) AS long_term_physical_gib,
  SUM(time_travel_physical_bytes) / POWER(1024, 3) AS time_travel_physical_gib,
  SUM(fail_safe_physical_bytes) / POWER(1024, 3) AS fail_safe_physical_gib
FROM
  `region-us`.INFORMATION_SCHEMA.TABLE_STORAGE_BY_PROJECT
WHERE table_type = 'BASE TABLE'
GROUP BY 1
)
SELECT
  dataset_name,
  ROUND((active_logical_gib * active_logical_gib_price) + (long_term_logical_gib * long_term_logical_gib_price), 2) AS monthly_cost_logical,
  ROUND(
    (active_physical_gib * active_physical_gib_price) +
    (long_term_physical_gib * long_term_physical_gib_price) +
    ((time_travel_physical_gib + fail_safe_physical_gib) * active_physical_gib_price), 2
  ) AS monthly_cost_physical,
  ROUND(SAFE_DIVIDE(active_logical_gib + long_term_logical_gib, active_physical_gib + long_term_physical_gib), 2) AS compression_ratio
FROM
  storage_sizes
ORDER BY
  monthly_cost_logical DESC;
```

Tabular data with frequent values usually compresses well, so the physical cost is typically lower than the logical cost.

If you find a dataset that would be significantly cheaper under the physical model:

```sql theme={null}
ALTER SCHEMA my_dataset SET OPTIONS(storage_billing_model = 'physical');
```

<Warning>
  Changes take about 24 hours to affect billing. Once you switch, you must wait
  14 days before you can switch back.
</Warning>

## Next steps

* [Explore your costs in Explorer](/features/cost-explorer): slice and dice your BigQuery spend with full context.
* [Tag & allocate costs with dimensions](/features/tagging/dimensions): map labels to teams and build cost allocation rules.
* [Allocate shared Cloud SQL costs across teams](/use-cases/allocate_shared_cloud_sql_database/allocate): apply similar labeling strategies to Cloud SQL.
* [Get Kubernetes cost visibility with EKS](/use-cases/eks-cost-allocation): extend cost tracking to your container workloads.

## FAQ

<AccordionGroup>
  <Accordion title="What BigQuery labels does dbt support?">
    dbt supports **job-level labels** via `query-comment` in `dbt_project.yml` and **table-level labels** via the `labels` property in model `config()`. Job labels appear in `INFORMATION_SCHEMA.JOBS_BY_PROJECT`, while table labels appear in `INFORMATION_SCHEMA.TABLE_OPTIONS`.
  </Accordion>

  <Accordion title="What's the difference between physical and logical storage costs?">
    Physical storage costs are the costs of the data stored in BigQuery. Logical storage costs are the costs of the data stored in BigQuery after compression. You can compare both costs using the SQL query above.
    Usually tabular data with frequent values compresses well, so the physical cost is typically lower than the logical cost.
  </Accordion>

  <Accordion title="How do I set a BigQuery cost quota?">
    Use GCP's custom quotas feature with [`QueryUsagePerUserPerDay`](https://docs.cloud.google.com/bigquery/docs/custom-quotas) to limit bytes billed per user or service account per day.
  </Accordion>

  <Accordion title="Will project-level labels override model-level labels?">
    Yes. Labels set in `dbt_project.yml` at the project level will override labels set in individual model `config()` blocks if both define the same key.
  </Accordion>

  <Accordion title="Can I track BigQuery costs per query when using Slots?">
    When using slots, you will need to reattribute the slots costs to each query. You can do this using [virtual dimensions](/features/tagging/dimensions#virtual-dimensions). A native connector is under development to do it natively.
  </Accordion>
</AccordionGroup>
