Prerequisites:
- You are using dbt to orchestrate your BigQuery workloads.
Output:
- Cost visibility broken down by dbt model, team, package, or any custom label.
- Ability to identify which models drive auto-scaling or high processing costs.
- Foundation for cost allocation and optimization.
Steps:
Step 1: Label your dbt operations in BigQuery
Configure automated labels in your dbt_project.yml so every query is tagged with dbt-native metadata, then create a macro (here called bq_labels) to populate the labels automatically. This macro will label every query run during the dbt run, and it is where you choose your label strategy; a sketch of both files follows the table below. Here is all the information you can use:

| Object | Property | Description |
|---|---|---|
| Node | node.original_file_path | The original file path of the node |
| Node | node.name | The name of the node |
| Node | node.resource_type | The resource type of the node (model, test, seed, snapshot, etc.) |
| Node | node.package_name | The DBT package name |
| Node | node.database | For BigQuery, this is the GCP project the node is written to |
| Node | node.schema | The DBT schema of the node |
| env_var macro | e.g. env_var('username', 'cloud_run_task') | Any environment variable available at runtime |
| var macro | e.g. var('customer', 'test') | Any dbt variable available at runtime |
| target | target.get('profile_name') | The profile name of the target (useful if you split your ingestion / full-refresh / backfill jobs per target) |
| target | target.get('target_name') | The name of the target |
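Below is a minimal sketch of what this can look like. It relies on dbt-bigquery's query-comment config with job-label: true, which attaches the rendered query comment (when it is a JSON dictionary) to each job as BigQuery labels. The label keys chosen here (node_name, package_name, target_name, runner, ...) are just one possible strategy; adapt them to your own.

```yaml
# dbt_project.yml -- sketch: render the bq_labels macro as the query comment
# and ask dbt-bigquery to also attach it as BigQuery job labels.
query-comment:
  comment: "{{ bq_labels(node) }}"
  job-label: true
```

```sql
-- macros/bq_labels.sql -- sketch of a labeling strategy; the key/value
-- choices are illustrative. BigQuery labels must be lowercase (letters,
-- digits, dashes, underscores) and at most 63 characters.
{% macro bq_labels(node) %}
  {%- set labels = {
      "dbt_invocation_id": invocation_id,
      "profile_name": target.get("profile_name", "unknown"),
      "target_name": target.get("target_name", "unknown"),
      "runner": env_var("username", "cloud_run_task")
  } -%}
  {%- if node is not none -%}
    {%- do labels.update({
        "node_name": node.name | lower,
        "resource_type": node.resource_type,
        "package_name": node.package_name | lower,
        "schema": node.schema | lower
    }) -%}
  {%- endif -%}
  {{ return(tojson(labels)) }}
{% endmacro %}
```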
Step 2: Label your storage resources
In BigQuery, storage costs are visible at the dataset level using resource_id / resource_name; they're not visible at the billing level per table. To automatically add labels at the table level, apply labels at the model level in dbt to tag the BigQuery tables, or set them at the dbt_project.yml level (a sketch of both follows). Note that project-level labels will override model-level labels if both are set.
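A minimal sketch of both options, using dbt-bigquery's labels config; the project, team, and model names are hypothetical:

```yaml
# dbt_project.yml -- project-level labels applied to every model's table
models:
  my_project:            # hypothetical project name
    +labels:
      team: data-platform
```

```sql
-- models/orders.sql -- model-level labels (hypothetical model)
{{ config(
    labels = {"team": "data-platform", "node_name": "orders"}
) }}

select * from {{ ref("stg_orders") }}
```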
Step 3: Visualize your labeled job costs without Costory
Use BigQuery's INFORMATION_SCHEMA to analyze job costs by label, and to query the storage-level labels on tables; a query sketch follows.
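A hedged sketch of both queries, assuming the US multi-region and on-demand pricing (roughly $6.25 per TiB billed; adjust the region, dataset, and rate to your setup):

```sql
-- Cost per dbt node over the last 30 days, grouped by the node_name job label.
SELECT
  label.value AS node_name,
  ROUND(SUM(total_bytes_billed) / POW(1024, 4) * 6.25, 2) AS estimated_cost_usd
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT,
  UNNEST(labels) AS label
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND label.key = 'node_name'
GROUP BY node_name
ORDER BY estimated_cost_usd DESC;

-- Storage-level labels on tables (stored as the 'labels' option in TABLE_OPTIONS).
SELECT table_name, option_value AS labels
FROM `my_project.my_dataset`.INFORMATION_SCHEMA.TABLE_OPTIONS  -- hypothetical dataset
WHERE option_name = 'labels';
```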
Step 4: Visualize your labeled job costs with Costory
- Ingest the labels into Costory using the feature-engineering UI:

- If some of your BigQuery labels should be merged into an existing label, you can use the merge feature to combine two labels into one: for example, k8s_label_app and app.
- Create a virtual dimension to correctly allocate the costs, or directly explore the costs within the explorer:

What works best?
In terms of labels:
- Ingest node_name for fine-grained cost investigation: it shows which exact model is driving the cost.
- Use the package name or schema to map each schema to the correct team / owner.
- Use the target name to split daily jobs / full-refresh jobs / backfill jobs per target.
In terms of quotas and alerts:
- Set a quota in GCP on query bytes billed per service account / per user (the "Query usage per day per user" quota, QueryUsagePerUserPerDay).
- Set an alert very close to the daily average on ingestion costs: the daily ingestion job's cost should stay very close to its daily average (see the sketch after this list).
- For full-refresh jobs, alerts should be evaluated over a longer period to avoid spike effects.
- Don't forget to evaluate storage costs frequently: after 90 days without modification, storage moves to the cheaper long-term rate, but if you never read the data you should rely on an expiration policy instead.
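A hedged sketch of the kind of check such an alert could run, assuming jobs are labeled with target_name = 'ingestion' via the strategy above (the region, label values, and ~$6.25/TiB on-demand rate are assumptions to adjust):

```sql
-- Compare yesterday's ingestion cost against the trailing 30-day average.
WITH daily AS (
  SELECT
    DATE(creation_time) AS day,
    SUM(total_bytes_billed) / POW(1024, 4) * 6.25 AS cost_usd
  FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT,
    UNNEST(labels) AS label
  WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 31 DAY)
    AND label.key = 'target_name'
    AND label.value = 'ingestion'
  GROUP BY day
)
SELECT
  MAX(IF(day = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY), cost_usd, NULL)) AS yesterday_cost_usd,
  AVG(IF(day < CURRENT_DATE(), cost_usd, NULL)) AS avg_daily_cost_usd
FROM daily;
```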
Since 2023, you can choose the [storage billing model](https://docs.cloud.google.com/bigquery/docs/updating-datasets#update_storage_billing_models) per dataset:
- Logical storage: you pay based on the logical size (before compression).
- Physical storage: you pay based on the physical size (after compression).
Tabular data with frequently repeated values usually compresses very well (repeated values are effectively stored like an enum), so the physical cost is often lower than the logical cost.
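A minimal sketch of switching a dataset to physical billing (the dataset name is hypothetical; the equivalent bq CLI flag is --storage_billing_model):

```sql
-- Switch an existing dataset to physical (compressed) storage billing.
ALTER SCHEMA `my_project.my_dataset`
SET OPTIONS (storage_billing_model = 'PHYSICAL');
```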
Important restrictions:
- 24-Hour Delay: changes take about 24 hours to affect billing.
- 14-Day Lock: once you switch, you must wait 14 days before you can switch back to the previous model.
