Partition and cluster BigQuery tables with Airbyte and dbt

Learn how to modify the dbt code used by Airbyte to partition and cluster BigQuery tables.

Tuan Nguyen
10 min readOct 16, 2021
Photo by Roman Mager on Unsplash

Originally published here.

As your business grows, the size of your BigQuery tables will inevitably grow as well. Scanning entire tables to answer business questions will become slower and more expensive as a result. Fortunately, modern data warehouses provide options to limit how much data each query scans.

In this article, you will first learn how to use Airbyte to load data from any source to BigQuery. Then, you will learn how to customize the dbt models used by Airbyte to partition and cluster BigQuery tables.

What is a partitioned table?

A partitioned table is a kind of table that stores your data in smaller chunks (partition) based on the value of a column. For example, if your table contains all of your company’s orders, one partition could be one of the dates. Storing data this way makes it faster to retrieve data as the query engine does not have to scan the entire table, which can be very large. Since BigQuery charges you based on the bytes scanned for any query, it can be significantly cheaper to use a partitioned table.

--

--