Set up Airbyte, BigQuery, dbt, Metabase, and everything else you need to run a Modern Data Stack using Terraform.

A Modern Data Stack Architecture (image by author)

What is a Modern Data Stack

The tale of a Data Analyst who evolves into an Analytics Engineer, plus resources you can use to become like her.


Running 50 commands to generate base models? Writing the same transforms for the 100th time? This package will streamline the process for you.

Photo by Lenny Kuhne on Unsplash

Everyone talks about real-time data. Nobody knows how to do it. Everyone thinks everyone else is doing it, so everyone claims they are doing it. JK!

Photo by Joshua Sortino on Unsplash

What is real-time data

  • Real-time: latency from under a second to under a minute, worst accuracy.
  • Near real-time: 1–5 minutes latency, better accuracy.
  • Batch: anything above 5 minutes latency, typically 1 hour, 1 day, or 1 week, best accuracy.
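
The three tiers above can be sketched as a small lookup on end-to-end latency. This is only an illustration: the function name `classify_latency` is hypothetical, and the handling of the exact boundaries (under 60 seconds counts as real-time, exactly 5 minutes counts as near real-time) is an assumption, since the list does not say which side the edges fall on.

```python
def classify_latency(seconds: float) -> str:
    """Map an end-to-end latency in seconds to one of the three tiers.

    Boundary handling is an assumption made for this sketch:
    - below 60 s            -> "real-time"
    - 60 s up to 300 s      -> "near real-time"
    - anything above 300 s  -> "batch"
    """
    if seconds < 0:
        raise ValueError("latency cannot be negative")
    if seconds < 60:
        return "real-time"
    if seconds <= 5 * 60:
        return "near real-time"
    return "batch"


if __name__ == "__main__":
    # A few sample latencies: half a second, two minutes, one hour.
    for latency in (0.5, 120, 3600):
        print(f"{latency:>7} s -> {classify_latency(latency)}")
```

A pipeline that refreshes hourly lands in the batch tier under this sketch, which matches the list: higher latency, but the best accuracy of the three.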

Use Terraform to set up infrastructure-as-code for a Data Lake on Google Cloud Platform.

A summary of what we will be building in this project (image by author)

Create a streaming pipeline using Docker, Kafka, and Kafka Connect

What we are building in this project

What are the steps in building a data warehouse? What cloud technology should you use? How do you use Airflow to orchestrate your pipeline?

The architecture for this project

Airflow has been around for a while, but it has gained a lot of traction lately. So what is Airflow? How can you use it? And how do you set it up locally and remotely?


Why Airflow?

Photo by Jonas Verstuyft on Unsplash

I am thankful

Image credit: We've all been there…

Tuan Nguyen

CTO & Board member @Joon Solutions. Check out my website
