Hacker News

amrutha_
Show HN: Structured, GitHub App for Automated DBT PR Reviews (github.com)

Scaling data teams today means dealing with the complexity of the modern data stack. While DBT has become a core tool for transforming raw data into structured, analytics-ready tables, most teams are using it in ways that lead to chaos: duplicated models, inconsistent metrics, and inefficient SQL that directly impacts cloud spend. The real issue isn’t with DBT itself—it’s in how it’s applied across teams.

Here’s the typical setup: Finance defines a revenue model, Marketing calculates customer lifetime value, and Product defines churn. All in DBT, but all with slightly different logic, leading to metric fragmentation. This results in data drift, conflicting reports, and a ton of unnecessary engineering time spent reconciling definitions. Worse, engineers end up re-inventing the wheel by duplicating logic that already exists in other models. The inefficiencies don’t stop there: suboptimal SQL patterns (e.g., full-table scans, poor joins) creep into production and drive up cloud costs.

We designed our GitHub App to automate the grunt work of DBT model management, focusing on three key areas: preventing redundant logic, maintaining the semantic layer, and optimizing SQL performance.

---

(1) Stop Redundant Models: A lot of teams waste time rebuilding models that already exist. Engineers aren’t aware of what’s been built, so they duplicate work. Our app automatically reviews pull requests, flags redundant models, and suggests reusing existing logic. This keeps your key metrics like revenue or churn consistent across teams and prevents conflicting reports.

(2) Maintain the Semantic Layer: DBT’s value is in creating a semantic layer—a consistent definition of business metrics. But as teams scale, maintaining this layer gets tricky. People unknowingly break it with small changes, leading to inconsistencies. Our app checks every new model for deviations from the semantic layer, flagging inconsistencies before they go live. This prevents those all-too-common situations where two departments are debating whose revenue number is right. By ensuring everyone’s using the same definitions, you avoid trust issues with the data.

(3) SQL Performance = Real Costs: Bad SQL isn’t just a performance problem—it’s a cost problem. Inefficient joins, full-table scans, and poorly written SQL in your DBT models can blow up your cloud bill. Our app reviews SQL in pull requests, flags inefficiencies, and suggests optimizations. Example: An engineer submits a model that joins two large tables without filtering. Our app flags the full-table scan and suggests using indexed columns and adding WHERE filters. This reduces query cost and improves performance before the code hits production.
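To make the example in (3) concrete, here is a minimal sketch of what a check for "join without filtering" could look like. This is an illustrative heuristic only, not the app's actual logic: the function names (`strip_noise`, `has_unfiltered_join`) and the sample model are hypothetical, and a regex pass like this will miss many cases that a real SQL parser would catch (subqueries, CTEs, filters pushed into the join condition, comment markers inside string literals).

```python
import re

# Hypothetical heuristic, NOT the app's real implementation:
# flag model SQL that contains a JOIN but no WHERE clause anywhere.
JOIN_RE = re.compile(r"\bjoin\b", re.IGNORECASE)
WHERE_RE = re.compile(r"\bwhere\b", re.IGNORECASE)
# SQL line comments and block comments.
COMMENT_RE = re.compile(r"--[^\n]*|/\*.*?\*/", re.DOTALL)
# dbt Jinja expressions, statements, and comments.
JINJA_RE = re.compile(r"\{\{.*?\}\}|\{%.*?%\}|\{#.*?#\}", re.DOTALL)


def strip_noise(sql: str) -> str:
    """Drop SQL comments and swap Jinja tags for a placeholder identifier."""
    sql = COMMENT_RE.sub(" ", sql)
    return JINJA_RE.sub(" jinja_ref ", sql)


def has_unfiltered_join(sql: str) -> bool:
    """Return True if the cleaned SQL has a JOIN keyword and no WHERE keyword."""
    cleaned = strip_noise(sql)
    return bool(JOIN_RE.search(cleaned)) and not WHERE_RE.search(cleaned)


# Hypothetical model: the table and column names are made up for illustration.
model_sql = """
select o.order_id, c.customer_name
from {{ ref('orders') }} o
join {{ ref('customers') }} c on o.customer_id = c.customer_id
"""
filtered_sql = model_sql + "where o.order_date >= '2024-01-01'\n"

print(has_unfiltered_join(model_sql))     # True
print(has_unfiltered_join(filtered_sql))  # False
```

A check this simple produces false positives (small lookup tables joined without a filter are usually fine) and false negatives, so in practice it would only be a first signal to surface in a PR comment, not a blocker.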

---

Data engineers are already stretched thin with the demands of modern data pipelines. By automating model consistency checks, semantic layer enforcement, and SQL performance reviews, our GitHub App frees up your team to focus on higher-impact work rather than wasting cycles on repetitive tasks or fighting fires caused by bad data logic.

The app is live—give it a try, and let us know how it’s improving your workflow. Also, keep an eye out for our upcoming DBT code generation features—we’re automating more of the heavy lifting soon.


bobbypage 2 days ago

Having automated PR review for DBT models will be very valuable for ensuring that DBT and SQL code follows best practices.

I feel like SQL and data are always second-class citizens compared to application code, which has code review bots and a vast ecosystem of linters and review tools. As a result, it's great to see that DBT and SQL models will benefit here as well!

Does the GitHub app integrate with my existing DBT schemas and provide customized recommendations depending on my data models, metrics, etc.?

akshayc 2 days ago

Super important product! I’ve experienced this pain point firsthand many times!
