Hacker News

captaintobs
Dbt – Incremental but Incomplete tobikodata.com

bradleybuda 2 hours ago

I really wish data engineers didn't have to hand-roll incremental materialization in 2024. This is really hard stuff to get right (as the post outlines) but it is absolutely critical to keeping latency and costs down if you're going to go all in on deep, layered, fine-grained transformations (which still seems to me to be the best way to scale a large / complex analytics stack).

My prediction a few years back was that Materialize (or similar tech) would magically solve this - data teams could operate in terms of pure views and let the database engine differentiate their SQL and determine how to apply incremental (ideally streaming) updates through the view stack. While I'm in an adjacent space, I don't do this day-to-day so I'm not quite sure what's holding back adoption here - maybe in a few more years we'll get there.

0cf8612b2e1e 4 hours ago

Is anyone using SQLMesh in production? I love “lessons learned” tools which have the opportunity to improve core design after seeing the weak points of the initial product in the space. That being said, I hate being an early adopter, so will let others determine if the new tool has an entirely novel set of shortcomings vs dbt.

captaintobs (op) 4 hours ago

There are many teams using SQLMesh in production: Fivetran, Harness, Hopper, and Pitchbook, to name a few.

You can read some case studies here https://tobikodata.com/harness.html or join Slack to meet with folks to learn more about their experiences.

pdr94 6 hours ago

Great to see dbt finally rolling out microbatch incremental models! It's a much-needed feature and a step forward for data transformation. Excited to see how this evolves and complements tools like SQLMesh. Keep up the good work!

captaintobs (op) 6 hours ago

Thanks! Yes, it's a much-requested feature, but it's difficult to get right!

whinvik 2 hours ago

Can someone who understands it explain what dbt is and how it is used? I hear a lot about it but I just haven't figured out what it is useful for.

gkapur an hour ago

Basically, people are constantly calculating metrics based on existing tables. Think of something as simple as a moving average or the sum of two separate columns in a table. Once upon a time you would set up a cron job and populate these every day with a SQL query in some Python or Perl script.

dbt introduced a language for managing these “metrics” at scale, including the ability to use variables and more complex templates (Jinja).

Then you do dbt run (https://docs.getdbt.com/reference/commands/run) and kapow the metric is populated in your database.

More broadly, dbt did two other things: 1. It pushed the paradigm from ETL to ELT (so stick all the data in your warehouse and then transform it, rather than transforming it at extraction time). 2. It created the concept of an “analytics engineer” (previously known as the guy who knows SQL, or a business analyst).
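
To make the "SQL query in a cron job" picture concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not dbt's implementation; the orders table, the MODEL_SQL query, and the materialize helper are all hypothetical. It only shows the basic pattern of keeping a SELECT statement as the definition of a derived table and rebuilding that table on each run.

```python
import sqlite3

# Hypothetical "model": a derived column that sums two existing columns.
MODEL_SQL = "SELECT id, price + shipping AS total_cost FROM orders"


def materialize(conn, name, select_sql):
    """Rebuild table `name` from scratch as the result of `select_sql`.

    `name` is interpolated into the SQL text, so it must come from
    trusted code, never from user input.
    """
    conn.execute(f"DROP TABLE IF EXISTS {name}")
    conn.execute(f"CREATE TABLE {name} AS {select_sql}")
    conn.commit()
```

A scheduler would call materialize(conn, "order_costs", MODEL_SQL) once a day. Tools in this space add dependency ordering between such queries, templating, and testing on top of that basic loop.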

tiew9Vii 2 hours ago

Some opinionated conventions around defining templated SQL queries in YAML files for ETL.

Then it provides additional tooling around that: GUIs, governance, everything your average large corporation asks for.

bitlad 2 hours ago

I am not sure if it is that popular these days. A couple of years ago it was pretty popular.
