Half my work is adding a cache
When I was a kid, I wrote a little social networking site for my school. Back then, in 2010, I was surprised to learn that MySQL didn’t seem to support “materialized views”. Of course, I didn’t know the name for the concept at the time, but I understood that it should exist.
In her blog post Materialized views are obviously useful, Sophie Alpert points out that incremental view maintenance is still unreasonably difficult, even for trivial applications. And it’s 2025. In those 15 intervening years, we’ve barely accomplished anything as an industry to make incremental computation more tractable for the masses.
At best, we have DBSP: Automatic Incremental View Maintenance for Rich Query Languages (2022), which is a great paper and formalization, but one that was also probably realizable in 2010. We have a few startups (Materialize, Feldera) that offer incremental view maintenance as a service. More expansive efforts like Skiplang petered out. This is sad.
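To make the idea concrete, here is a minimal, hand-rolled sketch in Python of maintaining a materialized count-per-group view from a stream of insert/delete deltas instead of recomputing it from scratch. It doesn’t use DBSP, Materialize, or Feldera; the names are mine, and the +1/-1 weights only loosely mirror the weighted-row (Z-set) encoding that DBSP formalizes.

```python
from collections import defaultdict

# A "materialized view" of comment counts per post, maintained incrementally.
# Each delta is (post_id, weight): +1 for an inserted comment and -1 for a
# deleted one (roughly the weighted-row encoding that DBSP formalizes).

class CommentCountView:
    def __init__(self):
        self.counts = defaultdict(int)  # post_id -> number of comments

    def apply_delta(self, delta):
        """Update the view from a batch of changes, without rescanning the base data."""
        for post_id, weight in delta:
            self.counts[post_id] += weight
            if self.counts[post_id] == 0:
                del self.counts[post_id]  # keep the view as sparse as the data

view = CommentCountView()
view.apply_delta([("post-1", +1), ("post-1", +1), ("post-2", +1)])
view.apply_delta([("post-1", -1)])  # a comment was deleted
assert dict(view.counts) == {"post-1": 1, "post-2": 1}
```

The work per update is proportional to the size of the delta rather than the size of the data, which is the whole appeal.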
I’ve reproduced my Lobste.rs comment on the aforementioned blog post below.
This has been my thesis for a while:
- Half of my work in big tech has just been “adding a cache” or “removing a cache” in response to scaling latency/throughput requirements.
- We absolutely need higher-level primitives for developing incremental systems, particularly in distributed contexts.
- Somehow, our best alternative at each point has been to just build another ad-hoc, informally specified, bug-ridden, slow implementation of a build system (see the sketch after this list).
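In practice, that ad-hoc build system usually looks something like the following sketch: a memoized computation plus a hand-rolled invalidation policy. This is an illustrative sketch only; the names are hypothetical.

```python
import time

# The "ad-hoc build system" pattern: a cache keyed on inputs plus a
# hand-rolled invalidation policy (here, a TTL). Names are hypothetical.

_cache = {}  # user_id -> (computed_at, value)
TTL_SECONDS = 60

def expensive_query(user_id):
    # Stand-in for the real work (database scan, RPC fan-out, etc.).
    return {"user_id": user_id, "score": hash(user_id) % 100}

def cached_query(user_id):
    now = time.monotonic()
    hit = _cache.get(user_id)
    if hit is not None and now - hit[0] < TTL_SECONDS:
        return hit[1]                     # fresh enough: serve from the cache
    value = expensive_query(user_id)      # stale or missing: recompute
    _cache[user_id] = (now, value)
    return value

def invalidate(user_id):
    # The part that accretes bugs: every write path has to remember to call this.
    _cache.pop(user_id, None)
```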
I’m excited for efforts like Feldera (DBSP) to help provide
- concretely, a specific implementation of a “build system as a library”, but also
- abstractly, a reliable pattern that can be reimplemented across multiple environments where it’s unavailable (kind of like ReactiveX).
I think the main missing piece is that it’s hard to reconcile “persistent state” across many different environments/runtimes in a well-patterned way:
- DBSP gives you a formula for effective incremental computation, but I don’t yet know how hard it is to wire it up to arbitrary data sources for input and output.
- More specifically, the pattern seems to require that the underlying data source have a linear notion of “time”, which is not how most people have been designing their databases in practice.
- I can’t go take an arbitrary database at work that somebody implemented five years ago and easily ask Postgres for the delta stream.
- I mean, I’m sure it’s possible, but if it’s not dead simple, then I as a random developer am not going to be equipped to do it. (The sketch below gives a sense of the glue involved.)
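To illustrate what faking that delta stream looks like when a table has no change log, here is a hedged sketch that diffs successive snapshots into weighted deltas. The schema and names are hypothetical, and it rescans the whole table on every pass, which is exactly why this kind of glue needs to be dead simple before it gets written at all.

```python
# Faking a delta stream for a table with no change log: snapshot the rows,
# diff against the previous snapshot, and emit (row, weight) pairs.
# Hypothetical schema: snapshots are {primary_key: row} dicts.

def diff_snapshots(old, new):
    """Turn two {pk: row} snapshots into a list of (row, weight) deltas."""
    deltas = []
    for pk, row in new.items():
        if pk not in old:
            deltas.append((row, +1))       # inserted
        elif old[pk] != row:
            deltas.append((old[pk], -1))   # updated: retract the old version...
            deltas.append((row, +1))       # ...and insert the new one
    for pk, row in old.items():
        if pk not in new:
            deltas.append((row, -1))       # deleted
    return deltas

before = {1: ("alice", 10), 2: ("bob", 3)}
after = {1: ("alice", 11), 3: ("carol", 7)}
assert diff_snapshots(before, after) == [
    (("alice", 10), -1), (("alice", 11), +1),  # row 1 updated
    (("carol", 7), +1),                        # row 3 inserted
    (("bob", 3), -1),                          # row 2 deleted
]
```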
Related posts
The following are hand-curated posts which you might find interesting.
Date | Title
---|---
20 Apr 2020 | Monotonicity is a halfway point between mutability and immutability
19 Oct 2022 | Build-aware sparse checkouts
02 Oct 2024 | Incremental processing with Watchman — don't daemonize that build!
14 Aug 2025 | Lithe, less analysis with Datalog
23 Aug 2025 | Half my work is adding a cache (this post)
Want to see more of my posts? Follow me on Twitter or subscribe via RSS.