Intended audience: developers of source control systems like Git or Mercurial.
There was a recent source code leak from Yandex. I haven’t examined any of the files, but the news is a reminder that Yandex maintains a large monorepo and has even built its own source control system, called Arc, to handle it.
Original article from Yandex (2020): https://habr.com/ru/company/yandex/blog/482926/
Brief notes (originally posted to Discord):
- Seems to be based on SVN for the back-end
- Trunk-based development
- 6M commits, 2M files, 2TB repo size
- They tried Mercurial, but it didn’t solve their performance problems
- Uses generation numbers for merge-base calculation
  - Git now supports this too, via generation numbers stored in its commit-graph file.
- The front-end UI is probably based on Git’s, but they consider Git’s UI poor, so they’re improving on it
- Used by 20% of developers internally at the time of writing
- Relies heavily on a virtual filesystem (VFS), using FUSE on macOS (possibly they’ve changed this since then?)
  - VFS support on macOS is fairly flaky these days.
- Uses Yandex Database (YDB) for the back-end database, with some kind of conversion tool from SVN
- As part of the code review system, Arc commits are eventually converted to SVN commits, including some additional Arc metadata
- Implicitly uses a working-copy commit for some internal algorithms; it includes untracked files, since Arc provides the VFS that serves the working copy
  - I mentioned this in the context of Jujutsu VCS.
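To make the generation-number note above concrete, here is a small Python sketch. It is not Arc’s or Git’s actual code: it uses the simple “topological level” definition (a root commit has generation 1; any other commit has 1 plus the maximum generation of its parents), and the commit graph is a hypothetical in-memory dict mapping each commit ID to its parent IDs. The useful property is that an ancestor always has a strictly lower generation than any of its descendants, so walking commits in decreasing generation order guarantees that a commit’s “reachable from side A / side B” flags are final when it is popped, and the walk can stop as soon as every remaining commit is known to lie below an already-found common ancestor.

```python
import heapq

SIDE_A, SIDE_B, STALE = 1, 2, 4
BOTH = SIDE_A | SIDE_B


def generation_numbers(parents: dict[str, list[str]]) -> dict[str, int]:
    """gen(root) = 1; gen(c) = 1 + max(gen(p) for p in parents[c])."""
    gen: dict[str, int] = {}
    for start in parents:
        stack = [start]  # iterative DFS so deep histories don't hit recursion limits
        while stack:
            c = stack[-1]
            if c in gen:
                stack.pop()
                continue
            missing = [p for p in parents[c] if p not in gen]
            if missing:
                stack.extend(missing)  # compute parents first
            else:
                gen[c] = 1 + max((gen[p] for p in parents[c]), default=0)
                stack.pop()
    return gen


def merge_bases(parents: dict[str, list[str]], gen: dict[str, int],
                a: str, b: str) -> list[str]:
    """Best common ancestors of a and b, walking in decreasing generation order."""
    flags = {a: SIDE_A}
    flags[b] = flags.get(b, 0) | SIDE_B
    heap = [(-gen[c], c) for c in {a, b}]  # max-heap on generation
    heapq.heapify(heap)
    done: set[str] = set()
    result: list[str] = []
    # Stop once every queued commit is below a known common ancestor.
    while heap and any(not (flags[c] & STALE) for _, c in heap):
        _, c = heapq.heappop(heap)
        if c in done:
            continue  # duplicate heap entry
        done.add(c)
        f = flags[c]
        if f & BOTH == BOTH and not (f & STALE):
            result.append(c)  # common ancestor with no common-ancestor descendant
            f |= STALE        # everything below it is a worse candidate
            flags[c] = f
        for p in parents[c]:
            old = flags.get(p, 0)
            if old | f != old:
                flags[p] = old | f
                heapq.heappush(heap, (-gen[p], p))
    return sorted(result)
```

For a criss-cross history, where merges `m1` and `m2` both combine the same two commits `p` and `q`, this returns both `p` and `q`, since neither is an ancestor of the other. Production systems store the generation numbers on disk (as Git’s commit-graph does) rather than recomputing them per query.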
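The working-copy-commit note above can be sketched roughly as follows, assuming a plain directory on disk rather than a VFS. The function names (`snapshot_working_copy`, `working_copy_commit_id`, `changed_paths`) and the SHA-256 hashing scheme are hypothetical illustrations, not Arc’s or Jujutsu’s actual formats. Every regular file is hashed whether or not it is tracked, so a newly created, untracked file simply appears as a change in the next snapshot; ignore rules are deliberately left out.

```python
import hashlib
import os


def hash_blob(data: bytes) -> str:
    """Content hash for one file (hypothetical scheme, not Git's or Arc's)."""
    return hashlib.sha256(b"blob\0" + data).hexdigest()


def snapshot_working_copy(root: str) -> dict[str, str]:
    """Map every regular file under root, tracked or untracked, to its hash."""
    tree: dict[str, str] = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            with open(path, "rb") as f:
                tree[rel] = hash_blob(f.read())
    return tree


def working_copy_commit_id(tree: dict[str, str], parent: str) -> str:
    """Derive a commit ID from the snapshot's contents and its parent commit."""
    h = hashlib.sha256()
    h.update(f"parent {parent}\n".encode())
    for rel in sorted(tree):
        h.update(f"{tree[rel]} {rel}\n".encode())
    return h.hexdigest()


def changed_paths(old: dict[str, str], new: dict[str, str]) -> set[str]:
    """Paths added, removed, or modified between two snapshots."""
    return {p for p in old.keys() | new.keys() if old.get(p) != new.get(p)}
```

Because the working copy is treated as an ordinary commit, algorithms such as diffing or rebasing can operate on it with the same code paths as any other commit, instead of special-casing “uncommitted changes”.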
Overall, Arc doesn’t seem to advance the state of the art in monorepo management much beyond what large tech companies like Google and Meta have already done.