Intended audience: Developers of source control systems like Git or Mercurial.
Origin:
Mood: Mildly interested.

There was a recent source code leak from Yandex. I haven’t examined any of the files, but the topic itself is a reminder that Yandex maintains a large monorepo, and has even built its own source control system, called Arc, to handle it.

Original article from Yandex (2020): https://habr.com/ru/company/yandex/blog/482926/

Brief notes (originally posted to Discord):

  • Seems to be based on SVN for the back-end
  • Trunk-based development
  • 6M commits, 2M files, 2TB repo size
  • Tried Mercurial, but it didn’t solve their performance problems
  • Uses generation numbers for merge-base calculation
  • Probably based on Git for the front-end UI, but they complain about Git’s UI being bad, so they’re improving it
  • Used by 20% of developers internally at the time of writing
  • Uses a virtual filesystem (VFS) heavily (FUSE on macOS, possibly they’ve changed since then?)
    • VFS support on macOS is fairly flaky these days.
  • Uses Yandex Database (YDB) for the back-end database, with some kind of conversion tool from SVN
  • As part of the code review system, Arc commits are eventually converted to SVN commits, including some additional Arc metadata
  • Implicitly uses a working copy commit for some internal algorithms, which includes untracked files since they’re providing a VFS
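
For readers unfamiliar with the generation-number trick mentioned above, here is a minimal sketch of how it can speed up merge-base calculation. This is not Arc's implementation (these notes don't cover how Arc does it); the function names, the `parents` dictionary, and the toy commit graph are all made up for illustration. The idea is that a commit's generation is one more than the largest generation among its parents, so an ancestor always has a strictly smaller generation than its descendants. Walking from both commits in order of decreasing generation, the first commit reached from both sides is a merge base, and the walk can stop there instead of going all the way back to the root.

```python
import heapq

def generation_numbers(parents):
    """Map each commit to 1 + the max generation of its parents (roots get 1).

    `parents` is a hypothetical in-memory DAG: commit id -> list of parent ids.
    Uses an explicit stack so long histories don't hit the recursion limit.
    """
    gen = {}
    for start in parents:
        stack = [start]
        while stack:
            commit = stack[-1]
            if commit in gen:
                stack.pop()
                continue
            missing = [p for p in parents[commit] if p not in gen]
            if missing:
                stack.extend(missing)  # compute the parents first
            else:
                gen[commit] = 1 + max((gen[p] for p in parents[commit]), default=0)
                stack.pop()
    return gen

def merge_base(parents, gen, a, b):
    """Return a common ancestor of `a` and `b` with the highest generation."""
    FROM_A, FROM_B = 1, 2
    flags = {a: FROM_A}
    flags[b] = flags.get(b, 0) | FROM_B
    # Max-heap on generation: descendants are always visited before ancestors,
    # so a commit's flags are complete by the time it is popped.
    heap = [(-gen[c], c) for c in flags]
    heapq.heapify(heap)
    while heap:
        _, commit = heapq.heappop(heap)
        if flags[commit] == FROM_A | FROM_B:
            return commit  # reachable from both sides: stop walking here
        for p in parents[commit]:
            if p not in flags:
                flags[p] = 0
                heapq.heappush(heap, (-gen[p], p))
            flags[p] |= flags[commit]
    return None  # unrelated histories

# Toy history: A - B - C - E on one branch, and B - D - F on another.
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"], "E": ["C"], "F": ["D"]}
gen = generation_numbers(parents)
print(merge_base(parents, gen, "E", "F"))  # prints "B"
```

In this example the walk never visits commit A. On a history with millions of commits, that early stop is the whole point: without generation numbers (or something like them), the algorithm has no cheap way to know that it has seen every relevant descendant of a candidate commit.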

Overall, it doesn’t seem like Arc does a whole lot to advance the state of the art in monorepo management compared to what large tech companies like Google and Meta already have.
