Authentication is one of those things that feels solved — until you inherit a codebase where it isn't. When I started the Leverage OJ rewrite, the auth system was three separate problems wearing a trench coat: a session setup that broke under PM2, a ContestUser concept that had diverged into its own parallel auth universe, and a password hashing scheme that was one config leak away from a full credential dump.
The submission pipeline is the critical path of an Online Judge. A student submits code, it goes into a queue, a worker picks it up, sends it to the judge, waits for results, writes them back. Simple in theory. The original Leverage implementation was a custom queue built on Redis Lists — and it had problems that only showed up when things went sideways.
Language Note: This post is bilingual. Each section appears in English first, followed by a Chinese summary (中文摘要). Jump to whichever version works for you.
Some projects accumulate debt quietly. Leverage OJ was not one of them — it accumulated it loudly, in the form of a ranking system that froze mid-competition, an auth system that broke under PM2 clustering, a leaderboard that scanned the entire submissions table on every request, and a password hashing scheme that was one config leak away from a full credential dump.
Every codebase has a story. Leverage — the Online Judge platform I've been maintaining — has one too, and it's not pretty. After years of incremental feature additions, quick fixes pushed at midnight, and the occasional "works on my machine" hack making it to production, the codebase had accumulated enough debt to fund a small startup.
When you add 50+ new endpoints to a production application, you don't just have a new application — you have a new attack surface. The Leverage OJ backend rewrite touched nearly every route in the system, introduced a new role hierarchy, and replaced the entire authentication layer. That's exactly the kind of change that creates permission bugs: the kind where access controls that worked in the old system either didn't get ported, or got ported incorrectly.
There's a setting in TypeORM that every developer uses in development and every developer who has run it in production regrets:
synchronize: true
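One common way to keep this flag out of production is to derive it from the environment rather than hard-coding it. The following is a minimal sketch, not the configuration Leverage actually uses. The `SchemaOptions` interface is a hypothetical local stand-in that mirrors three real TypeORM `DataSourceOptions` fields (`synchronize`, `migrationsRun`, `migrations`), so the snippet compiles without the library installed. The `schemaOptionsFor` helper and the migrations path are assumptions for illustration.

```typescript
// Hypothetical local subset of TypeORM's DataSourceOptions, declared here so
// the sketch is self-contained. In a real app these fields would be spread
// into the options passed to `new DataSource(...)`.
interface SchemaOptions {
  synchronize: boolean;   // auto-alter tables to match entities on startup
  migrationsRun: boolean; // run pending migrations on startup
  migrations: string[];   // glob(s) pointing at compiled migration files
}

// Assumption: the app distinguishes environments with a NODE_ENV-style string.
// Anything other than an explicit "development" is treated as unsafe for
// schema sync, so a missing or misspelled value fails closed.
function schemaOptionsFor(nodeEnv: string | undefined): SchemaOptions {
  const isDev = nodeEnv === "development";
  return {
    synchronize: isDev,     // only ever true on a developer machine
    migrationsRun: !isDev,  // everywhere else, schema changes go through migrations
    migrations: ["dist/migrations/*.js"], // hypothetical path
  };
}

const prod = schemaOptionsFor("production");
console.log(prod.synchronize, prod.migrationsRun);
```

The design choice worth noting is the default: because the check is an allow-list on `"development"`, a deployment that forgets to set the environment variable still gets `synchronize: false`.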
