CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation
ByteDance Seed + Tsinghua AIR (SIA-Lab), 2026
cuda-agent.github.io
Writing fast GPU kernels is genuinely hard. You need to understand memory hierarchy, warp scheduling, bank conflicts, tensor core layouts, and about fifty other microarchitectural details that change between GPU generations. Most engineers — including most ML engineers — don't have this knowledge. They use libraries (cuBLAS, cuDNN, FlashAttention) and hope for the best.
