School of Mathematical Sciences Academic Seminar
A Memory Efficient Randomized Subspace Optimization Method for Training Large Language Models
Kun Yuan (袁坤)
(Peking University)
Time: Tuesday, November 11, 2025, 10:30-11:30 AM
Venue: Room E806, Shahe Campus
Abstract: The memory challenges associated with training Large Language Models (LLMs) have become a critical concern, particularly when using the Adam optimizer. To address this issue, numerous memory-efficient techniques have been proposed, with GaLore standing out as a notable example designed to reduce the memory footprint of optimizer states. However, these approaches do not alleviate the memory burden imposed by activations, rendering them unsuitable for scenarios involving long context sequences or large mini-batches. Moreover, their convergence properties are still not well understood in the literature. In this work, we introduce a Randomized Subspace Optimization framework for pre-training and fine-tuning LLMs. Our approach decomposes the high-dimensional training problem into a series of lower-dimensional subproblems. At each iteration, a random subspace is selected, and the parameters within that subspace are optimized. This structured reduction in dimensionality allows our method to simultaneously reduce memory usage for both activations and optimizer states. We establish comprehensive convergence guarantees and derive rates for various scenarios, accommodating different optimization strategies to solve the subproblems. Extensive experiments validate the superior memory and communication efficiency of our method, achieving performance comparable to GaLore and Adam.
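To make the subspace idea concrete, the following is a minimal, illustrative sketch (not the speaker's implementation): at each outer iteration a random low-dimensional subspace is drawn, the subproblem restricted to that subspace is approximately solved with a few plain gradient steps, and the result is mapped back to the full parameter space. The toy least-squares loss, the QR-based random projection, the step sizes, and the dimensions are all assumptions chosen for readability; the talk's method, its treatment of activations, and its Adam/GaLore comparisons may differ.

```python
# Illustrative sketch of randomized subspace optimization on a toy least-squares problem.
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 200, 100, 8                 # data size, full parameter dimension, subspace rank
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

def loss(w):
    return 0.5 * np.mean((A @ w - b) ** 2)

def grad(w):
    return A.T @ (A @ w - b) / m

w = np.zeros(n)
for outer in range(50):
    # Draw an orthonormal basis P (n x r) of a random r-dimensional subspace.
    P, _ = np.linalg.qr(rng.standard_normal((n, r)))
    z = np.zeros(r)                   # low-dimensional variable: optimizer state is only r-dimensional
    for inner in range(20):           # approximately solve the subproblem  min_z loss(w + P z)
        z -= 0.1 * (P.T @ grad(w + P @ z))
    w = w + P @ z                     # map the subspace solution back to the full space

print(f"final loss: {loss(w):.4f}")
```

The memory saving in this sketch comes from the inner loop operating only on the r-dimensional variable z, so any optimizer state (e.g., momentum or Adam moments, omitted here for brevity) would be stored in dimension r rather than n.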
Speaker Bio: Kun Yuan is an Assistant Professor, Researcher, and doctoral advisor at the Academy for Advanced Interdisciplinary Studies (前沿交叉研究院), Peking University, and a Boya Young Scholar of Peking University. He received his Ph.D. from the University of California, Los Angeles in 2019, and from 2019 to 2022 he was a Senior Algorithm Expert at the Alibaba DAMO Academy research center in Seattle, USA. His research focuses on distributed optimization and its applications to large models. He received the IEEE Signal Processing Society Young Author Best Paper Award in 2018. His research results have been integrated into Alibaba's MindOpt (敏迭) optimization solver and NVIDIA's official DeepStream library.
Host: Jiaxin Xie (谢家新)