Title: Multi-Stage Variance-Controlled Gradient Updates: Toward Robust Continual Learning

Conference: PRICAI 2025

Tags: Continual Learning, Gradient Masking, Large Language Models

Abstract: Large Language Models (LLMs) have demonstrated remarkable performance and strong generalization across diverse tasks; however, catastrophic forgetting remains a fundamental challenge in continual learning scenarios. MIGU, a label-free approach, alleviates forgetting by selectively updating parameters based on gradient magnitude, thereby improving adaptability. Despite its effectiveness, MIGU relies heavily on manually tuned mask-generation thresholds, which incur significant computational overhead and limit scalability. To address these limitations, this paper proposes MVGU, an improved method employing multi-stage variance-controlled gradient updates. At its core, MVGU optimizes pre-mask vector generation and threshold selection to reduce the dependence on empirical hyperparameter tuning inherent in MIGU, thereby improving training efficiency. Extensive continual learning experiments on T5-Large and LLaMA3-8B Instruct demonstrate that MVGU achieves comparable or superior performance to MIGU with fewer training iterations. The results indicate that MVGU is an effective continual learning strategy, capable of reducing training overhead, mitigating task interference during continual learning, and strengthening model adaptability in dynamic learning environments.
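As a rough illustration of the mechanism the abstract describes, the sketch below contrasts masking gradients against a manually tuned magnitude threshold (MIGU-style, as characterized in the abstract) with deriving the threshold from gradient statistics (MVGU-style). The function names, the mean-plus-k-standard-deviations rule, and the per-tensor masking granularity are illustrative assumptions, not the paper's actual algorithm.

```python
import torch

def mask_by_fixed_threshold(model: torch.nn.Module, threshold: float) -> None:
    """MIGU-style masking (assumed interpretation of the abstract):
    zero out gradient entries whose magnitude falls below a manually
    tuned threshold."""
    for p in model.parameters():
        if p.grad is not None:
            p.grad.mul_((p.grad.abs() >= threshold).to(p.grad.dtype))

def mask_by_variance_threshold(model: torch.nn.Module, k: float = 1.0) -> None:
    """MVGU-style sketch: derive the threshold from per-tensor gradient
    statistics (mean + k * std) rather than hand tuning it. The exact
    variance-control rule used by MVGU is an assumption here."""
    for p in model.parameters():
        if p.grad is None or p.grad.numel() < 2:
            continue  # skip unset or scalar gradients
        g = p.grad.abs()
        threshold = g.mean() + k * g.std()
        p.grad.mul_((g >= threshold).to(p.grad.dtype))
```

Either function would be called between `loss.backward()` and `optimizer.step()`, so that the optimizer only updates the coordinates left unmasked.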
