Cite this article: SUN Zhenglong, CHEN Weihan, GENG Xindi, et al. Fast valving emergency control strategy for power system transient stability based on deep reinforcement learning[J]. Power System Protection and Control, 2025, 53(19): 175-187.
DOI:10.19783/j.cnki.pspc.241593 |
Received: 2024-11-29; Revised: 2025-06-05
Funding: National Natural Science Foundation of China (52277084); Jilin Province International Science and Technology Cooperation Project (20230402074GH)
|
Fast valving emergency control strategy for power system transient stability based on deep reinforcement learning |
SUN Zhenglong1,CHEN Weihan1,GENG Xindi2,WANG Sixuan1,YANG Hao1,PAN Chao1,CAI Guowei1 |
(1. Key Laboratory of Modern Power System Simulation and Control & Renewable Energy Technology, Ministry of Education (Northeast Electric Power University), Jilin 132012, China; 2. Hengshui Power Supply Branch, State Grid Hebei Electric Power Co., Ltd., Hengshui 053000, China)
Abstract: |
Fast valving is one of the classic control methods for improving transient stability in power systems. However, its control variables are high-dimensional and discrete, and improper parameter tuning may trigger subsequent power-angle swing instability. The complexity of strategy development makes fast valving difficult to apply online with real-time decision-making. To address this challenge, a fast valving control decision method based on deep reinforcement learning is proposed. First, a deep reinforcement learning-based emergency fast valving decision-making framework is constructed. Then, the fast valving control problem is formulated as a Markov decision process (MDP). A reward function is designed to balance optimal stability control performance against minimized control cost, and the proximal policy optimization (PPO) algorithm is used to solve the MDP, yielding a rational configuration of the fast valving strategy. Finally, the effectiveness of the proposed method is verified on the improved SG-77 system developed by CEPRI. Simulation results show that the proposed method ensures both the effectiveness and timeliness of the fast valving strategy, makes correct decisions under scenarios mismatched with pre-planned contingencies, and improves the transient stability and dynamic response capability of the power system.
Key words: fast valving decision-making; deep reinforcement learning; transient stability; proximal policy optimization algorithm
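The abstract describes a reward function that balances stability control performance against control cost. As a minimal sketch of that design idea only (the paper's actual state variables, thresholds, and weights are not given here, so every name and number below is an illustrative assumption):

```python
# Hypothetical sketch of an MDP reward balancing transient stability
# against fast-valving control cost. All quantities are assumptions
# for illustration, not the authors' actual formulation.

def fast_valving_reward(max_angle_deviation_deg: float,
                        n_valves_actuated: int,
                        stable: bool,
                        instability_penalty: float = -100.0,
                        cost_weight: float = 0.5) -> float:
    """Reward = stability term minus weighted control cost."""
    if not stable:
        # Heavy penalty if the system loses synchronism.
        return instability_penalty
    # Reward smaller rotor-angle excursions; 180 deg is taken as an
    # illustrative stability limit.
    stability_term = (180.0 - max_angle_deviation_deg) / 180.0
    # Each fast-valving action incurs a fixed control cost.
    control_cost = cost_weight * n_valves_actuated
    return stability_term - control_cost

# A stable trajectory with a small swing and one valve action scores
# higher than an unstable trajectory with several actions.
r_good = fast_valving_reward(60.0, 1, True)
r_bad = fast_valving_reward(200.0, 4, False)
```

In a PPO training loop, a reward of this shape would be returned at each environment step after time-domain simulation of the post-fault trajectory, letting the agent trade off stability margin against actuation cost.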