The ability of nuclear reactors to operate their power conversion cycles more flexibly will enhance their value to energy grids with variable pricing. Current nuclear control systems typically use classical controllers, often based on proportional-integral-derivative (PID) control. This paper presents a method of augmenting the existing PID control for difficult transient operations in nuclear power plants using a reinforcement learning–derived feedforward signal applied in real time. The agents, trained on a test thermal load-following problem, are designed to improve steam generator outlet temperature control across a range of fast load-following scenarios with ramp rates from 9%/min to 15%/min.
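The augmentation described above leaves the PID law untouched and adds the agent's signal to its output. A minimal sketch of that structure follows, assuming a discrete PID with an unfiltered derivative; the class name `PIDWithFeedforward`, the gains, and the sample time are hypothetical illustrations, not the paper's controller.

```python
from typing import Optional


class PIDWithFeedforward:
    """Discrete PID controller whose output is augmented by an external feedforward term.

    Hypothetical sketch: gains, sample time, and the plain (unfiltered) derivative
    are illustrative assumptions, not the controller used in the paper.
    """

    def __init__(self, kp: float, ki: float, kd: float, dt: float):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error: Optional[float] = None

    def step(self, setpoint: float, measurement: float, feedforward: float = 0.0) -> float:
        error = setpoint - measurement
        self.integral += error * self.dt
        # No derivative term on the first sample: there is no previous error yet.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error
        u_pid = self.kp * error + self.ki * self.integral + self.kd * derivative
        # The RL agent's signal is simply added to the unchanged PID output.
        return u_pid + feedforward


# Usage: a 2-degree error gives a PID output of 5.0, plus 0.5 from the agent.
pid = PIDWithFeedforward(kp=2.0, ki=0.5, kd=0.1, dt=1.0)
u = pid.step(setpoint=10.0, measurement=8.0, feedforward=0.5)
```

Because the feedforward enters additively, setting it to zero recovers the original PID loop exactly, which is what makes this architecture attractive for retrofitting existing plants.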

Several reinforcement learning algorithms were initially investigated for training the feedforward agents; deep Q-network (DQN) and proximal policy optimization (PPO) agents were found to be the most promising. The DQN controllers use discrete actions, which give them better disturbance rejection at steady state but an inconsistent response to initial temperature deviations. In contrast, PPO-trained agents, which take continuous actions except within a dead zone around zero, showed the best combination of high disturbance rejection at steady state and good tracking of the desired temperature. The computational speed of the PPO agent was also examined, with its average decision time found to be on the order of 1 ms.
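The difference between the two action spaces can be illustrated by how each agent's raw output might be turned into a feedforward value. The sketch below assumes a symmetric dead zone that forces small PPO actions to exactly zero and clips the rest, and a small fixed table of DQN action levels; the function names, the dead-zone width, the clipping limit, and the `DQN_LEVELS` values are all hypothetical.

```python
# Hypothetical discrete action set for a DQN agent (feedforward levels).
DQN_LEVELS = (-0.5, -0.1, 0.0, 0.1, 0.5)


def dqn_feedforward(action_index: int) -> float:
    """Map a DQN action index to one of a fixed set of feedforward levels."""
    return DQN_LEVELS[action_index]


def ppo_feedforward(raw_action: float, dead_zone: float = 0.05, limit: float = 1.0) -> float:
    """Map a continuous PPO action to a feedforward value (assumed scaling).

    Actions are first clipped to +/- limit. Any action whose magnitude falls
    inside the dead zone is forced to zero, so the agent does not inject
    small, noisy corrections near steady state.
    """
    clipped = max(-limit, min(limit, raw_action))
    return 0.0 if abs(clipped) < dead_zone else clipped
```

With this mapping the PPO agent can still command any value outside the dead band, while the DQN agent is restricted to the table's levels; this is one plausible reason a discrete agent may react unevenly to an initial deviation that falls between two levels.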

The fault tolerance of the controller under loss of the reinforcement learning agent's feedforward signal was also examined. The controller performed well under “no-signal” faults but was less effective at handling “stuck-at” faults, in which the feedforward signal remains at a fixed value. In both cases, however, the PID controller maintained stability and eventually returned the system to steady state. It is hoped that this work will allow the proposed control architecture to be examined on more difficult control problems, so that it may eventually be used to adapt existing nuclear plants for more aggressive load-following on future grids.
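The two fault modes can be emulated by a wrapper placed between the agent and the controller. The sketch below assumes that a "no-signal" fault replaces the feedforward with zero and that a "stuck-at" fault freezes it at the last healthy value; the class `FeedforwardFaultInjector`, the toy first-order plant, the PI gains, and the proportional stand-in for the agent are all hypothetical and are not the paper's model. It only illustrates the mechanism: integral action absorbs a constant offset, which is why a PID loop can return to steady state even when the feedforward is stuck.

```python
from typing import Optional


class FeedforwardFaultInjector:
    """Hypothetical wrapper that corrupts the RL feedforward signal from onset_step on.

    fault=None passes the agent's signal through unchanged,
    fault="no-signal" replaces it with zero, and
    fault="stuck-at" holds the last value seen before the fault began.
    """

    def __init__(self, fault: Optional[str] = None, onset_step: int = 0):
        if fault not in (None, "no-signal", "stuck-at"):
            raise ValueError(f"unknown fault type: {fault!r}")
        self.fault = fault
        self.onset_step = onset_step
        self.last_healthy = 0.0

    def __call__(self, step: int, agent_signal: float) -> float:
        if self.fault is None or step < self.onset_step:
            self.last_healthy = agent_signal
            return agent_signal
        if self.fault == "no-signal":
            return 0.0
        return self.last_healthy  # "stuck-at": frozen at the pre-fault value


def simulate(fault: Optional[str], steps: int = 3000, dt: float = 0.1) -> float:
    """Run a toy first-order plant under PI + feedforward; return the final |error|."""
    tau, kp, ki = 5.0, 1.0, 0.2  # assumed toy values, not from the paper
    setpoint, y, integral = 1.0, 0.0, 0.0
    injector = FeedforwardFaultInjector(fault, onset_step=100)
    for k in range(steps):
        error = setpoint - y
        integral += error * dt
        agent_signal = 0.5 * error  # stand-in for the RL agent's output
        u = kp * error + ki * integral + injector(k, agent_signal)
        y += dt * (-y + u) / tau  # explicit Euler step of tau * dy/dt = -y + u
    return abs(setpoint - y)
```

In this toy loop all three cases settle to the setpoint; the transient during the fault, not the final value, is where a stuck feedforward would be expected to do its damage.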