Matthew Denman: On Probabilistic Risk Assessment

September 24, 2021, 3:32PM · Nuclear News

Matthew Denman

Probabilistic risk assessment is a systematic methodology for evaluating risks associated with a complex engineered technology such as nuclear energy. PRA risk is defined in terms of possible detrimental outcomes of an activity or action, and as such, risk is characterized by three quantities: what can go wrong, the likelihood of the problem, and the resulting consequences of the problem.

Matthew Denman is principal engineer for reliability engineering at Kairos Power and the chair of the American Nuclear Society and American Society of Mechanical Engineers Joint Committee on Nuclear Risk Management’s Subcommittee of Standards Development. As a college student at the University of Florida, Denman took a course on PRA but didn’t enjoy it, because he did not see its connection to the nuclear power industry. Later, during his Ph.D. study at the Massachusetts Institute of Technology, his advisor was Neil Todreas, a well-known thermal hydraulics expert. Todreas was working on a project with George Apostolakis, who would leave MIT to become a commissioner of the Nuclear Regulatory Commission. The project, “Risk Informing the Design of the Sodium-Cooled Fast Reactor,” was a multi-university effort funded through a Department of Energy Nuclear Energy Research Initiative (NERI) grant. Todreas and Apostolakis were joined in this project by a who’s who of nuclear academia, including Andy Kadak (MIT, ANS past president [1999–2000]), Mike Driscoll (MIT), Mike Golay (MIT), Mike Lineberry (Idaho State University, former ANS treasurer), Rich Denning (Ohio State University), and Tunc Aldemir (Ohio State University).

Denman had started work on his Ph.D. by concentrating on more traditional reactor design concepts, but he was also briefing the “Risk Informing” project team. Because the idea behind the project was to risk inform the design of a sodium reactor, Denman was exposed to risk concepts. He started taking classes on how to use probabilities and risk insights to solve a range of problems. The light went on for Denman, and he realized that PRA wasn’t something that was an abstract assessment of a system but was part of an integrated decision-making process. Working on the project and taking those classes changed his view of what could be accomplished with PRA. By the time he finished his Ph.D. work, he knew that PRA was what he wanted to work on.

Denman talked about PRA with Rick Michal, NN editor-in-chief.

Is PRA another name for a safety assessment?

The international community would call it probabilistic safety assessment. The International Atomic Energy Agency’s standards are known as PSA instead of PRA. Safety and risk are reciprocal quantities. With safety, it’s ensuring that things don’t break. With risk, it’s asking what happens when they do break. There is a “risk triplet,” as it is called, within PRA: What can go wrong? How likely is it? and What are the consequences? With those three in tandem we can assess how concerned or not someone should be about anything in life, whether it is operating a nuclear power plant or driving a car down the road. PRA is predicting the consequences that can happen and how likely they are to happen so that informed decisions can be made.
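The risk triplet lends itself to a simple quantitative sketch. The scenario names and numbers below are hypothetical illustrations, not from the interview; frequency-weighted consequence gives a crude aggregate risk figure:

```python
# Sketch of the risk triplet: what can go wrong, how likely, and the consequences.
# All scenarios and numbers are hypothetical.
scenarios = [
    # (what can go wrong, frequency per year, consequence in arbitrary units)
    ("small pipe break", 1e-3, 10.0),
    ("large pipe break", 1e-6, 1000.0),
    ("stuck-open valve", 5e-3, 5.0),
]

def expected_risk(triplets):
    """Frequency-weighted consequence, summed over scenarios."""
    return sum(freq * consequence for _, freq, consequence in triplets)

for name, freq, cons in scenarios:
    print(f"{name}: frequency={freq:g}/yr, consequence={cons:g}, risk={freq * cons:g}")
print("total expected consequence per year:", expected_risk(scenarios))
```

Note how the low-frequency, high-consequence scenario does not automatically dominate: the frequency-weighted view is exactly what lets an informed decision be made.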


Does a deterministic safety analysis figure into PRA?

Yes. Imagine an accident in which radionuclides are mobilized and leaking out of containment. It must be shown that the doses at the site boundary and the low population zone boundary are below regulatory limits. That understanding is very deterministic. There is no frequency attached to that event; it is simply postulated.

By looking at the frequencies and the consequences of events, you can begin to say that perhaps you should not be as concerned about, for example, a large catastrophic rupture in the reactor coolant boundary as you should be about a valve that could get stuck open. In fact, that’s one of the first big insights that came out of WASH-1400, the first integrated risk study, in the 1970s. It revealed that the nuclear safety community was focused on large-break loss-of-coolant accidents (LOCAs), but the risk in plant designs was highest for the smaller-break LOCAs. There are more small pipes in a nuclear power plant than large pipes, and large pipes are not likely to catastrophically fail. The small pipes are more likely to fail, such as from fittings coming loose or small valves getting stuck open.

There was a lot of skepticism regarding WASH-1400, but then the Three Mile Island accident happened. TMI was a core damage event caused not by a large-break LOCA but by a small stuck-open-valve LOCA. TMI reinforced what WASH-1400 was saying all along, which was, “If all you’re doing is looking at the worst events possible, you might be missing part of the story.”

What was needed was something that would march systematically through the system and identify what can go wrong, so that appropriate controls could be applied and time wasn’t spent over-defending against events that were highly unlikely to happen. Resources needed to be devoted to what actually might happen, and that’s where PRA came in.


How is probability assigned to a failure?

That’s a good question, and there is not a single answer. For most parts and components of a nuclear power plant, we have years of operating experience. For many parts, statistics on failures are consulted. Each plant has a database of events at operating light water reactors and at that specific plant. As an example, a type of valve might have cycled a million times and experienced only two “valve stuck open” events. We now are able to say how many failures were experienced over how many trials, and we now have a failure probability for that event.
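The count-based estimate described here is just failures over trials; a minimal sketch, with hypothetical numbers:

```python
def point_estimate(failures, demands):
    """Maximum-likelihood failure probability per demand: failures / demands."""
    if demands <= 0:
        raise ValueError("need at least one demand")
    return failures / demands

# Two "valve stuck open" events in a million valve cycles:
p = point_estimate(2, 1_000_000)
print(p)  # 2e-06 per demand
```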

There are some components for which we don’t have failure data, but we have seen significant demands on the component. If you think about the exact same process—for example, there were one million cycles of a valve but there has never been a failure—that is still evidence. There might not have been a failure, but you know that a failure probability exists. How do you estimate it from a million trials without a single failure? That is where Bayesian statistics come in. There is a big area of disagreement in the statistical community between the Bayesians and the frequentists on how to interpret probabilities and information. A Bayesian would say that you can have a prior understanding of what this failure probability would be and then update it with new evidence, combining the subjective and the objective on a quantitative basis. One way of doing that, if there were no failures but a million trials, is to assume half a failure in one million trials. This concept does not exist in frequentist statistics, where the objective is everything. But for rare events, subjective information is often needed to calculate a failure probability.
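The “half a failure” device corresponds to the Jeffreys prior in a conjugate Beta-binomial model. The interview does not prescribe a particular prior, so this is one illustrative reading of the idea:

```python
def jeffreys_posterior_mean(failures, trials):
    """Posterior mean of a per-demand failure probability under a
    Jeffreys Beta(0.5, 0.5) prior with a binomial likelihood:
    (failures + 0.5) / (trials + 1)."""
    return (failures + 0.5) / (trials + 1)

# Zero failures observed in a million demands still yields a nonzero
# failure probability, roughly "half a failure per million trials":
p = jeffreys_posterior_mean(0, 1_000_000)
print(p)  # about 5e-7
```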

You could go even further. You could say, “I have a million tests of this type of valve but I have a billion tests of a much wider set of valves, some of which are applicable, and I experienced a couple of failures there.” So, you could craft an understanding of general valve failure probabilities from that other source of objective data, which would be updated as new data came in. Bayesian statistical combination of subjective and objective information is a powerful enabler of modern risk assessments.
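Building the prior from a wider population of valves and then updating it with component-specific evidence can be sketched with the same conjugate machinery. The counts and the downweighting factor below are arbitrary illustrations, not values from any real data set:

```python
def beta_update(alpha, beta, failures, successes):
    """Conjugate Beta update of a failure probability with binomial evidence."""
    return alpha + failures, beta + successes

# Generic prior: 2 failures in a billion demands across a wide set of valves,
# downweighted (here by 1e-3, an arbitrary choice) because only some of that
# experience applies to this specific valve type.
weight = 1e-3
alpha0, beta0 = 2 * weight, (1_000_000_000 - 2) * weight

# Component-specific evidence: no failures in a million demands.
alpha1, beta1 = beta_update(alpha0, beta0, failures=0, successes=1_000_000)

posterior_mean = alpha1 / (alpha1 + beta1)
print(posterior_mean)
```

As new plant-specific data come in, the same update is simply applied again, which is the "updated as new data came in" loop described above.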


Is there a chance that there is a “dud” part that fails much earlier than a normal part? What would that do to the PRA calculations?

An example of a bathtub curve.

Yes, you can have a dud. There is a well-known “bathtub curve” where the first part of the curve describes an early mortality rate that is typically higher, because parts coming off the assembly line with incorrect tolerances are more likely to fail early. Then, continuing along the bathtub curve, there is a low, flat failure rate for most of a part’s life. At the other end of the bathtub curve, as the pieces of equipment start to wear out, the failure rate starts to go back up—these parts have exceeded their natural lifespan.
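One common way to model the three regions of the bathtub curve (not named in the interview, so this is an illustrative choice) is a Weibull hazard rate: shape β < 1 gives a decreasing infant-mortality rate, β = 1 a constant rate, and β > 1 a rising wear-out rate. The parameters below are hypothetical:

```python
def weibull_hazard(t, shape, scale):
    """Weibull hazard rate h(t) = (shape/scale) * (t/scale)**(shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)

# Hypothetical shapes for the three bathtub regions, on a common scale of 100:
for label, shape in [("infant mortality", 0.5), ("useful life", 1.0), ("wear-out", 3.0)]:
    early = weibull_hazard(1.0, shape, 100.0)
    late = weibull_hazard(50.0, shape, 100.0)
    trend = "decreasing" if late < early else ("flat" if late == early else "increasing")
    print(f"{label}: h(1)={early:.5f}, h(50)={late:.5f} ({trend})")
```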

If a piece of equipment fails in the early mortality period, two things can happen, and it really depends on how much data you have. If you have enough data that the early failure doesn’t influence your bottom-line statistics, or if it does but the component isn’t important enough that a high failure estimate will hurt your bottom-line results, you might just lump this data in with everything else and go along your merry way. If you can’t live with that piece of data, you might segment your data set to treat your duds separately. Unfortunately, every time you break up your data set, you have less data to work with in each bin. Say you split the data set into a bin of early equipment failures and a bin of everything else and then characterize each bin separately. You might model early failures as a separate basic event that goes into your PRA, or you might document a qualitative argument that says, “Note: this data set doesn’t apply to my model because of ‘X.’” For example, we do shake-down testing to catch early failures before putting this part into the plant, or we have increased monitoring on it, so there would be a more immediate response to failures in the early phase. The arguments can vary from component to component depending on how important the component is. In general, if you have data you can’t live with, you’re going to do something to limit its effect on the rest of the data set while ensuring that the data on your duds are addressed. That’s always a balancing act.


What about assessing new technology?

The cutting edge of risk assessment is being pushed on advanced designs where neither the numerator nor the denominator of the failure probability is characterized well. In a situation concerning new technology, there would not be one million trials to use as data. This is where a panel of experts would come in to document all the different things that could fail and to qualitatively assess the failure probabilities, with some simulation data possibly mixed in. Here, your failure probabilities would be dominated by the subjective understanding of the system.
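When data are absent, expert-elicited probabilities are often pooled into a single estimate. A geometric mean is one common aggregation rule; the interview does not prescribe a method, so the rule and the three expert estimates below are purely illustrative:

```python
import math

def geometric_mean(estimates):
    """Pool expert failure-probability estimates by geometric mean, a common
    choice for quantities that span several orders of magnitude."""
    return math.exp(sum(math.log(p) for p in estimates) / len(estimates))

# Three hypothetical expert estimates of a novel component's failure probability:
pooled = geometric_mean([1e-3, 1e-4, 1e-5])
print(pooled)  # about 1e-4
```

The geometric mean behaves like averaging in log space, so one very optimistic expert cannot drag the pooled value down the way an arithmetic mean would be dragged up by a pessimist.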


Do plant personnel use PRA to be proactive toward systems?

Yes, it’s kind of a watershed change that’s happened over the past 15 or so years. Some plants are more proactive than others, and some utilities embrace PRA more than others. For an operating plant, PRA can be used to inform the plant’s maintenance intervals. In fact, for a number of plants, every time the maintenance staff is scheduled to do something on plant equipment, there is a pre-job briefing where they do a walk-through of the procedures. They might say something like, “This is our core damage frequency, and these are the components that are the most risk-significant at this point in time in your area of interest. So, don’t do anything that could challenge those components while you’re doing your routine maintenance activities on other parts of the system.” It is impressive how PRA has intertwined itself with the day-to-day operations of nuclear power plants.
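The “risk-significant components” called out in a pre-job brief are typically ranked with importance measures. Risk achievement worth (RAW), the factor by which core damage frequency (CDF) would rise if a component were assumed failed, is one such measure; the numbers below are hypothetical:

```python
def risk_achievement_worth(cdf_with_component_failed, baseline_cdf):
    """RAW: ratio of the core damage frequency with the component assumed
    failed to the baseline core damage frequency."""
    return cdf_with_component_failed / baseline_cdf

baseline = 1e-5      # baseline CDF, per year (hypothetical)
pump_failed = 4e-5   # CDF with a given pump assumed unavailable (hypothetical)
print(risk_achievement_worth(pump_failed, baseline))  # 4.0
```

A component whose assumed failure multiplies CDF several-fold is exactly the kind of equipment a maintenance crew would be warned not to challenge.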


How does human error play into PRA?

Operators are an integral part of reactor safety. Human error is an integral part of everything. For example, if I’m driving on the highway, the car is much more likely to crash because I was being careless than because the brakes locked up unexpectedly and someone slammed into the rear of my vehicle. Both are possible, but it’s much more likely that I’m the cause of the car wreck. It’s the same thing at nuclear power plants. Staff at the plants are very well trained, but events do happen. The PRA standards are process standards. They will tell you what to do but not how to do it. The PRA standards have a human reliability section, which has a number of steps to look through—maintenance procedures, historical records, and failures—to identify accident precursors that could exist.

Plant technicians have a certain amount of time to perform an action. Sometimes that action might be to don an anti-contamination suit, travel across the plant, go down three levels, and turn a valve to open a flow path that gets water injection working again. The time the technician takes to do all those things is variable. The technician will train and drill on it, and the plant will have data on how long the job should take. But there is always a probability that the technician, for any number of reasons, won’t be able to complete the task in the appropriate time. That gets into a human error probability for the technician enacting the procedure as written.
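A simplified time-reliability sketch of this idea: model the completion time as lognormal and take the human error probability as the chance the task exceeds the available window. The distribution choice and the numbers are illustrative, not taken from any HRA standard:

```python
import math

def lognormal_exceedance(t_window, median, sigma):
    """P(T > t_window) for a lognormal completion time T with the given
    median and log-space standard deviation sigma."""
    z = (math.log(t_window) - math.log(median)) / sigma
    # Standard normal survival function via the complementary error function:
    return 0.5 * math.erfc(z / math.sqrt(2))

# Technician has 30 minutes; drill data show a 12-minute median with sigma = 0.5:
hep = lognormal_exceedance(30.0, 12.0, 0.5)
print(f"probability of not finishing in time: {hep:.4f}")
```

As expected, widening the time window drives this human error probability down, while a median drill time close to the window drives it toward one half.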


Is there a PRA correlation between the nuclear and airline industries because they are higher-risk industries?

The airline industry uses PRAs. NASA used a lot of PRAs back in the day. In fact, in early risk assessment, as the field was starting up, it was NASA and the nuclear industry bouncing ideas and thoughts off each other to advance the technology. A full risk assessment—with fault trees that are the logical constructs of how a system does or doesn’t perform its function, and event trees that connect all the fault trees in a linear progression as a temporal marching of how an accident progresses—is reserved for highly regulated, high-consequence, low-probability systems. A lot of industries employ fault trees as a visual way of representing how failures can propagate up to fail a system.
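For independent basic events, a fault tree's gates reduce to simple probability algebra: an AND gate multiplies probabilities, and an OR gate combines them as 1 − ∏(1 − p). A minimal sketch with hypothetical events and numbers:

```python
from functools import reduce

def and_gate(probs):
    """All inputs must fail (independent events): product of probabilities."""
    return reduce(lambda acc, p: acc * p, probs, 1.0)

def or_gate(probs):
    """Any input failing fails the gate: 1 - prod(1 - p)."""
    return 1.0 - reduce(lambda acc, p: acc * (1.0 - p), probs, 1.0)

# Top event: loss of injection = valve stuck closed OR (pump A AND pump B fail).
p_valve, p_pump_a, p_pump_b = 1e-3, 1e-2, 1e-2
p_top = or_gate([p_valve, and_gate([p_pump_a, p_pump_b])])
print(p_top)  # roughly 1.1e-3
```

The redundant pump pair contributes only 1e-4 to the top event, so the single valve dominates, which is the kind of insight a fault tree is built to surface.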


How does nuclear PRA take into consideration the natural environment?

That happens through a lot of requirements in the PRA standards. The non–light water reactor PRA standard has nine parts, which deal variously with internal events and then external events such as fires, floods, and high winds. The standard considers everything from plumes of sea scallops clogging water intake pipes to forest fires to meteor strikes to airplane impacts. A nuclear plant has to do risk assessments to say what these events would mean for core damage or things of that nature.


Was PRA done for the Fukushima plant?

PRA is applied differently around the world. The United States was the early proponent of PRA in producing risk analysis for our facilities and expanding it to external events. PRAs were done at Fukushima and other sites, but in general, their design was based on deterministic and prescriptive rules for defending against a tsunami-type event. Assessments did identify external flooding vulnerabilities in 2008, but the upgrades to the plant were not implemented in time to prevent the accident. The Japanese nuclear industry is currently doing a lot of work to make their PRAs more robust and expansive to cover different types of external hazards. My understanding is that at the time, the Japanese regulatory structure was focused largely on complying with deterministic requirements.

It is important to recognize that, in a risk-informed framework, PRAs are only a part of the decision-making process. Economics, biases, deterministic safety analyses, performance monitoring, defense in depth—all of that information is combined to make an ultimate decision. For existing plants, for example, simple economic considerations might result in shutdown of a plant in lieu of backfitting new safety requirements.


Is the science of PRA a living science?

Yes, it’s a living science. The PRA standards are constantly being revised. The light water reactor PRA standard originally existed as separate internal-events and external-events standards within ANS and ASME. The societies merged these efforts to form the Joint Committee on Nuclear Risk Management, which in 2007 produced the first light water reactor standard unifying internal and external events. That standard was revised in 2011 and again in 2013, and it is currently being revised again and is going through the editing process. The current revision provides more guidance on external hazards.

ANS’s Subcommittee of Standards Development is simultaneously developing standards on advanced light water reactors like the AP1000 and NuScale’s small modular reactor. Given that these new designs rely more on inherent safety, passive safety, natural circulation, large pools of water, and things of that nature, the PRA standards are being revised to be more applicable to those designs.


Are there PRAs for various contingencies or possible accidents?

Yes, there are three levels of PRAs plus guidance for various other operating modes and externalities. Level 1 considers failures out to core damage and large releases for all internal and external hazards. The next edition of the Level 1 standard is currently going through copyediting and should be released later this year. Level 2 says, given the fact that there is core damage, how much radiation can get out of the plant? Level 3 says, given what gets released from the plant, what does that mean to an off-site individual? There also is a multiunit PRA standard—which Fukushima highlighted the need for—that looks at having multiple units on a site: When an accident occurs, how do the units interact with each other? There is a low-power and shutdown standard that addresses the unique configurations of the plants when not at full power. There is an advanced light water reactor PRA standard, which adapts the Level 1 standard to address the AP1000s and NuScale. And then the non–light water reactor PRA standard, published in early 2021, is the first integrated Level 1 through 3 PRA—for all internal and external hazards, all modes (at power and low-power or shutdown), and for multiunit sites. There is a lot of ongoing work on various aspects of PRA and what it means to do PRA.

