Nuclear Science and Engineering

Experience with vectorizing production-level Monte Carlo codes such as KENO-IV, MCNP, VIM, and MORSE has shown that it is difficult to attain high speedup ratios on vector processors because of five factors: indirect addressing, nests of conditional branches, short vector lengths, cache misses, and operations included to provide robustness and generality. Previous work has shown that the first, second, and third difficulties can be resolved by using special computer hardware for vector processing of Monte Carlo codes. Here, the fourth and fifth difficulties are discussed in detail using results for a vectorized version of the MORSE code. For the fourth difficulty, it is shown that the cache miss-hit ratio affects the execution time of vectorized Monte Carlo codes and that this ratio depends strongly on the number of particles tracked simultaneously. For the fifth difficulty, it is shown that remarkable speedup ratios are obtained by removing operations that are not essential to the specific problem being solved. These results indicate that if a production-level Monte Carlo code system were able to selectively construct source code tailored to the input data, the resulting code could achieve much higher performance.

Difficulties in Vector-Parallel Processing of Monte Carlo Codes