Discrete ordinates transport packages from Los Alamos National Laboratory are required to perform large, computationally intensive, time-dependent calculations on massively parallel architectures, where even a single such calculation may need many months to complete. While Koch-Baker-Alcouffe (KBA) methods scale well to very large numbers of compute nodes, practical constraints limit the number of nodes we can actually apply to any given calculation. Rather than relying on additional nodes, this paper describes a modified KBA algorithm that realizes the reductions in solution time offered by both current and future architectural changes within a compute node.