A strategy for implementing source iteration on massively parallel computers for use in solving multigroup discrete ordinates neutron transport equations on three-dimensional Cartesian grids is proposed and analyzed. Based on an analysis of the memory requirement and floating-point complexity of the formal matrix-vector multiplication effected by a single source iteration, a data decomposition and communication strategy is presented that is designed to achieve good scalability with respect to all phase-space variables, i.e., neutron position, energy, and direction. A performance model is developed to analyze the scalability properties of the algorithm and to provide computational and heuristic strategies for determining a data decomposition that minimizes wall clock execution time. Numerical results are presented to demonstrate the performance of a specific implementation of this approach on a 1024-node nCUBE/2.