We describe a new method for parallelizing the source iterations in a Monte Carlo criticality calculation. Traditional approaches maintain a single global fission bank that must be synchronized across processors. In our method, each processor instead keeps track of a local fission bank, and reproducibility is still preserved. As a result, only a limited set of fission bank sites must be sent between processors, which drastically reduces the total amount of data sent through the network. The algorithm was implemented in a simple Monte Carlo code, where it was shown to scale to hundreds of processors and to outperform traditional algorithms by at least two orders of magnitude in wall-clock time.
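
To make the communication argument concrete, the following is a minimal sketch of how a local-bank redistribution could be planned; it is an illustration under stated assumptions, not the implementation evaluated in this work. It assumes that fission sites are globally ordered by processor rank, that the combined bank already contains exactly the number of source sites wanted for the next iteration, and that sites are to be spread as evenly as possible. The function name `plan_transfers` and the example bank sizes are hypothetical.

```python
from itertools import accumulate


def plan_transfers(bank_sizes):
    """Return (source_rank, dest_rank, n_sites) moves that rebalance local banks.

    Assumption: sites are ordered globally by rank, so rank i currently holds
    the global index range [starts[i], starts[i + 1]).
    """
    n_ranks = len(bank_sizes)
    total = sum(bank_sizes)

    # Even split of the global bank; the first `extra` ranks take one more site.
    base, extra = divmod(total, n_ranks)
    targets = [base + (1 if rank < extra else 0) for rank in range(n_ranks)]

    # Prefix sums give the current and the desired global index boundaries.
    starts = [0] + list(accumulate(bank_sizes))
    bounds = [0] + list(accumulate(targets))

    transfers = []
    for src in range(n_ranks):
        for dst in range(n_ranks):
            # Sites that src holds now but that belong to dst after rebalancing.
            overlap = min(starts[src + 1], bounds[dst + 1]) - max(starts[src], bounds[dst])
            if overlap > 0 and src != dst:
                transfers.append((src, dst, overlap))
    return transfers


if __name__ == "__main__":
    sizes = [260, 240, 255, 245]  # hypothetical local bank sizes on 4 ranks
    moves = plan_transfers(sizes)
    moved = sum(count for _, _, count in moves)
    print(moves, "sites moved:", moved, "of", sum(sizes))
```

In this hypothetical case only 15 of the 1000 sites change processors, and each move is between adjacent ranks, whereas synchronizing a global bank would involve communicating all 1000 sites. Because the global ordering of sites is kept, the resulting source distribution does not depend on how the bank happened to be split, which is the property needed for reproducibility.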