This repo compares the performance of three implementations of a document clustering algorithm: a serial implementation, an MPI-based implementation, and a Spark-based implementation.