Different models
Three Dirichlet Process Gaussian mixture models were developed, each with a different covariance structure. The models after 500 iterations can be seen below. The models differ in how tightly they constrain the Gaussian distributions that can be used as components. Equal volume spherical means that all distributions must be spherical and of equal size; unequal volume spherical means that all distributions must be spherical but can differ in size; and elliptical means that each distribution can be non-spherical, with its own shape.
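The three covariance structures can be made concrete with a small sketch. The function name `make_covariances`, the structure labels, and the variance range are hypothetical and chosen only for illustration; they are not taken from the project code. The elliptical case is assumed here to allow a full covariance matrix, although an axis-aligned (diagonal) version would also fit the description above.

```python
import numpy as np


def make_covariances(structure, n_components, n_dims, rng):
    """Return covariance matrices with shape (n_components, n_dims, n_dims).

    Illustrative only: the variances are drawn arbitrarily, not inferred.
    """
    eye = np.eye(n_dims)
    if structure == "equal_volume_spherical":
        # One variance shared by every component: same-sized spheres.
        shared_var = rng.uniform(0.5, 2.0)
        return np.stack([shared_var * eye for _ in range(n_components)])
    if structure == "unequal_volume_spherical":
        # One variance per component: spheres of different sizes.
        variances = rng.uniform(0.5, 2.0, size=n_components)
        return np.stack([v * eye for v in variances])
    if structure == "elliptical":
        # Assumption: a full symmetric positive-definite matrix per component.
        covs = []
        for _ in range(n_components):
            a = rng.normal(size=(n_dims, n_dims))
            covs.append(a @ a.T + 0.1 * eye)
        return np.stack(covs)
    raise ValueError(f"unknown covariance structure: {structure}")


rng = np.random.default_rng(0)
for name in ("equal_volume_spherical", "unequal_volume_spherical", "elliptical"):
    covs = make_covariances(name, n_components=3, n_dims=2, rng=rng)
    print(name, covs.shape)
```

The spherical cases need only one number per component (or one number in total), which is what makes them more constrained than the elliptical case.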
Testing parameters
A large portion of the project was spent testing the different parameters in the models to see how they affected performance on varying input data. Two major parameters were those of the Beta distribution prior placed over the variances along each dimension. How this affected the outcome of each model on synthesised data can be seen below.
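As a rough illustration of why the two Beta parameters matter, the sketch below computes the prior mean and spread for a few settings and draws per-dimension variances from the prior. The names `a`, `b`, `beta_prior_summary`, and `sample_variances`, and the example values, are assumptions made for this sketch; the values actually tested in the project are not given here. Note that a Beta distribution only has support on values between 0 and 1.

```python
import numpy as np


def beta_prior_summary(a, b):
    """Mean and variance of a Beta(a, b) distribution (standard formulas)."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var


def sample_variances(a, b, n_dims, rng):
    """Draw one variance per dimension from the Beta(a, b) prior."""
    return rng.beta(a, b, size=n_dims)


rng = np.random.default_rng(0)
# Hypothetical settings, chosen only to show the effect of a and b.
for a, b in [(1.0, 1.0), (2.0, 2.0), (2.0, 8.0)]:
    mean, var = beta_prior_summary(a, b)
    draws = sample_variances(a, b, n_dims=3, rng=rng)
    print(f"a={a}, b={b}: prior mean={mean:.3f}, prior var={var:.4f}, draws={draws}")
```

Increasing `b` relative to `a` pushes the prior towards smaller variances, which favours tighter clusters; increasing both together makes the prior more concentrated around its mean.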
Final results
Finally, the models were run on real breast cancer data. The elliptical model performed particularly badly, but the posterior over the number of clusters for the other two models can be seen below.
From these plots it can be seen that the most confident conclusion comes from the EVSpherical (equal volume spherical) model, which predicts that there are 5 cluster components (and hence 5 breast cancer subtypes) in the data. This result also comes with cluster labels indicating which cluster each data point was predicted to be in.
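A posterior over the number of clusters like the one described above can be estimated from sampler output by counting how many distinct labels appear at each iteration. The sketch below assumes the sampler stores one label vector per iteration; that storage format and the function name `cluster_count_posterior` are hypothetical, and the toy label vectors are made up purely to show the calculation.

```python
from collections import Counter

import numpy as np


def cluster_count_posterior(label_samples):
    """Estimate P(number of clusters = k) from sampled label vectors.

    label_samples: iterable of 1-D label arrays, one per sampler iteration.
    Returns a dict mapping k to the fraction of iterations with k clusters.
    """
    counts = Counter(len(np.unique(labels)) for labels in label_samples)
    total = sum(counts.values())
    return {k: counts[k] / total for k in sorted(counts)}


# Toy label vectors for three data points over four iterations.
toy_samples = [
    np.array([0, 0, 1]),
    np.array([0, 1, 2]),
    np.array([0, 1, 1]),
    np.array([2, 2, 2]),
]
print(cluster_count_posterior(toy_samples))
```

A sharply peaked distribution corresponds to a confident conclusion about the number of clusters, while a flat one indicates the model is unsure.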