Access Type

Open Access Dissertation

Date of Award

January 2013

Degree Type


Degree Name



Computer Science

First Advisor

Loren Schwiebert


The objective of this work is to design and implement a self-adaptive, parallel, GPU-optimized Monte Carlo algorithm for the simulation of adsorption in porous materials. We focus specifically on NVIDIA GPUs of the Fermi architecture, programmed with CUDA. The resulting package supports several Monte Carlo ensemble methods, which allows for the simulation of multi-component adsorption in porous solids. Such an algorithm has broad applications to the development of novel porous materials for the sequestration of CO2 and the filtration of toxic industrial chemicals.

The primary objective of this work is the release of a massively parallel, open source Monte Carlo simulation engine implemented on GPUs, called GOMC. The code uses the canonical ensemble and the Gibbs ensemble method, which allow for the simulation of multiple phenomena, including liquid-vapor phase coexistence and single- and multi-component adsorption in porous materials. In addition, the grand canonical ensemble and configurational-bias algorithms have been implemented so that polymeric materials and small proteins may be simulated.

This simulation engine is the only open source GPU-optimized Monte Carlo code available for the generalized simulation of adsorption and phase equilibria on a very large scale. Starting from an existing serial algorithm, the original MC algorithm has been rewritten to suit massively parallel devices by applying many optimization techniques and allowing the system to adjust to changes in simulation state, resulting in reductions in computational time. This large time reduction allows for the simulation of significantly larger systems for longer timescales than is currently possible with existing implementations.

Extensive research and device-specific optimizations resulted in significant speedup. First, for the NVT method, a fully optimized serial algorithm has been implemented and its performance has been compared to Towhee. A speedup of about 438 times has been achieved for a relatively small problem of 4096 particles. In addition, two GPU algorithms, with and without a cell list structure, have been implemented. The parallel code with cell list was more than 160x faster than the serial code. Moreover, for the grand canonical ensemble, one serial and two parallel algorithms have been developed. The simulation box in this method can be resized, which required the algorithm to adapt to the box size and adjust itself. The CUDA code with cell list runs about 130 times faster than the serial code, which has no cell list structure.
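To illustrate the cell list idea mentioned above, the following is a minimal serial sketch in Python, not the GOMC implementation. It assumes a cubic periodic box, a Lennard-Jones potential in reduced units, and particle coordinates in the range [0, box); all function names are hypothetical and chosen for illustration only. The point it shows is that a particle's energy can be computed from the 27 surrounding cells instead of from every other particle.

```python
import itertools

def lj_pair(r2):
    """Lennard-Jones pair energy in reduced units (epsilon = sigma = 1)."""
    inv6 = 1.0 / r2 ** 3
    return 4.0 * (inv6 * inv6 - inv6)

def min_image_r2(a, b, box):
    """Squared distance between a and b under the minimum-image convention."""
    r2 = 0.0
    for u, v in zip(a, b):
        d = u - v
        d -= box * round(d / box)
        r2 += d * d
    return r2

def cell_of(p, n_cells, cell_len):
    """Cell index of point p; assumes every coordinate lies in [0, box)."""
    return tuple(min(int(c / cell_len), n_cells - 1) for c in p)

def build_cell_list(positions, box, cutoff):
    """Bin particles into cubic cells whose edge is at least `cutoff`."""
    n_cells = max(1, int(box // cutoff))   # cells per box edge
    cell_len = box / n_cells
    cells = {}
    for i, p in enumerate(positions):
        cells.setdefault(cell_of(p, n_cells, cell_len), []).append(i)
    return cells, n_cells, cell_len

def energy_cell_list(i, positions, box, cutoff, cells, n_cells, cell_len):
    """Energy of particle i, visiting only its own and adjacent cells."""
    home = cell_of(positions[i], n_cells, cell_len)
    # A set removes duplicate cells when the box holds fewer than 3 cells per edge.
    neighbours = {tuple((h + d) % n_cells for h, d in zip(home, shift))
                  for shift in itertools.product((-1, 0, 1), repeat=3)}
    total = 0.0
    for key in neighbours:
        for j in cells.get(key, ()):
            if j != i:
                r2 = min_image_r2(positions[i], positions[j], box)
                if r2 < cutoff * cutoff:
                    total += lj_pair(r2)
    return total

def energy_brute_force(i, positions, box, cutoff):
    """Reference: energy of particle i by looping over all other particles."""
    total = 0.0
    for j, q in enumerate(positions):
        if j != i:
            r2 = min_image_r2(positions[i], q, box)
            if r2 < cutoff * cutoff:
                total += lj_pair(r2)
    return total
```

Both energy functions should agree up to floating-point summation order. The cell list version inspects a number of candidates that depends on the local density rather than on the total particle count, which is one reason such a structure can matter more as systems grow.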

More MC ensembles have been ported to the GPU. The Gibbs ensemble method has two simulation boxes and three types of moves. This method has been studied carefully, and the GPU algorithm offloads the computation-intensive functions to the GPU. The GPU code was about 50x faster than the serial code. Finally, an extension of the Gibbs method has been implemented on the GPU. This extension affects the particle transfer move from one box to the other. CUDA streams are used to parallelize the K trials for this method. In the best case, a speedup of three times has been achieved for the particle transfer move. However, because particle transfers make up just 10% of the total moves, this speedup has minimal effect on the overall execution time of the simulation. Furthermore, a separate run with all move types has been executed on a Kepler K20c card, and a speedup of 2 times has been measured over the CUDA code on the GeForce GTX 480 card.
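The limited effect of the particle transfer speedup can be estimated with Amdahl's law. The sketch below rests on an assumption that is not stated in the abstract: that particle transfer moves account for roughly 10% of the run time, in line with their 10% share of the moves. If each transfer move is more or less expensive than the other move types, the real fraction will differ. The function name is hypothetical.

```python
def overall_speedup(fraction, local_speedup):
    """Amdahl's law: whole-run speedup when `fraction` of the run time
    is accelerated by a factor of `local_speedup`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    if local_speedup <= 0.0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)

# Assumed: 10% of run time spent in particle transfer, accelerated 3x.
print(round(overall_speedup(0.10, 3.0), 3))  # 1.071
```

Under this assumption the whole simulation runs only about 7% faster, and even an arbitrarily large speedup of the transfer move alone could not exceed a factor of 1 / 0.9, or about 1.11.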

The main contribution of this work to society will come when the above implementations are released to the public as open source. Other researchers can also take advantage of the lessons learned about advanced optimizations and self-adapting mechanisms specific to the GPU. At the application level, the current code can be used by the chemical engineering community to explore accurate and affordable simulations that were not possible before.