Invited Speakers and Presentation Abstracts

John Stone
Paul Woodward
Nathan Bell
Michael McCool
Michael Perrone

John Stone
University of Illinois at Urbana-Champaign

    Accelerating Biomolecular Modeling Applications with GPU Computing

    State-of-the-art graphics processing units (GPUs) contain hundreds of processing units and are able to perform trillions of floating point arithmetic operations per second. Continuing advances in GPU software development tools and educational efforts have made GPU computing more accessible to computational scientists. Mr. Stone will present an overview of recent successes in the use of GPUs to accelerate applications in biomolecular modeling, where GPU computing techniques have delivered speedups of 10 to 100 times over commodity CPU cores. He will describe key challenges and algorithmic strategies involved in achieving high computational performance on GPUs, and will include examples of how this computational performance enables better science.


    John Stone is a Senior Research Programmer in the Theoretical and Computational Biophysics Group at the Beckman Institute for Advanced Science and Technology at the University of Illinois, and the lead developer of VMD, a high-performance molecular visualization tool used by biophysicists and structural biologists all over the world. Mr. Stone's research interests include molecular visualization, GPU-accelerated computing, parallel processing, ray tracing, haptics, and virtual environments.

Paul R. Woodward
University of Minnesota

    Computational Fluid Dynamics on the Los Alamos Roadrunner Machine

    My research group has been working with collaborators at the Los Alamos National Laboratory to exploit the new Cell-based Roadrunner hardware for computational fluid dynamics. We have implemented our PPM gas dynamics code, along with a new PPB multifluid advection scheme, on Cell and then on Roadrunner over the last 18 months. We began by adopting strategies that were essential to running on Cell at all, and these have since proved very effective on all other equipment we have tried. I will describe the key features of our code's restructuring in this project and the lessons we have learned from the process. Among these are the importance of careful design to allow the extensive use of 32-bit floating point arithmetic, restructuring of the fundamental data in the machine memory so that one thinks about off-chip memory much as we are used to thinking of data on disk, extensive pipelining of the computation to avoid unnecessary round trips of data between the processor and main memory, and extensive overlapping not only of messaging but also of the preparation of I/O data and, of course, of disk I/O itself. These code changes are driven by the unprecedented computational potential of the Cell processor, but they deliver their full benefits on all other modern multicore CPUs. As a consequence, code design is necessarily more complex. I will briefly describe source-to-source code translators that we are developing to deal not only with the problem of code portability but also with this new burden of code complexity. I will present results from running this new PPM code on Roadrunner and other cluster hardware, simulating the development of turbulent multifluid mixing in stars and in the general context of the classic Rayleigh-Taylor instability.
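    The overlapping of computation with data movement described above can be sketched generically. The following fragment is an illustration only, not the PPM code: it uses a background worker to prefetch the next block of data while the current block is being processed, with `fetch` and `compute` as hypothetical stand-ins for a slow memory or disk read and a numerical kernel.

    ```python
    # Double-buffering sketch: while block k is processed, block k+1
    # is fetched in the background, hiding data-movement latency
    # behind computation. The same idea applies to off-chip memory
    # traffic and disk I/O; "fetch" here is a generic stand-in.
    from concurrent.futures import ThreadPoolExecutor

    def fetch(block_id):
        # Stand-in for a slow read from main memory or disk.
        return [float(block_id)] * 4

    def compute(block):
        # Stand-in for the per-block numerical kernel.
        return sum(v * v for v in block)

    def pipeline(n_blocks):
        results = []
        with ThreadPoolExecutor(max_workers=1) as pool:
            pending = pool.submit(fetch, 0)              # prefetch first block
            for k in range(n_blocks):
                block = pending.result()                 # wait for current block
                if k + 1 < n_blocks:
                    pending = pool.submit(fetch, k + 1)  # start the next fetch
                results.append(compute(block))           # overlaps with that fetch
        return results

    print(pipeline(3))  # [0.0, 4.0, 16.0]
    ```

    With blocking reads this loop would alternate between fetching and computing; with the prefetch, the fetch of block k+1 proceeds concurrently with the computation on block k.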


    Paul Woodward received his Ph.D. in physics from the University of California, Berkeley in 1973. He has focused his research on simulations of compressible flows in astrophysics, studying problems in star formation, supersonic jet propagation, convection in stars, and astrophysical turbulence. He is a Fellow of the Minnesota Supercomputing Institute and directs the Laboratory for Computational Science & Engineering (LCSE) within the University of Minnesota's Digital Technology Center. The LCSE concentrates on high performance parallel computation and the data analysis and visualization that this requires. Woodward received the IEEE's Sidney Fernbach award in large-scale computing in 1995 and, with 12 collaborators at Livermore, Minnesota, and IBM, received the Gordon Bell prize in the performance category in 1999. Most recently he has been working in collaboration with colleagues at Los Alamos and IBM to exploit the power of the multicore Cell processor for computational fluid dynamics.

Nathan Bell
NVIDIA Research

    Efficient Sparse Matrix-Vector Multiplication on CUDA

    The massive parallelism of graphics processing units (GPUs) offers tremendous performance in many high-performance computing applications. While dense linear algebra readily maps to such platforms, harnessing this potential for sparse matrix computations presents additional challenges. Given its role in iterative methods for solving sparse linear systems and eigenvalue problems, sparse matrix-vector multiplication (SpMV) is of singular importance in sparse linear algebra.

    In this paper we discuss data structures and algorithms for SpMV that are efficiently implemented on the CUDA platform for the fine-grained parallel architecture of the GPU. Given the memory-bound nature of SpMV, we emphasize memory bandwidth efficiency and compact storage formats. We consider a broad spectrum of sparse matrices, from those that are well-structured and regular to highly irregular matrices with large imbalances in the distribution of nonzeros per matrix row. We develop methods to exploit several common forms of matrix structure while offering alternatives which accommodate greater irregularity.
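    One of the standard compact storage formats in this setting is compressed sparse row (CSR). As a minimal sketch of the SpMV kernel over CSR data (in plain Python for brevity rather than CUDA; on a GPU, one thread or warp would typically compute each row's dot product in parallel):

    ```python
    # y = A*x with A stored in compressed sparse row (CSR) form:
    # row_ptr[i]:row_ptr[i+1] delimits row i's slice of the
    # col_idx / values arrays, which hold only the nonzeros.
    def spmv_csr(row_ptr, col_idx, values, x):
        n_rows = len(row_ptr) - 1
        y = [0.0] * n_rows
        for i in range(n_rows):
            acc = 0.0
            for jj in range(row_ptr[i], row_ptr[i + 1]):
                acc += values[jj] * x[col_idx[jj]]
            y[i] = acc
        return y

    # A = [[4, 0, 1],
    #      [0, 2, 0],
    #      [3, 0, 5]]
    row_ptr = [0, 2, 3, 5]
    col_idx = [0, 2, 1, 0, 2]
    values  = [4.0, 1.0, 2.0, 3.0, 5.0]
    x = [1.0, 2.0, 3.0]
    print(spmv_csr(row_ptr, col_idx, values, x))  # [7.0, 4.0, 18.0]
    ```

    The memory-bound character of the kernel is visible here: each nonzero is touched once, so performance hinges on how efficiently the `values`, `col_idx`, and `x` accesses use memory bandwidth, which is why the storage layout matters so much on fine-grained parallel hardware.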

    On structured, grid-based matrices we achieve performance of 36 GFLOP/s in single precision and 16 GFLOP/s in double precision on a GeForce GTX 280 GPU. For unstructured finite-element matrices, we observe performance in excess of 15 GFLOP/s and 10 GFLOP/s in single and double precision, respectively. These results compare favorably to prior state-of-the-art studies of SpMV methods on conventional multicore processors. Our double precision SpMV performance is generally two and a half times that of a Cell BE with 8 SPEs and more than ten times that of a quad-core Intel Clovertown system.


    Nathan Bell is a research scientist at NVIDIA. His current research interests include sparse linear algebra and programming models for GPU computing. Nathan is a recent graduate of the University of Illinois and a contributor to PyAMG, a Python package for algebraic multigrid.

Michael McCool
RapidMind / University of Waterloo

    Applications of Many-Core Computing in Medical Imaging

    Many-core processors, including both GPUs and emerging multi-core CPUs, promise to radically increase the computing power available in a single desktop or server. One of the most interesting applications of this increased computational performance is in medical imaging, including reconstruction, enhancement, registration, segmentation, and analysis. First, many medical imaging modalities depend intrinsically on computation to reconstruct volumetric images from projections or other indirect observations. Second, making sense of these volumes involves various forms of image enhancement and rendering, including both image processing operations that generate new volumes or images and analysis algorithms that characterize a volume or identify structures. Improvements in the performance of these algorithms can have a direct impact on the quality of health care. In this presentation, I will survey a number of algorithms in the medical imaging domain and discuss the application of many-core computing to them.


    Michael McCool is an Associate Professor in the School of Computer Science at the University of Waterloo and co-founder of RapidMind Inc. RapidMind develops a programming platform that targets high-performance many-core processors including multi-core CPUs, the Cell BE, and GPUs. This platform provides a single-source solution for high-level but efficient parallel programming using existing C++ compilers.

    Prof. McCool's current research efforts are targeted at enabling high-performance parallel applications by the development of advanced programming technologies. Research interests include interval analysis, Monte Carlo and quasi-Monte Carlo numerical methods, optimization, simulation, sampling, cellular automata, real-time computer graphics, vision, image processing, hardware design, and programming languages and development platforms.

Michael Perrone
IBM TJ Watson Research Center

    Finding Oil with Cells: Seismic Imaging Using a Cluster of Cell Processors

    Modern deep-sea oil exploration is a very expensive proposition. A single well can cost about $150 million, and the probability of drilling a "dry" hole is roughly 60-70 percent! To help reduce cost, oil companies have turned to increasingly complex computational techniques to improve the quality of their imaging. And in order to reduce the "time to oil", images must be generated as quickly as possible, so these algorithms are run on high-performance computing clusters. This presentation will discuss one such imaging application implemented on a 296-node, heterogeneous cluster composed primarily of Cell processors.


    Michael Perrone is an IBM Master Inventor, Research Staff Member, and the manager of the Multicore Computing Department at IBM's T.J. Watson Research Center. His department has the mission of optimizing workload performance for multicore processors, with an eye towards understanding the key algorithmic and hardware properties required. Current projects include HPC workloads, carbon sequestration, financial data stream processing, seismic imaging, network intrusion detection, digital content creation, rich media mining, image analysis, speech recognition, and bioinformatics. His research includes algorithmic optimization for the Cell processor, parallel computing, and statistical machine learning. He received his PhD in Physics from Brown University.

© 2009 Shared Hierarchical Academic Research Computing Network (www.sharcnet.ca).