Invited Speakers and Presentation Abstracts
John Stone
Paul Woodward
Nathan Bell
Michael McCool
Michael Perrone
John Stone University of Illinois at Urbana-Champaign
Accelerating Biomolecular Modeling Applications with GPU Computing
State-of-the-art graphics processing units (GPUs) contain
hundreds of processing units and are able to perform
trillions of floating point arithmetic operations per second.
Continuing advances in GPU software development tools and
educational efforts have made GPU computing more accessible
to computational scientists. Mr. Stone will present an overview of
recent successes in the use of GPUs to accelerate applications
in biomolecular modeling, where GPU computing techniques have provided
speedups of 10 to 100 times over commodity CPU cores.
He will describe key challenges and algorithmic strategies involved
in achieving high computational performance on GPUs and will
include examples of how this computational performance enables
better science.
Bio
John Stone is a Senior Research Programmer in the Theoretical and
Computational Biophysics Group at the Beckman Institute for Advanced Science
and Technology at the University of Illinois, and the lead developer of VMD,
a high performance molecular visualization tool used by biophysicists
and structural biologists all over the world. Mr. Stone's research
interests include molecular visualization, GPU accelerated computing,
parallel processing, ray tracing, haptics, and virtual environments.
Paul R. Woodward University of Minnesota
Computational Fluid Dynamics on the Los Alamos Roadrunner Machine
My research group has been working with collaborators at the Los Alamos
National Laboratory to exploit the new Cell-based Roadrunner hardware
for computational fluid dynamics. We have implemented our PPM gas
dynamics code, along with a new PPB multifluid advection scheme, on Cell
and then on Roadrunner over the last 18 months. We began by adopting
strategies that were essential to running on Cell at all, and these have
since proved very effective on all other equipment we have tried so far. I
will describe the key features of our code's restructuring in this
project and the lessons we have learned from the process. Among these
are the importance of careful design to allow the extensive use of
32-bit floating point arithmetic, restructuring of the fundamental data
in the machine memory so that one thinks about off-chip memory much as
we are used to thinking of data on disk, extensive pipelining of the
computation to avoid unnecessary round trips of data between the
processor and main memory, and extensive overlapping of not only
messaging but also preparation of I/O data and, of course, disk I/O.
These code changes are driven by the unprecedented computational
potential of the Cell processor, but they deliver their full benefits on
all other modern multicore CPUs. As a consequence, code design is
necessarily more complex. I will briefly describe source-to-source code
translators that we are developing to deal not only with the problem of
portable code but also with this new burden of code complexity. I will
present results from running this new PPM code on Roadrunner and other
cluster equipment to simulate the development of turbulent multifluid
mixing in stars and in the general context of the classic
Rayleigh-Taylor instability.
Bio
Paul Woodward received his Ph.D. in physics from the University of
California, Berkeley in 1973. He has focused his research on
simulations of compressible flows in astrophysics, studying problems in
star formation, supersonic jet propagation, convection in stars, and
astrophysical turbulence. He is a Fellow of the Minnesota
Supercomputing Institute and directs the Laboratory for Computational
Science & Engineering (LCSE) within the University of Minnesota's
Digital Technology Center. The LCSE concentrates on high performance
parallel computation and the data analysis and visualization that this
requires. Woodward received the IEEE's Sidney Fernbach award in
large-scale computing in 1995 and, with 12 collaborators at Livermore,
Minnesota, and IBM, received the Gordon Bell prize in the performance
category in 1999. Most recently he has been working in collaboration
with colleagues at Los Alamos and IBM to exploit the power of the
multicore Cell processor for computational fluid dynamics.
Nathan Bell NVIDIA Research
Efficient Sparse Matrix-Vector Multiplication on CUDA
The massive parallelism of graphics processing units (GPUs) offers
tremendous performance in many high-performance computing applications.
While dense linear algebra readily maps to such platforms, harnessing
this potential for sparse matrix computations presents additional
challenges. Given its role in iterative methods for solving sparse
linear systems and eigenvalue problems, sparse matrix-vector
multiplication (SpMV) is of singular importance in sparse linear
algebra.
In this paper we discuss data structures and algorithms for SpMV that
are efficiently implemented on the CUDA platform for the fine-grained
parallel architecture of the GPU. Given the memory-bound nature of SpMV,
we emphasize memory bandwidth efficiency and compact storage formats. We
consider a broad spectrum of sparse matrices, from those that are
well-structured and regular to highly irregular matrices with large
imbalances in the distribution of nonzeros per matrix row. We develop
methods to exploit several common forms of matrix structure while
offering alternatives which accommodate greater irregularity.
On structured, grid-based matrices we achieve performance of 36 GFLOP/s
in single precision and 16 GFLOP/s in double precision on a GeForce GTX
280 GPU. For unstructured finite-element matrices, we observe
performance in excess of 15 GFLOP/s and 10 GFLOP/s in single and double
precision respectively. These results compare favorably to prior
state-of-the-art studies of SpMV methods on conventional multicore
processors. Our double precision SpMV performance is generally two and a
half times that of a Cell BE with 8 SPEs and more than ten times greater
than that of a quad-core Intel Clovertown system.
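For readers unfamiliar with SpMV, the following is a minimal serial sketch in Python of y = A x using the compressed sparse row (CSR) layout, a widely used compact storage format. The abstract does not name the formats covered in the talk, so CSR is an assumption chosen for illustration, and the function name `csr_spmv` and the small test matrix are hypothetical. On a GPU, the outer loop over rows is the part that would typically be distributed across threads; rows with very different nonzero counts are what make that mapping uneven.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Return y = A @ x for a sparse matrix A stored in CSR form.

    row_ptr[i]..row_ptr[i + 1] delimits the nonzeros of row i;
    col_idx[k] and values[k] give the column and value of nonzero k.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Only stored nonzeros are visited, so the work per row equals
        # that row's nonzero count.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Hypothetical 3x3 example matrix:
#   [[4, 0, 1],
#    [0, 3, 0],
#    [2, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
x = [1.0, 2.0, 3.0]

y = csr_spmv(row_ptr, col_idx, values, x)
print(y)
```

Each nonzero costs one multiply-add but requires reading a value, a column index, and an entry of x, which illustrates why the abstract describes SpMV as memory-bound.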
Bio
Nathan Bell is a research scientist at NVIDIA. His current research
interests include sparse linear algebra and programming models for GPU
computing. Nathan is a recent graduate of the University of Illinois
and a contributor to PyAMG, a Python package for algebraic multigrid.
Michael McCool RapidMind / University of Waterloo
Applications of Many-Core Computing in Medical Imaging
Many-core processors, including both GPUs and emerging multi-core CPUs, promise to radically increase the computing power available in a single desktop or
server. One of the most interesting applications of this increased computational performance is in medical imaging, including reconstruction, enhancement,
registration, segmentation and analysis. First of all, many medical imaging modalities depend intrinsically on computation to reconstruct volumetric images
from projections or other indirect observations. Secondly, making sense of these volumes involves various forms of image enhancement and rendering, including
both image processing operations that generate new volumes or images and analysis algorithms that characterize a volume or identify structures. Improvements
in the performance of these algorithms can have a direct impact on the quality of health care. In this presentation, I will survey a number of algorithms in
the medical imaging domain and discuss the application of many-core computing to them.
Bio
Michael McCool is an Associate Professor in the School of Computer Science at the University of Waterloo and co-founder of RapidMind Inc. RapidMind develops a
programming platform that targets high-performance many-core processors including multi-core CPUs, the Cell BE, and GPUs. This platform provides a
single-source solution for high-level but efficient parallel programming using existing C++ compilers.
Prof. McCool's current research efforts are targeted at enabling high-performance parallel applications by the development of advanced programming technologies. Research interests include interval analysis, Monte Carlo and quasi-Monte Carlo numerical methods, optimization, simulation, sampling, cellular automata,
real-time computer graphics, vision, image processing, hardware design, and programming languages and development platforms.
Michael Perrone IBM TJ Watson Research Center
Finding Oil with Cells: Seismic Imaging Using a Cluster of Cell Processors
Modern deep sea oil exploration is a very expensive proposition. A
single well can cost about $150M, and the probability of drilling a
"dry" hole is about 60-70 percent! To help reduce cost, oil
companies have turned to increasingly complex computational imaging
techniques to improve the quality of imaging. And in order to
reduce the "time to oil", images must be generated as quickly as
possible; so these algorithms are run on high-performance computing
clusters. This presentation will discuss one such imaging
application implemented on a 296-node, heterogeneous cluster
composed primarily of Cell processors.
Bio
Michael Perrone is an IBM Master Inventor, Research Staff Member and
the manager of the Multicore Computing Department at IBM's T.J.
Watson Research Center. His department has the mission of
optimizing workload performance for multicore processors with an eye
towards understanding the key algorithmic and HW properties
required. Current projects include HPC workloads, carbon
sequestration, financial data stream processing, seismic imaging,
network intrusion detection, digital content creation, rich media
mining, image analysis, speech recognition and bioinformatics. His
research includes algorithmic optimization for the Cell processor,
parallel computing and statistical machine learning. He received
his PhD in Physics from Brown University.