As neural networks grow deeper and training datasets grow larger, deep learning demands more computing power to handle the computationally intensive training process. This lecture introduces how to leverage high-performance computing (HPC) to accelerate deep learning through parallelism. We will discuss topics such as data I/O and the use of multiple GPUs, either on a single machine or across a cluster. We will also present benchmarks of a typical deep learning problem (MNIST) on various hardware configurations, including single and multiple CPUs and single and multiple GPUs. All implementations and benchmarking tests use TensorFlow. Attendees will learn practical skills for making good use of available hardware to maximize training speed.