This page is mostly outdated, as newer and more powerful Pascal GPUs are now available across Compute Canada (on Graham, Cedar, etc.).
Mosaic, 20 nodes, 1 K20 GPU per node
Copper, 8 nodes, 4 K80 cards (8 GPU devices) per node
Cluster Guillimin (http://www.calculquebec.ca/en/resources/compute-servers/guillimin) has 100 K20s (and also 100 Phi cards).
The K20c timings below are normalized to the corresponding timings on monk (M2075 GPU).
|Code Name||Precision||K20c, 1 thread||K20c, 8 threads||Hyper-Q speedup||K20c/serial, 1 thread||K20c/serial, 8 threads|
|Random Lens Design, low res||DP||0.60||1.79||3.00||58||21.5|
|Random Lens Design, high res||DP||0.89||1.48||1.66||272||57|
- SP/DP/MP: single/double/mixed precision
- Per-thread timing: wall clock time divided by the number of threads.
- Hyper-Q speedup: the ratio of the "K20c, 8 threads" timing to the "K20c, 1 thread" timing.
- K20c/serial, 1 thread: speedup of 1 CPU thread + K20c vs. the serial code run on 1 CPU thread, both on arc09. Note that an arc09 CPU core is 1.75x slower than an orca core, so to compare the K20c to orca, divide this number by 1.75.
- K20c/serial, 8 threads: speedup of 8 CPU threads + K20c (GPU farm) vs. the serial code run on 8 CPU threads (serial farm), both on arc09. Note that an arc09 CPU core is 1.75x slower than an orca core, so to compare the K20c to orca, divide this number by 1.75.
- Random Lens Design. Written by Sergey Mashchenko. Discovery of new complex lens designs by means of global optimization (a search for the global minimum of the merit function in a 40-100 dimensional space). Monte Carlo type search (well suited to serial/GPU farming). Purely double precision (needed for the smoothness of the merit function). Both serial and CUDA (compute capability 2.x) versions exist. The merit function is computed from the results of ray tracing through the system; each ray is handled by a separate CUDA thread. Low resolution runs (~10,000 rays/threads) are used for the initial search; high resolution runs (~100,000 rays/threads) are used to fine-tune the best candidates. The CUDA code has more than 10 kernels and a few device functions. The device memory consumption is ~200 MB for low res and ~400 MB for high res. At low resolution, the occupancy is fairly low (~0.20).
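
The derived numbers in the table can be checked with a few lines of arithmetic. The sketch below is only an illustration: the dictionary layout and the function names (`hyperq_speedup`, `speedup_vs_orca`) are hypothetical and are not part of the Random Lens Design code. The input values are copied from the table, and the 1.75 factor is the arc09-to-orca core speed ratio quoted in the notes.

```python
# Values copied from the table above (timings are in monk / M2075 units).
# The dictionary layout and key names are illustrative assumptions.
TIMINGS = {
    "low res": {
        "k20c_1_thread": 0.60,
        "k20c_8_threads": 1.79,
        "vs_serial_1_thread": 58.0,
        "vs_serial_8_threads": 21.5,
    },
    "high res": {
        "k20c_1_thread": 0.89,
        "k20c_8_threads": 1.48,
        "vs_serial_1_thread": 272.0,
        "vs_serial_8_threads": 57.0,
    },
}

# An arc09 CPU core is 1.75x slower than an orca core (from the notes).
ARC09_TO_ORCA = 1.75


def hyperq_speedup(row):
    """Ratio of the 8-thread K20c timing to the 1-thread K20c timing."""
    return row["k20c_8_threads"] / row["k20c_1_thread"]


def speedup_vs_orca(speedup_on_arc09):
    """Rescale a speedup measured against arc09 cores to orca cores."""
    return speedup_on_arc09 / ARC09_TO_ORCA


if __name__ == "__main__":
    for name, row in TIMINGS.items():
        print(
            f"{name}: Hyper-Q speedup {hyperq_speedup(row):.2f}, "
            f"vs. orca (1 thread) {speedup_vs_orca(row['vs_serial_1_thread']):.1f}, "
            f"vs. orca (8 threads) {speedup_vs_orca(row['vs_serial_8_threads']):.1f}"
        )
```

The Hyper-Q ratios computed this way agree with the tabulated column to within the rounding of the published timings.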