
[CANCELLED] Webinar "Submitting Checkpoint/Restart Jobs on SHARCNET with BLCR" - 12:00pm

Date Wednesday November 23 2016
Time 12:00 - 13:00
Location Online
Contact syam@sharcnet.ca
URL http://vidyo.computecanada.ca/flex.html?roomdirect.html&key=Pr1GiEI51kFi

Topic: “Submitting Checkpoint/Restart Jobs on SHARCNET with BLCR”

Speaker: Doug Roberts, SHARCNET


In this webinar we demonstrate how to checkpoint and restart serial or threaded jobs submitted to the queue on SHARCNET clusters without modifying the application source code. We focus on BLCR (Berkeley Lab Checkpoint/Restart), a software package installed as a module on SHARCNET. BLCR performs checkpoint and restart inside the Linux kernel. This makes it less portable than solutions built on user-level libraries, but it gives BLCR full access to kernel resources, so it can restore resources that user-level libraries cannot (such as process IDs) and can checkpoint groups of processes (such as shell scripts and their sub-processes) along with their pipes. BLCR provides a straightforward and powerful way to add fault tolerance to your daily HPC workflow, so consider putting this talk into your November schedule.
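
For orientation, the basic BLCR cycle outside the queue uses three command-line tools: cr_run, cr_checkpoint and cr_restart. The Python sketch below only assembles and prints those command lines; it does not run them and does not cover SHARCNET queue submission. The helper name blcr_workflow, the application name, the process ID and the context file name are all hypothetical, and the options shown (--term, -f) should be checked against the BLCR man pages on your cluster.

```python
import shlex


def blcr_workflow(app_argv, pid, context_file):
    """Build the BLCR command lines for one checkpoint/restart cycle.

    app_argv, pid and context_file are illustrative inputs (assumptions),
    not values taken from the webinar.
    """
    return {
        # Start the unmodified application with BLCR support loaded.
        "run": ["cr_run", *app_argv],
        # Save the process state to context_file, then terminate the process.
        "checkpoint": ["cr_checkpoint", "--term", "-f", context_file, str(pid)],
        # Resume the saved process from the context file.
        "restart": ["cr_restart", context_file],
    }


if __name__ == "__main__":
    commands = blcr_workflow(["./my_app", "input.dat"], pid=12345,
                             context_file="my_app.ckpt")
    for step, argv in commands.items():
        print(f"{step}: {shlex.join(argv)}")
```

Because BLCR restores the original process ID on restart, the restart step needs that ID to be free on the node where it runs.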


Please note that the webinar time is given in Eastern Time (EST/EDT).

Need help attending a webinar? See the SHARCNET Help Wiki.