High performance computing

High performance computing (HPC) is a powerful tool in today’s research: it allows for large-scale computations of complex systems, big data analysis, and data visualization. To achieve this, HPC systems offer thousands of processor cores, multiple GPUs, hundreds of gigabytes of memory, and terabytes of storage.

Resources are available throughout Canada in the form of large compute clusters provided by the regional consortia under the umbrella of Compute Canada, as well as through your own equipment hosted in the university data centres.

Getting Access

Compute Canada is the organization that oversees the national compute clusters available to researchers across Canada. To begin using a cluster, you must first register and open an account.

This image shows the Compute Canada login screen

Access is free for all researchers at Canadian institutions. If you are a PI, you can request an account directly. If you are a postdoc or a student, you can get a sponsored account through your supervisor.

Job submission

All compute clusters use a queuing system, which allows each program to run at full speed without running out of resources.

To run a program, you must submit it to the queue, and the cluster scheduler will determine where and when your job will run. The wait time in the queue depends on the current usage of the cluster as well as on the requested number of CPU cores, the amount of memory, and the requested run time. Jobs that require many resources wait longer because the scheduler has to wait until there is enough free space for them. It is therefore very important to know how many resources your program requires in order to minimize waiting times.

Minimize queue wait time

The clusters use different schedulers, but they operate on the same principle: the larger the computational requirements, the longer the wait time in the queue.

You can determine the amount of resources your program needs in several ways.

If you have access to the source code, you can estimate the amount of memory a particular run would require. Write a routine that goes through the initial setup and tallies the memory each data structure would need, without actually allocating it. While this requires some work initially, it will allow you to quickly calculate the required amount of memory for any input and will save you time in the end. Some commercial packages may also give you estimates based on the input configuration.
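A minimal sketch of such a dry-run estimator is shown below, assuming a NumPy-based particle simulation; the array names and shapes are illustrative, not taken from any particular package.

import numpy as np

def estimate_memory(n_particles, dtype=np.float64):
    """Tally the memory a run would allocate, without allocating it."""
    itemsize = np.dtype(dtype).itemsize
    planned_arrays = {
        "positions":  (n_particles, 3),            # x, y, z per particle
        "velocities": (n_particles, 3),
        "forces":     (n_particles, 3),
        "pair_list":  (n_particles, n_particles),  # worst-case neighbour table
    }
    return sum(int(np.prod(shape)) * itemsize for shape in planned_arrays.values())

for n in (10, 100, 500):
    print(f"N={n:4d}: ~{estimate_memory(n) / 1024**2:.1f} MiB")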

If this is not an option, you can run trials of your program on your own computer with smaller input parameters.

For example, if you are running a simulation with N=500 particles, complete runs for N=10, 20, 30, 40 and check how the memory usage increases with each increment. In Linux, you can check the memory usage of a running process with pmap.

Find the PID of your program.

$ ps -u $USER
  PID TTY          TIME CMD
    4 tty1     00:00:00 bash
  141 tty1     00:00:00 ps
  166 tty1     00:01:22 simulations

Then check the memory usage with pmap:

$ pmap 166
166:   simulations
00007ff3dc000000    132K rw---   [ anon ]
00007ff3dc021000  65404K -----   [ anon ]
...
...
...
00007ff456815000      4K rw--- simulations
00007ffff0590000  16388K rw---   [ anon ]
00007ffff699e000   8192K rw---   [ anon ]
00007ffff784b000      4K r-x--   [ anon ]
 total          1982516K

From the total in the last line, we can see that this run used about 2 GB. Memory usage tends to scale either polynomially or logarithmically with the problem size. By plotting the results and fitting a polynomial or a logarithm, you can extrapolate the memory usage for N=500.
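The fit and extrapolation can be done with a few lines of Python, as in the sketch below; the pmap totals in the script are placeholder numbers for illustration only, so substitute your own measurements.

import numpy as np

# pmap totals (in KB) for the trial runs N = 10, 20, 30, 40.
# These values are placeholders; replace them with your own measurements.
n_trials  = np.array([10, 20, 30, 40])
memory_kb = np.array([80_000, 310_000, 700_000, 1_240_000])

# Memory for a particle simulation often grows polynomially, so fit a quadratic.
coeffs = np.polyfit(n_trials, memory_kb, deg=2)

# Extrapolate to the production run size.
n_target = 500
estimate_kb = np.polyval(coeffs, n_target)
print(f"Estimated memory for N={n_target}: {estimate_kb / 1024**2:.1f} GB")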

After a job has run on the cluster, the automatically generated log file reports the memory usage of your program, so you can adjust the next submissions accordingly. Our recommendation is to request about 10% more than your estimate, so your job will still complete if the memory usage fluctuates a bit.

Estimating run time is more difficult, since it depends on the CPU speed and on how well the program scales when running on multiple CPU cores. You will need to make an educated guess and, when submitting the job, allow for a generous margin of error. Use the cluster reports to determine the real run time for subsequent submissions.

Checkpointing

If your program requires a long run time, it will sit in the queue for a long time before it starts. Our recommendation is that such programs implement checkpointing.

Checkpointing is a technique that periodically saves a snapshot of the program's state to disk, so a run can later resume from that point instead of starting over.

For example, a one-week program can be split into seven one-day runs. This reduces the wait time and makes your program more resilient to crashes or outages, as it can continue from the last checkpoint.

This is a working example of a simple molecular dynamics simulation with checkpointing.

The simulation creates a new checkpoint every 40 steps by saving the iteration number and the positions of all the particles at that iteration to a binary file and then quits.

This image shows the program's output as it writes a checkpoint every 40 steps and then quits

Running the program again will continue where it left off, thanks to the data written to the checkpoint.

This image shows the program resuming from where it left off, thanks to the checkpoint data
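The source of that example is not reproduced here, but the same save-and-resume logic can be sketched in a few lines of Python; the checkpoint file name, step counts and the trivial update rule are illustrative stand-ins, not the original code.

import os
import numpy as np

CHECKPOINT = "checkpoint.npz"   # illustrative file name
N_PARTICLES = 500
TOTAL_STEPS = 1000              # full length of the simulation
STEPS_PER_RUN = 200             # how far a single submission advances
CHECKPOINT_EVERY = 40           # write a checkpoint every 40 steps

def step(positions, dt=1e-3):
    """Stand-in for one molecular-dynamics integration step."""
    return positions + dt * np.random.standard_normal(positions.shape)

# Resume from the last checkpoint if one exists, otherwise start fresh.
if os.path.exists(CHECKPOINT):
    data = np.load(CHECKPOINT)
    start, positions = int(data["iteration"]), data["positions"]
    print(f"Resuming from iteration {start}")
else:
    start, positions = 0, np.random.random((N_PARTICLES, 3))

end = min(start + STEPS_PER_RUN, TOTAL_STEPS)
for i in range(start + 1, end + 1):
    positions = step(positions)
    if i % CHECKPOINT_EVERY == 0:
        # Save the iteration number and all particle positions to a binary file.
        np.savez(CHECKPOINT, iteration=i, positions=positions)

print(f"Stopped at iteration {end}; latest checkpoint written to {CHECKPOINT}")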

Jobs can be submitted in such a way that they wait for other jobs to finish, so you do not have to resubmit manually after each checkpointed run completes.
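As an illustration, on a cluster that uses the Slurm scheduler (an assumption here, since schedulers vary between clusters; the submission script name run_md.sh is hypothetical), a chain of dependent jobs could be submitted with a short Python script:

import subprocess

N_RUNS = 7                  # e.g. a one-week simulation split into one-day runs
SCRIPT = "run_md.sh"        # hypothetical submission script for one checkpointed run

previous_job = None
for _ in range(N_RUNS):
    cmd = ["sbatch", "--parsable"]
    if previous_job is not None:
        # Start this run only after the previous one has finished successfully.
        cmd.append(f"--dependency=afterok:{previous_job}")
    cmd.append(SCRIPT)
    # --parsable makes sbatch print just the job ID, which the next run depends on.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    previous_job = result.stdout.strip().split(";")[0]
    print(f"Submitted job {previous_job}")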

Related Links

For Research Support

For help using clusters and running programs, please contact us using the Service Desk request form.
