Supercomputers

Let us put the command line aside for a moment, though, and first talk about supercomputers in Canada.

Usefulness

The first thing I would like to address is what a supercomputer is. There is no hardware that we can buy that you cannot also buy. What we can do, though, is buy a lot more hardware.

A supercomputer is, therefore, not some super fast computer that hardware companies only sell to a select few organizations. It is, instead, the standard computers and computer hardware that hardware companies sell to everyone, bought and assembled on a massive scale.

SHARCNET Graham Cluster

To make an analogy, a supercomputer is a team of scientists solving a problem faster through collaboration. It is not a superhero scientist solving a problem fast through pure personal awesomeness.

Thinking about the team analogy will answer many basic questions about using a supercomputer. For example, most problems fall somewhere between the following two cases:

  • I have many smaller independent tasks that need to be done.

  • I have one large highly dependent task that needs to be done.

Exercises

What do you think about these two tasks? Using the yes and no buttons on the participant window, answer the following questions:

  1. If you have 20 people available to help you with many smaller independent tasks that need to be done (e.g., weighing 100 samples), can you expect to be finished about 20x faster?

  2. If you have 20 people available to help with a single large highly dependent task (e.g., writing your thesis), can you expect to be finished about 20x faster?
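One way to reason about these two questions is Amdahl's law, which estimates the speedup from adding workers when only part of a task can be split up. The sketch below is illustrative only: the function name `amdahl_speedup` and the example fractions (1.0 for fully independent work, 0.1 for a highly dependent task) are assumptions chosen for the example, not measurements.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Estimate speedup under Amdahl's law.

    parallel_fraction: share of the work that can be split among workers (0 to 1).
    workers: number of people (or cores) sharing the parallel part.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    # The serial part cannot be shared, so it always takes its full time.
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Assumed fractions, for illustration only.
    examples = [("independent tasks", 1.0), ("highly dependent task", 0.1)]
    for label, fraction in examples:
        speedup = amdahl_speedup(fraction, 20)
        print(f"{label}: about {speedup:.1f}x with 20 workers")
```

With these assumed fractions, the fully independent case comes out to about 20x, while the highly dependent case gains only about 1.1x, no matter how willing the 20 helpers are.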

Canada

The supercomputer story in Canada is actually pretty ideal from a researcher's perspective. While there are several different organizations providing research supercomputers across Canada, we are all part of the Compute Canada organization. We have standardized our accounts, our software, and our support.

This means a single account gets you access to all the systems, and you can move seamlessly from system to system with a minimal learning curve, as they are all configured with minimal differences.

Documentation for all our systems is found on the Compute Canada wiki:

https://docs.computecanada.ca

The main page includes, amongst other things, information on

  • applying for an account (on the left hand side)

  • the current systems (in the systems and services table)

  • installed software (in the systems and services table)

  • frequently asked questions (in the systems and services table)

  • how to connect, transfer files, and run jobs (how-to guides table)

Accounts

As explained on our documentation page, our accounts are organized into principal investigators (PIs) and PI-sponsored users. The former must be faculty at eligible institutions (universities, colleges, etc.) and the latter must be associated with a registered PI.

For almost everyone here, this will mean your supervisor must first have an account. Then you can go to the Compute Canada Database (CCDB) site

https://ccdb.computecanada.ca

and apply for an account:

  • click the Register button below the sign-in boxes

  • agree to the Compute Canada policies

  • indicate you don’t have a prior account (unless you do of course!)

  • fill in the personal information

  • pick an appropriate sponsored role (e.g., Master’s student)

  • enter your supervisor’s Compute Canada Role Identifier (CCRI)

  • pick a username and password

The CCRI identifies your sponsor’s account. It is a combination of letters and numbers, like tuv-232-02, that they can get by logging into the CCDB site, picking My Account -> Account Details, and looking at the first line in the Active Roles box.

After submission, your sponsor will be sent an email asking them to confirm that they are sponsoring you. Once they have done this, your account will be enabled and you will be able to access all the Compute Canada systems with the exception of Niagara.

To also access Niagara, you will need to log in to the CCDB site, go to My Account -> Request access to legacy clusters, and click the Request access to Niagara and Mist button.

Systems

As covered on the Compute Canada documentation wiki, the four major supercomputers currently operating in Canada are:

Organization    Supercomputer    Nodes    Cores      Storage
Calcul Québec   Béluga             872     34,880    11.6PB
SHARCNET        Graham           1,261     41,548    15.3PB
SciNet          Niagara          2,016     80,640    9.4PB
WestGrid        Cedar            2,502    101,424    13.8PB

  • node - means a single computer

  • core - means a single CPU (generally 32 or more per node)
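As a quick check on the "32 or more per node" figure, the node and core counts in the table above can be divided to get an average number of cores per node for each system. This is only a small arithmetic sketch; the dictionary `systems` and the helper `average_cores_per_node` are names made up for the example.

```python
# Node and core counts copied from the systems table above.
systems = {
    "Béluga": (872, 34_880),
    "Graham": (1_261, 41_548),
    "Niagara": (2_016, 80_640),
    "Cedar": (2_502, 101_424),
}


def average_cores_per_node(nodes: int, cores: int) -> float:
    """Return the average number of cores per node."""
    return cores / nodes


for name, (nodes, cores) in systems.items():
    average = average_cores_per_node(nodes, cores)
    print(f"{name}: {average:.1f} cores per node on average")
```

Béluga and Niagara come out to exactly 40 cores per node, Graham to about 32.9, and Cedar to about 40.5. The averages for Graham and Cedar are not whole numbers because not every node in a system has the same number of cores.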

SHARCNET Graham Compute Node

The individual nodes (computers) are connected by a very fast, low-latency, low-congestion network (typically full data rate InfiniBand) in order to ensure there are no barriers for programs that need multiple nodes to collaborate when solving problems.

Most of the individual nodes (computers) in a supercomputer are the same. On Graham, for example, most nodes have

  • 32 cores

  • 125GB of memory

but some have

  • 44 or 64 cores (for large threaded jobs),

  • 250GB, 502GB or 3,022GB of memory (for large memory jobs), or

  • Pascal, Volta, or Turing GPUs (for GPGPU jobs)

to also enable computations that require these more specialized resources.

Storage

Each supercomputer has its own storage systems. All the nodes (computers) in a supercomputer share the same storage. Within a supercomputer, the folder you are in determines which storage system you are working with. There are three main storage systems:

Storage Folder   Group   Expiration   HPC   Size   Files
/home/           No      No           No    50GB   500K
/scratch/        No      60 days      Yes   20TB   1,000K
/project/        Yes     No           Yes   1TB    500K

where the size and file number limits are for Graham (the other supercomputers have similar, but slightly different, limits). All the file systems except /scratch are backed up each day.
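To get a feel for how a folder compares against limits like these, the following sketch totals the size and number of files under a folder. It is a rough estimate only, under stated assumptions: "50GB" is read as 50 × 10^9 bytes and "500K" as 500,000 files, the helper names `summarize_usage` and `within_limits` are made up for this example, and the accounting the clusters actually use for quotas may differ.

```python
import os

# Limits for /home on Graham, taken from the storage table above.
# Assumption: "50GB" means 50 * 10**9 bytes and "500K" means 500,000 files.
HOME_SIZE_LIMIT_BYTES = 50 * 10**9
HOME_FILE_LIMIT = 500_000


def summarize_usage(root: str) -> tuple[int, int]:
    """Return (total_bytes, file_count) for the files under root."""
    total_bytes = 0
    file_count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat does not follow symbolic links.
                total_bytes += os.lstat(path).st_size
            except OSError:
                continue  # File vanished or is unreadable; skip it.
            file_count += 1
    return total_bytes, file_count


def within_limits(total_bytes: int, file_count: int,
                  size_limit: int = HOME_SIZE_LIMIT_BYTES,
                  file_limit: int = HOME_FILE_LIMIT) -> bool:
    """Check a usage summary against a size limit and a file-count limit."""
    return total_bytes <= size_limit and file_count <= file_limit
```

For example, `within_limits(*summarize_usage(os.path.expanduser("~")))` would check your home folder against the assumed /home limits. Walking a very large folder this way can be slow, so treat it as a teaching sketch rather than a monitoring tool.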

Our data policy is that the data belongs to the sponsor. This means that you lose access to files when your sponsor stops sponsoring you (e.g., you graduate and move on, you have a falling out, etc.), so make sure to keep a personal copy of anything you really care about too.

Running programs

The Canadian supercomputers, as with most around the world, run Linux. Linux is an open source operating system started in 1991 by a Finnish computer science student named Linus Torvalds. It is based on the Unix standard (the name Linux is a blend of Linus and Unix) and is now developed collaboratively by many people from all over the Internet.

Linux is not Windows, and Windows programs will not, for the most part, run on Linux. To run a program on the supercomputers, it needs to be a Linux program, or you need to have the source code so you can build the program yourself for Linux.

The good news is that we have already done this for almost all the common programs that our users use. The Compute Canada documentation wiki contains an extensive list of the programs we have already installed, along with any specific information required to run a program, such as licensing configuration for commercial software.

The other thing that is important to know about the supercomputers is that they are not a free-for-all. You do not pick a node (computer) at random and go run your program on it. Rather, you tell the system what you want to run and what resources it needs (how much time, how much memory, etc.). It will then run your program in such a way as to ensure that it has access to all the resources it needs, that it does not interfere with other running programs, and that no one hogs the system.
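To make the idea of requesting resources concrete, here is a toy sketch of a system that only starts a job on a node with enough free cores and memory. It is not how the real scheduler on the clusters works (that one also accounts for run time, priorities, and fairness between users); the classes `Job` and `Node` and the function `schedule` are invented purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    cores: int
    memory_gb: int


@dataclass
class Node:
    name: str
    free_cores: int
    free_memory_gb: int
    jobs: list = field(default_factory=list)

    def can_fit(self, job: Job) -> bool:
        """True if the node has enough free cores and memory for the job."""
        return job.cores <= self.free_cores and job.memory_gb <= self.free_memory_gb

    def start(self, job: Job) -> None:
        """Reserve the job's resources so other jobs cannot use them."""
        self.free_cores -= job.cores
        self.free_memory_gb -= job.memory_gb
        self.jobs.append(job.name)


def schedule(jobs, nodes):
    """Place each job on the first node with enough free resources.

    Returns the names of the jobs that must wait because nothing fits.
    """
    waiting = []
    for job in jobs:
        for node in nodes:
            if node.can_fit(job):
                node.start(job)
                break
        else:
            waiting.append(job.name)
    return waiting


if __name__ == "__main__":
    # Two nodes sized like a typical Graham node (32 cores, 125GB of memory).
    nodes = [Node("node1", 32, 125), Node("node2", 32, 125)]
    jobs = [Job("a", 16, 60), Job("b", 16, 60), Job("c", 8, 100), Job("d", 32, 200)]
    waiting = schedule(jobs, nodes)
    for node in nodes:
        print(node.name, "runs", node.jobs)
    print("waiting:", waiting)
```

In this example, jobs a and b share the first node, job c goes to the second node, and job d has to wait because no standard node has the 200GB of memory it asked for. Because each job states its needs up front, running jobs never compete for the same cores or memory.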

We will cover how to schedule and manage your jobs later in this course.

Support

Support is provided through a single email address, support@computecanada.ca. Emailing this address will open a ticket on the Compute Canada support website, and our staff will direct your question to the support person anywhere in Canada who is best suited to help you. So long as it isn’t late in the afternoon, you can generally expect a response the same day.

Many institutions, such as Western, also have local staff on site, such as myself. You can also contact us directly, although we generally prefer that you use the support email, as this permits us to better manage and direct queries (e.g., taking into account the current workload of our staff in order to get you the fastest answer back).