Supercomputers¶
Let us put the command line aside for a bit, though, to first talk about supercomputers in Canada.
Usefulness¶
The first thing I would like to address is what a supercomputer is. There is no hardware that we can buy that you cannot also buy. What we can do, though, is buy a lot more hardware.
A supercomputer is, therefore, not some super fast computer that hardware companies only sell to a select few organizations. It is, instead, the standard computers and computer hardware that hardware companies sell to everyone, bought and assembled on a massive scale.
To make an analogy, a supercomputer is a team of scientists solving a problem faster through collaboration. It is not a superhero scientist solving a problem fast through pure personal awesomeness.
Thinking about the team analogy will answer many basic questions about using a supercomputer. For example, most problems fall somewhere between the following two cases:
- I have many smaller independent tasks that need to be done.
- I have one large highly dependent task that needs to be done.
Exercises¶
What do you think about these two tasks? Using the yes and no buttons on the participant window, answer the following questions:
- If you have 20 people available to help you with many smaller independent tasks that need to be done (e.g., weighing 100 samples), can you expect to be finished about 20x faster?
- If you have 20 people available to help with a single large highly dependent task (e.g., writing your thesis), can you expect to be finished about 20x faster?
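One rough way to reason about these two questions is to count rounds of work for the independent case and to apply Amdahl's law to the dependent one. The sketch below is only an illustration. The function names are made up for this example, and the assumption that half of a thesis can be shared out is a hypothetical figure, not a measurement.

```python
import math


def independent_speedup(tasks, workers):
    """Speedup when equal-sized independent tasks are shared out evenly."""
    # Each worker handles one task per round, so the job takes this many rounds.
    rounds = math.ceil(tasks / workers)
    return tasks / rounds


def amdahl_speedup(parallel_fraction, workers):
    """Amdahl's law: only the parallel fraction of the work gets faster."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Weighing 100 samples with 20 people: 5 rounds instead of 100.
print(independent_speedup(100, 20))  # prints 20.0

# A task where (hypothetically) only half the work can be shared out.
print(round(amdahl_speedup(0.5, 20), 2))  # prints 1.9
```

Under these assumptions the independent tasks finish about 20x faster, while the dependent task does not even get twice as fast, however many people help.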
Canada¶
The supercomputer story in Canada is actually pretty ideal from a researcher's perspective. While there are several different organizations providing research supercomputers across Canada, we are all part of the Compute Canada organization. We have standardized our accounts, our software, and our support.
This means a single account gets you access to all the systems, and you can move seamlessly from system to system with a minimal learning curve, as they are all configured with minimal differences.
Documentation for all our systems is found on the Compute Canada wiki.
The main page includes, amongst other things, information on
- applying for an account (on the left-hand side)
- the current systems (in the systems and services table)
- installed software (in the systems and services table)
- frequently asked questions (in the systems and services table)
- how to connect, transfer files, and run jobs (how-to guides table)
Accounts¶
As explained on our documentation page, our accounts are organized into principal investigators (PIs) and PI-sponsored users. The former must be faculty at eligible institutions (universities, colleges, etc.) and the latter must be associated with a registered PI.
For most everyone here, this will mean your supervisor must first have an account. Then you can go to the Compute Canada Database (CCDB) site and apply for an account:
1. click the register button below the sign in boxes
2. agree to the Compute Canada policies
3. indicate you don’t have a prior account (unless you do of course!)
4. fill in the personal information
5. pick an appropriate sponsored role (e.g., Master’s student)
6. enter your supervisor’s Compute Canada Role Identifier (CCRI)
7. pick a username and password
The CCRI identifies your sponsor’s account. It is a combination of letters and numbers, like tuv-232-02, that they can get by logging into the CCDB site, picking My Account -> Account Details, and looking at the first line in the Active Roles box.
After submission, your sponsor will be sent an email asking them to confirm that they are sponsoring you. Once they have done this, your account will be enabled and you will be able to access all the Compute Canada systems with the exception of Niagara.
To also access Niagara, you will need to log in to the CCDB site, go to My Account -> Request access to legacy clusters, and click the Request access to Niagara and Mist button.
Systems¶
As covered on the Compute Canada documentation wiki, the four major supercomputers currently operating in Canada are
Organization | Supercomputer | Nodes | Cores | Storage |
---|---|---|---|---|
Calcul Québec | Béluga | 872 | 34,880 | 11.6PB |
SHARCNET | Graham | 1,261 | 41,548 | 15.3PB |
SciNet | Niagara | 2,016 | 80,640 | 9.4PB |
WestGrid | Cedar | 2,502 | 101,424 | 13.8PB |
In this table,
- node means a single computer
- core means a single CPU (generally 32 or more per node)
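As a quick check of the "generally 32 or more per node" figure, the average number of cores per node can be computed from the table above. This is only a sketch. The numbers are copied from the table, and an average hides the fact that not every node in a system has the same core count.

```python
# Node and core counts copied from the systems table above.
systems = {
    "Béluga": (872, 34_880),
    "Graham": (1_261, 41_548),
    "Niagara": (2_016, 80_640),
    "Cedar": (2_502, 101_424),
}


def average_cores_per_node(nodes, cores):
    """Average number of cores per node for one system."""
    return cores / nodes


for name, (nodes, cores) in systems.items():
    print(f"{name}: {average_cores_per_node(nodes, cores):.1f} cores per node")
```

The averages come out at roughly 33 to 41 cores per node, which fits the rule of thumb.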
The individual nodes (computers) are connected by a very fast, low-latency, low-congestion network (typically full data rate InfiniBand) to ensure there are no barriers for programs that need multiple nodes to collaborate when solving problems.
Most of the individual nodes (computers) in a supercomputer are the same. On Graham, for example, most nodes have
- 32 cores
- 125GB of memory
but some have
- 44 or 64 cores (for large threaded jobs),
- 250GB, 502GB, or 3,022GB of memory (for large memory jobs), or
- Pascal, Volta, or Turing GPUs (for GPGPU jobs)
to also enable computations that require these more specialized resources.
Storage¶
Each supercomputer has its own storage systems. All the nodes (computers) in a supercomputer share the same storage. Within a supercomputer, the folder you are in determines which storage system you are working with. There are three main storage systems
Storage Folder | Group | Expiration | HPC | Size | Files |
---|---|---|---|---|---|
/home/ | No | No | No | 50GB | 500K |
/scratch/ | No | 60 days | Yes | 20TB | 1,000K |
/project/ | Yes | No | Yes | 1TB | 500K |
where the size and file count limits are for Graham (the other supercomputers have similar, but slightly different, limits). All the file systems except /scratch are backed up each day.
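To get a feel for how these limits relate to your own files, a folder can be summarized with a few lines of Python. This is a minimal sketch and not the official quota accounting on the clusters. The function names are hypothetical, "GB" is assumed to mean 10**9 bytes, and the 60 day check uses the last modification time, which may differ from how expiration is actually decided.

```python
import os
import time

# Limits copied from the table above (Graham).
# Treating "GB" as 10**9 bytes is an assumption.
HOME_LIMITS = {"size": 50 * 10**9, "files": 500_000}
SCRATCH_EXPIRY_DAYS = 60


def summarize(folder, now=None):
    """Return (total_bytes, file_count, stale_files) for a folder tree.

    stale_files lists files not modified in the last SCRATCH_EXPIRY_DAYS days.
    """
    now = time.time() if now is None else now
    cutoff = now - SCRATCH_EXPIRY_DAYS * 24 * 3600
    total_bytes, file_count, stale_files = 0, 0, []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            try:
                info = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            total_bytes += info.st_size
            file_count += 1
            if info.st_mtime < cutoff:
                stale_files.append(path)
    return total_bytes, file_count, stale_files


def over_limit(total_bytes, file_count, limits):
    """True if either the size or the file count limit is exceeded."""
    return total_bytes > limits["size"] or file_count > limits["files"]


if __name__ == "__main__":
    # Summarize the current folder as a small demonstration.
    size, count, stale = summarize(".")
    print(f"{size / 10**9:.2f} GB in {count} files, {len(stale)} older than 60 days")
    print("over /home limits:", over_limit(size, count, HOME_LIMITS))
```

Note that the file count limit can be reached long before the size limit when a folder holds many small files.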
Our data policy is that the data belongs to the sponsor. This means that you lose access to files when your sponsor stops sponsoring you (e.g., you graduate and move on, you have a falling out, etc.), so make sure to also keep a personal copy of anything you really care about.
Running programs¶
The Canadian supercomputers, as with most around the world, run Linux. Linux is an open source operating system started in 1991 by a Finnish computer science student named Linus Torvalds. It is based on the Unix standard (the name Linux is a combination of Linus and Unix) and is now developed collaboratively by many people from all over the Internet.
Linux is not Windows, and Windows programs will not, for the most part, run on Linux. To run a program on the supercomputers, it needs to be a Linux program, or you need to have the source code so you can build the program yourself for Linux.
The good news is that we have already done this for almost all the common programs that our users use. The Compute Canada documentation wiki contains an extensive list of the programs we have already installed, along with any specific information required to run a program, such as licensing configuration for commercial software.
The other thing that is important to know about the supercomputers is that they are not a free-for-all. You do not pick a node (computer) at random and go run your program on it. Rather, you tell the system what you want to run and what resources it needs (how much time, how much memory, etc.). It will then run your program in such a way as to ensure it has access to all the resources it needs, that it does not interfere with other running programs, and that no one hogs the system.
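To make "telling the system what resources it needs" a little more concrete, here is a minimal sketch that writes out a batch script. It assumes the scheduler is Slurm, which this section does not name, and uses three of its standard `#SBATCH` options. The function name and the program name are hypothetical and only there for illustration.

```python
def make_job_script(command, time_limit="01:00:00", memory="4G", cpus=1):
    """Build the text of a Slurm batch script requesting the given resources."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --time={time_limit}",     # wall-clock limit, HH:MM:SS
        f"#SBATCH --mem={memory}",          # memory per node
        f"#SBATCH --cpus-per-task={cpus}",  # cores for the program
        "",
        command,
    ]
    return "\n".join(lines) + "\n"


# Hypothetical program name, used only for illustration.
print(make_job_script("./my_program input.dat", time_limit="00:30:00", memory="2G"))
```

On a Slurm system, a script like this would typically be saved to a file and handed to the scheduler with `sbatch`, which then decides where and when it runs.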
We will cover how to schedule and manage your jobs later in this course.
Support¶
Support is provided through a single email address, support@computecanada.ca. Emailing this address opens a ticket on the Compute Canada support website, and our staff will direct your question to the support person anywhere in Canada who is best suited to help you. As long as it isn’t late in the afternoon, you will generally get a response the same day.
Many institutions, such as Western, also have local staff on site, such as myself. You can also contact us directly, although we generally prefer that you use the support email, as this lets us better manage and direct queries (e.g., taking into account the current workload of our staff in order to get you the fastest answer back).