Key Features

Unix Architecture page on Wikipedia

Files are stored on disk in a hierarchical file system, with a single top location throughout the system (root, or “/”), with both files and directories, subdirectories, sub-subdirectories, and so on below it.

With few exceptions, devices and some types of communications between processes are managed and visible as files or pseudo-files within the file system hierarchy. This is known as everything’s a file.

Doug McIlroy (inventor of Unix pipes)

This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.

File system

Key differences from Windows

  • there are mount points instead of A:, C:, etc.,

  • directories and files are case sensitive, and

  • the separation character is / instead of \

What would appear as a separate media hierarchy in Windows (e.g., A:\MyDir\MyCode.c) simply appears under a separate directory (known as a mount point) in Unix (e.g., /media/disk/MyDir/MyCode.c).

Root (/)

  • /boot - boot loader files

  • /etc - configuration files

  • /dev - device files

  • /bin - user programs required for booting

  • /sbin - system programs required for booting

  • /lib{,32,64} - libraries required for booting

  • /usr - programs, libraries, and such not required for booting

  • /root - superuser directory

  • /home - user home directories (shared by all nodes)

  • /tmp - temporary files

  • /var - variable data (spool files, log files, etc.)

  • /opt - add on package directory

  • /media - mount point for removable media

  • /proc - process information pseudo-file system

  • /sys - system information pseudo-file system

User (/usr)

The /usr directory is split off from the / directory mostly because disk space used to be precious.

  • /usr/bin - user programs not required for booting

  • /usr/sbin - system programs not required for booting

  • /usr/lib{,32,64} - libraries not required for booting

  • /usr/games - game programs

  • /usr/share - architecture independent data

  • /usr/man - on-line manuals

  • /usr/src - source code

  • /usr/include - header files

User Local (/usr/local)

The /usr/local directory is a place to locally install programs without messing up /usr.

  • /usr/local/bin - user programs not required for booting

  • /usr/local/sbin - system programs not required for booting

  • /usr/local/lib{,32,64} - libraries not required for booting

  • /usr/local/games - game programs

  • /usr/local/share - architecture independent data

  • /usr/local/man - on-line manuals

  • /usr/local/src - source code

  • /usr/local/include - header files

Compute Canada

  • /project - group data files (shared by all nodes and all group members)

  • /scratch - user temporary data files (local to each cluster)

Devices

Some of the special /dev files are

  • /dev/null - discards all data written and provides no data

  • /dev/zero - provides a constant stream of NUL (zero) bytes

  • /dev/random - provides a stream of random characters

  • /dev/urandom - provides a constant stream of pseudo-random characters
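
As a rough illustration of the device files above, the following sketch reads a fixed number of bytes from each one. It assumes the standard dd, wc, and od utilities are available; the variable names are made up for the example.

```shell
# Discard output we do not care about by writing it to /dev/null.
echo "this text is thrown away" > /dev/null

# Reading /dev/null hits end-of-file immediately: zero bytes.
empty_count=$(wc -c < /dev/null)

# Take exactly 16 NUL bytes from the endless stream in /dev/zero.
zero_count=$(dd if=/dev/zero bs=16 count=1 2>/dev/null | wc -c)

# Take 8 pseudo-random bytes from /dev/urandom and show them as hex.
dd if=/dev/urandom bs=8 count=1 2>/dev/null | od -An -tx1

echo "bytes read from /dev/null: $((empty_count))"
echo "bytes read from /dev/zero: $((zero_count))"
```

The hex line differs on every run, while the two counts stay at 0 and 16.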

Commands

Programs are run by specifying the command followed by the arguments separated by spaces.

program [argument]

By convention, arguments are switches followed by strings (e.g., regexps, paths, file names, etc.). Switches are usually either a single dash followed by one letter per switch or a double dash followed by a descriptive string (e.g., rm -fr mydir or rm --force --recursive mydir). Most commands also understand

  • - - as a file name means read from standard input or write to standard output

  • -- - the end of switches and the start of the strings (in case the string needs to start with - or --).
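
A small sketch of both conventions, run in a throwaway directory. The file name -notes.txt and the variable names are hypothetical, chosen only to show a name that would otherwise be mistaken for switches.

```shell
# Work in a throwaway directory so nothing else is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Without --, the name "-notes.txt" would be parsed as a bundle of switches.
touch -- -notes.txt
ls -- -notes.txt

# A lone - as a file name makes cat read standard input.
from_stdin=$(echo "hello from stdin" | cat -)
echo "$from_stdin"

# -- again protects the odd file name when removing it.
rm -- -notes.txt
cd / && rmdir "$workdir"
```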

Help

Traditionally man pages (a single help page) have been the de facto documentation source; however, some software suites have been switching to info pages (a collection of hyperlinked pages). Help for the shell's built-in commands is available through the built-in help command.

  • man command - on-line reference manuals

  • apropos [-a] keyword … - search on-line reference manuals (same as man -k)

  • info item - info documents

Directories

The current directory is . (one dot) and the parent directory is .. (two dots).

  • pwd - current directory

  • cd directory - change directory

  • mkdir directory - make directory

  • rmdir directory - remove directory

Files

Files beginning with . are considered hidden and not normally shown.

  • ls [-a] [-l] destination - list files

  • cp [-a|-p] [-r] [-s] source destination - copy files

  • ln [-s] target name - link to file

  • mv source destination - move files

  • rm [-r] [-f] destination … - remove files
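
The directory and file commands above might be combined as in the following sketch. Everything happens inside a temporary directory, and the names project, notes.txt, and so on are placeholders.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir project                                # make a directory
echo "first draft" > project/notes.txt
touch project/.hidden                        # leading . hides it from plain ls

cp project/notes.txt project/copy.txt        # copy a file
ln -s notes.txt project/link.txt             # symbolic link to notes.txt
mv project/copy.txt project/renamed.txt      # move (rename) a file

plain=$(ls project | wc -l)                  # hidden file is not listed
all=$(ls -a project | wc -l)                 # also lists ., .., and .hidden
echo "ls shows $((plain)) entries, ls -a shows $((all))"

link_text=$(cat project/link.txt)            # reading the link reads notes.txt
echo "$link_text"

cd / && rm -r "$workdir"                     # remove the whole tree
```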

Permissions

Standard permissions are read, write, and execute for user, group, and other. They are frequently abbreviated as three octal numbers (0=000, 1=001, 2=010, 3=011, 4=100, 5=101, 6=110, 7=111) corresponding to user read, write, and execute; group read, write, and execute; other read, write, and execute.

For directories, read allows the contents to be listed, write allows files to be added or removed, and execute allows the directory to be traversed.

  • chmod [-R] [u|g|o|a][+|-|=][r|w|x|X] destination … - change mode (user/group/other permissions)

  • chown [-R] user destination … - change owner

  • chgrp [-R] group destination … - change group

  • setfacl [-m|-x] [-R] [[u|g|o|m]...:user:[r|w|x|X]...] destination … - set file access control list (individual users)

  • getfacl destination … - get file access control list (individual users)
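
To make the octal notation concrete, here is a minimal sketch that sets permissions both numerically and symbolically and reads them back from ls -l. The file name report.txt is a placeholder.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch report.txt

# 640 = 110 100 000 = rw- for user, r-- for group, --- for other.
chmod 640 report.txt
octal_mode=$(ls -l report.txt | cut -c1-10)
echo "after chmod 640: $octal_mode"          # -rw-r-----

# The symbolic form changes individual bits: add group write
# and give other read.
chmod g+w,o+r report.txt
symbolic_mode=$(ls -l report.txt | cut -c1-10)
echo "after g+w,o+r:   $symbolic_mode"       # -rw-rw-r--

cd / && rm -r "$workdir"
```

The first character of the ls -l mode string is the file type (- for a regular file), followed by the three permission triples.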

View Files

The space key will advance a page and the q key will quit in more and less. In addition, the arrow keys will move in the appropriate direction in less.

  • more file - view one page at a time

  • less file - view forward and backwards

  • cat [file] - concatenate files in sequence

  • head [-n lines] [file] - first part of files

  • tail [-n lines] [-f] [file] - last part of files

  • paste [-d delimiter] [file] - concatenate files in parallel

  • cut [-d delimiter] [-f range] [file] - extract columns

  • sort [-g] [-f] [-u] [file] - sort lines
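
The viewing commands are most useful in combination. The sketch below builds a two-column file with paste and then pulls it apart again; the file names and sample data are invented for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'carol\nalice\nbob\n' > names.txt
printf '35\n29\n41\n' > ages.txt

# paste joins the files line by line, using : as the delimiter.
paste -d : names.txt ages.txt > people.txt
cat people.txt

# cut extracts the first :-separated column, sort orders it,
# and head keeps only the first two lines.
first_two=$(cut -d : -f 1 people.txt | sort | head -n 2)
echo "$first_two"

# tail shows the last line of the combined file.
last_line=$(tail -n 1 people.txt)
echo "$last_line"

cd / && rm -r "$workdir"
```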

Comparison

Digests are numbers computed from the content of files such that it is extremely difficult to come up with two different files with the same number.

  • diff [-w] [-i] [-u number|-y] file1 file2 - compare files line by line

  • sdiff [-W] file1 file2 - compare files side by side (similar to diff -y)

  • md5sum [file] - compute MD5 digest

  • sha256sum [file] - compute SHA256 digest
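
One way to see digests and diff side by side, assuming sha256sum is installed (it is part of GNU coreutils and may be absent on other systems). The file names and contents are placeholders.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo "the quick brown fox" > original.txt
cp original.txt copy.txt
echo "the quick brown fix" > changed.txt     # one letter differs

# sha256sum prints "<digest>  <file>"; keep only the digest column.
sum_original=$(sha256sum original.txt | cut -d ' ' -f 1)
sum_copy=$(sha256sum copy.txt | cut -d ' ' -f 1)
sum_changed=$(sha256sum changed.txt | cut -d ' ' -f 1)

[ "$sum_original" = "$sum_copy" ] && echo "copy matches original"
[ "$sum_original" != "$sum_changed" ] && echo "changed file differs"

# diff prints the differing lines and fails when the files differ.
if diff original.txt changed.txt; then
    files_differ=no
else
    files_differ=yes
fi

cd / && rm -r "$workdir"
```

Comparing digests answers whether two files are the same; diff answers where they differ.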

Searching

  • egrep [-i] [-v] regexp [file] - find lines matching regexp in files (same as grep -E)

  • fgrep [-i] [-v] strings [file] - find lines matching strings in files (same as grep -F)

  • find directory predicates - find files satisfying predicates in directories
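
A short sketch of the searching commands on a made-up directory tree. It uses the grep -E and grep -F spellings rather than egrep and fgrep, since the two forms are stated above to be equivalent; all file names and contents are hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p src/lib
printf 'int main(void)\n{\n    return 0;\n}\n' > src/main.c
printf 'TODO: write the library\n' > src/lib/util.c
printf 'todo: update the docs\n' > README

# grep -E -i -l: names of files with a line matching the regexp,
# ignoring case.
todo_files=$(grep -E -i -l '^todo:' README src/main.c src/lib/util.c)
echo "$todo_files"

# grep -F -c: count lines containing the literal string.
return_lines=$(grep -F -c 'return 0;' src/main.c)
echo "$return_lines"

# find: every file under src whose name ends in .c.
c_files=$(find src -name '*.c' | sort)
echo "$c_files"

cd / && rm -r "$workdir"
```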

Process

Each process (a running program) is identified by a unique number.

  • ps [-A|-U user] [-H] [-f] - process list

  • kill [-s signal] process … - signal process

  • nohup command - run command so that it ignores SIGHUP and keeps running after the terminal disconnects

  • nice command - low priority command

Remote

  • ssh [user@]host [command] - login to remote system

  • scp [[user@]host:]source [[user@]host:]destination - copy remote files

  • unix2dos file … - convert to DOS line breaks

  • dos2unix file … - convert to Unix line breaks

Other

  • sleep seconds - waits given number of seconds

  • echo [-n] [-e] strings - prints strings

  • test tests - perform various string (e.g., equality) or file (e.g., existence) tests

Editors

The two most popular Unix editors are vi and emacs. Both are extremely powerful and very complex. A simpler editor is nano.

  • vi [file] - common Unix editor

  • emacs [-nw] [file] - common Unix editor

  • nano [file] - simple Unix editor

Vi

Vi distinguishes between command and insert mode. Command mode allows you to move around and enter commands. Insert mode allows you to edit text.

  • :h - help

  • :w[!] [file] - write file (exclamation forces it)

  • :e file - edit file

  • :q[!] - quit Vi (exclamation forces it)

  • :n[!] - next file (exclamation forces it)

  • [a|A] - append after cursor or at end of line

  • [i|I] - insert (capital for beginning of line)

  • [v|V] - select by character or by whole line

  • [c[w|c]|C] - change selection/word/line or to end of line

  • [d[w|d]|D] - delete selection/word/line or to end of line

  • [y[w|y]|Y] - copy selection/word/line or to end of line

  • [p|P] - paste after or before cursor/line

  • J - join lines

  • [u|U] - undo (capital for current line)

  • ESC - revert to command mode

Emacs

Emacs is a more traditional single mode editor. Partially typed entries can be completed by pressing TAB (twice to list).

  • CTRL+h - help (b lists key bindings and k describes a key)

  • CTRL+g - abort current operation

  • CTRL+x [1|2|3] - single window or split window (stacked/side by side)

  • CTRL+x CTRL+s - save current buffer

  • CTRL+x b - switch current buffer (CTRL+x CTRL+b lists buffers)

  • CTRL+x k - quit current buffer

  • CTRL+x CTRL+c - quit Emacs

  • CTRL+SPACE - mark start of region

  • CTRL+w - cut from start of region to cursor (ALT+w copies)

  • CTRL+y - paste cut or copied region

  • CTRL+k - delete to end of line, or the line break if already at end of line

  • CTRL+s - search for text

  • CTRL+_ - undo

  • ALT+x - enter command (TAB twice to list)

Command Line

The shell is a command line interpreter that lets users run programs. It provides ways to start programs and to set up and manipulate the context in which they run. The main parts of this are

  • arguments,

  • environment,

  • standard input (stdin),

  • standard output (stdout),

  • standard error (stderr), and

  • return value

A standard command looks like so

command [<stdinfile] [>[>]stdoutfile] [2>[>]stderrfile] [&]

Arguments

Options passed to the program to tweak its behaviour. Traditionally switches (e.g., -xzf or --extract --gzip --file) followed by strings (e.g., regexp, paths, file names, etc.). Partially typed file names and directories can be completed by pressing TAB (twice to list).

  • {}… (brace expansion) - if not quoted, expands once for each item in a comma separated list or once for each number in a .. separated range

  • ~… (tilde expansion) - if not quoted, expands to home directory of user following the tilde or the current user if no user specified

  • ${...} (parameter and variable expansion) - if not single quoted, expands to environment variable specified or the corresponding parameter if number specified ({ and } are not always required)

  • $(...) (command substitution) - if not single quoted, expands to output for command (`` is an alternative syntax)

  • $((...)) (arithmetic substitution) - if not single quoted, expands to evaluated result of the expression

  • … (word splitting) - if not quoted, splits into separate arguments anywhere an IFS character (by default space, tab, and newline) occurs

  • [*|?|[]]… (path name expansion) - if not quoted, is considered a pattern and replaced with matching file names (* matches any string, ? matches any single character, and [] matches any one of the enclosed characters)
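
Several of the expansions above can be watched in action with the following sketch. Brace expansion is left out because it is not available in every sh; the file and variable names are invented for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch a.txt b.txt c.log

name="world"
greeting="hello ${name}"            # parameter and variable expansion
file_count=$(ls | wc -l)            # command substitution
total=$((6 * 7))                    # arithmetic substitution
txt_files=$(echo *.txt)             # path name expansion: a.txt b.txt
one_char=$(echo ?.log)              # ? matches exactly one character

# Word splitting: the unquoted value becomes three separate arguments.
words="one two three"
set -- $words
split_count=$#
set -- "$words"                     # quoted: stays a single argument
quoted_count=$#

echo "$greeting, $total, $txt_files, $one_char"
echo "unquoted gives $split_count arguments, quoted gives $quoted_count"
echo ~                              # tilde expansion: home directory

cd / && rm -r "$workdir"
```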

Quoting

Special characters can be escaped with \ to remove their special meaning. Single and double quoting strings affects escaping as well as which expansions and substitutions are performed.

  • '' - no expansions or substitutions are performed

  • "" - only escaping, parameter and variable expansion, command substitutions, and arithmetic substitutions occur
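
The difference between the two quoting styles shows up when the same text is quoted both ways. This is a minimal sketch with made-up variable names.

```shell
user_name="alice"

single='$user_name has $(echo 2) files'    # nothing is expanded
double="$user_name has $(echo 2) files"    # variable and command expanded
escaped="\$user_name costs \$5"            # \ removes the meaning of $

echo "$single"     # $user_name has $(echo 2) files
echo "$double"     # alice has 2 files
echo "$escaped"    # $user_name costs $5
```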

Environment

A set of key value pairs (e.g., USER=root) that programs can look up and use. Each program gets a fresh copy (i.e., changing it will not change the original) of all environment variables marked for export.

  • key=value - set a shell variable (not passed to programs unless marked for export)

  • export key[=value] - mark an environment variable for export

  • unset key - delete an environment variable

Two important environment variables are

  • PATH - list of : separated directories to look for programs in

  • LD_LIBRARY_PATH - list of : separated directories to look for libraries in (ahead of the system defaults specified in /etc/ld.so.conf)
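
The following sketch shows which variables a child program can see and that the child only ever gets a copy. The names local_only and SHARED are hypothetical, and a child sh stands in for any program.

```shell
local_only="shell only"          # plain variable: not passed to children
export SHARED="visible to all"   # exported: copied into each child

# The child shell sees SHARED but not local_only.
child_view=$(sh -c 'echo "${local_only:-unset} / ${SHARED:-unset}"')
echo "$child_view"               # unset / visible to all

# Changing the copy inside the child leaves the parent's value alone.
sh -c 'SHARED="changed in child"'
parent_after=$SHARED
echo "$parent_after"             # visible to all

# PATH is searched left to right; print the first few directories.
echo "$PATH" | tr ':' '\n' | head -n 3

unset SHARED
after_unset=${SHARED:-unset}
```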

Input and Output

Programs are run with a standard place to read input from, a standard place to write output to, and a standard place to write error messages to. By default these are all the terminal window in which the program is run. This can be changed via

  • < file - read standard input from file

  • [>|>>] file - write standard output to file (overwriting or appending)

  • [2>|2>>] file - write standard error to file (overwriting or appending)

  • [&>|&>>] file - write standard output and error to file (overwriting or appending)
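
A sketch of the redirections above, using ls on one existing and one missing file to produce both normal output and an error message. The &> forms are omitted because they are specific to bash; all file names are placeholders.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo "first line" > out.txt          # > creates or overwrites
echo "second line" >> out.txt        # >> appends

# ls writes the listing to stdout and the complaint about the
# missing file to stderr; each stream goes to its own file.
ls out.txt missing.txt > listing.txt 2> errors.txt || true

# < feeds a file to the command as standard input.
line_count=$(wc -l < out.txt)

listing=$(cat listing.txt)
errors=$(cat errors.txt)
echo "out.txt has $((line_count)) lines"
echo "stdout captured: $listing"
echo "stderr captured: $errors"

cd / && rm -r "$workdir"
```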

Status

Programs return an integer exit status. The status of the most recently executed foreground command is available as $?.

  • 0 - program completed successfully

  • 1…127 - program specific error code

  • 128…255 - program terminated by a signal (status is 128+signal number)
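
The three ranges can be observed with a few one-line child shells, as in this sketch. It assumes SIGTERM is signal 15, which is the usual numbering; the variable names are made up.

```shell
true
echo "true returned $?"                    # 0: success

# A child that exits with a program specific code.
sh -c 'exit 3' || code=$?
echo "program specific error: $code"       # 3

# A child that sends itself SIGTERM (signal 15) is reported as 128+15.
sh -c 'kill -TERM $$' || sig_code=$?
echo "killed by signal: $sig_code"         # 143
```

The || form is used so that $? is captured right after the failing command.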

Job Control

Programs run in the foreground by default. Background jobs will be suspended if they require input. Existing jobs will be sent SIGHUP when the shell exits.

  • jobs - list jobs

  • fg id - switch job to foreground

  • bg id … - switch jobs to background

  • disown id … - release jobs from job control

Foreground jobs usually respond to the following key combinations

  • CTRL+Z - suspend program

  • CTRL+C - abort program

  • CTRL+D - end of input

Multiple Commands

Commands can be combined in several ways.

  • ; … - run first command and then second (same as pressing ENTER)

  • & … - run first command in background at the same time as second

  • | … - run first command in background with its output going to the second as input

  • && … - run first command and then second only if first returns success

  • || … - run first command and then second only if first returns failure

Commands can also be grouped.

  • {} - group command in current shell – has to end with ; or newline

  • () - group command in sub shell – does not have to end with ; or newline
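
The sketch below exercises the operators above and shows the practical difference between the two kinds of grouping. The variable counter is a placeholder.

```shell
workdir=$(mktemp -d)

# && runs the second command only if the first succeeds.
cd "$workdir" && echo "now in the temporary directory"

# || runs the second command only if the first fails.
ls missing.txt 2> /dev/null || echo "missing.txt does not exist"

# | feeds the output of the first command to the second as input.
word_count=$(echo "one two three" | wc -w)

# ( ) runs in a sub shell: the assignment is lost afterwards.
counter=1
( counter=2 )
after_subshell=$counter

# { } runs in the current shell: the assignment is kept.
{ counter=3; }
after_braces=$counter

echo "$((word_count)) words"
echo "counter is $after_subshell after ( ) and $after_braces after { }"
cd / && rmdir "$workdir"
```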

Scripting

Executable text files that start with #!command (#!/bin/bash for shell scripts) are run as command file, i.e., the named command is started with the path of the script as its argument.

Parameters

  • $# - number of parameters

  • $0 - name of shell or shell script

  • $number - positional parameter

  • $* - all positional parameters (in double quotes expands as one argument)

  • $@ - all positional parameters (in double quotes expands as separate arguments)

The following functions manipulate parameters

  • shift [number] - drop specified number of parameters (one if unspecified)

  • set parameter … - set parameters to given parameters
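
A minimal sketch of the parameters and the two functions above. It uses set with -- so the example works outside a script; the parameter values are made up.

```shell
# set replaces the positional parameters of the current shell.
set -- "red apple" banana cherry
count_before=$#            # 3
first=$1                   # red apple

# "$@" keeps each parameter as its own argument.
separate=0
for item in "$@"; do separate=$((separate + 1)); done

# "$*" joins them all into one argument.
joined=0
for item in "$*"; do joined=$((joined + 1)); done

shift                      # drop the first parameter
count_after=$#             # 2
new_first=$1               # banana

echo "$count_before parameters: \"\$@\" gives $separate, \"\$*\" gives $joined"
echo "after shift: $count_after parameters, first is $new_first"
```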

Programming

  • if command; then command; [elif command; then command;][else command;] fi - conditionally run commands depending on the success of the if and elif commands

  • for key in value; do command; done - for each value, set key to value and run commands

  • while command; do command; done - repeatedly run commands until the while command fails

  • case value in [pattern [| pattern]) command;;]esac - run commands where first pattern matches (same as path name expansion)

  • continue [number] - next iteration of enclosing loop (innermost if not specified)

  • break [number] - exit enclosing loop (innermost if not specified)

  • function name { command; } … - create a command that runs the commands with passed parameters

  • return [number] - return from function with given exit status (last command if not specified)

  • exit [number] - quit shell with given exit status (last command if not specified)
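
The constructs above might fit together as in this sketch. The function names classify and count_matching are hypothetical, and the portable name() spelling is used in place of the function keyword so the example also runs in a plain sh.

```shell
# classify prints a label for its first argument using case patterns.
classify() {
    case $1 in
        *.c | *.h) echo "source" ;;
        *.txt)     echo "text" ;;
        *)         echo "other" ;;
    esac
}

# count_matching counts how many of the remaining arguments
# classify labels with the first argument.
count_matching() {
    wanted=$1
    shift
    total=0
    for name in "$@"; do
        if [ "$(classify "$name")" = "$wanted" ]; then
            total=$((total + 1))
        fi
    done
    echo "$total"
}

sources=$(count_matching source main.c util.h notes.txt Makefile)
echo "source files: $sources"

# while repeats until its command fails; here until n reaches 3.
n=0
while [ "$n" -lt 3 ]; do
    n=$((n + 1))
done
echo "loop ran $n times"
```

Note that [ is another name for the test command listed earlier, which is why if and while can use it as their command.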

Regular Expressions

Regular expressions are strings where several of the non-alphanumeric characters have special meaning. They provide a concise and flexible means for string searching and replacing and are used by several Unix programs.

Anchoring

  • ^ - match start of line

  • $ - match end of line

Characters

  • character - the indicated character

  • . - any character

  • [] - any character in the list or range (^ inverts)

Combining

  • () - group

  • |… - match either or

Repetition

  • ? - match zero or one times

  • * - match zero or more times

  • + - match one or more times

  • {} - match a range of times

Replacement

  • \digit - substitute text matched by corresponding group
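
As a quick illustration of the pieces above, the sketch below filters lines with grep -E and rearranges a date with sed. It assumes a sed that accepts -E for the same extended syntax (GNU and BSD sed both do); the sample data is invented.

```shell
# Anchors and repetition: lines that consist entirely of digits.
digits=$(printf '42\nabc\n7x\n1000\n' | grep -E '^[0-9]+$')

# Grouping and alternation: cat or dog followed by an optional s.
pets=$(printf 'cats\ndog\nbird\n' | grep -E '^(cat|dog)s?$')

# Replacement: \1, \2, and \3 refer back to the three groups.
date_out=$(echo "2024-01-15" |
    sed -E 's#([0-9]{4})-([0-9]{2})-([0-9]{2})#\3/\2/\1#')

echo "$digits"
echo "$pets"
echo "$date_out"      # 15/01/2024
```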