Awk by Example


Here are some examples of GNU Awk, re-implementing common command line tools as Awk one-liners to illustrate how Awk works. Each example gives the original tool on one line and the Awk equivalent on the next, sometimes with a short note on the Awk features involved. Obviously you would not do these tasks in Awk; rather, you would combine these techniques into more complex actions that are difficult or impossible in the shell.

Awk is a text processing language and tool that sits somewhere between common Unix command line tools and full-blown languages like Python and Ruby. It’s ideal for quick hacks that are difficult to achieve without complex shell pipelines or full scripts, and for many tasks it’s extremely fast.

The language structure is a series of statements of the form pattern { action }; the action is executed for every line its pattern matches. A null pattern matches every line, and a null action prints the line.
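
For instance, the three forms look like this (a sketch assuming a whitespace-separated input file whose third field is numeric):

awk '$3 > 100 { print $1 }' file
awk '$3 > 100' file
awk '{ print $1 }' file

The first prints the first field of lines whose third field exceeds 100; the second prints those lines in full (null action); the third prints the first field of every line (null pattern).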

Matching certain lines

Search for a regexp
a pattern without an action means the default action: print
grep foo file
awk '/foo/' file
First 10 lines
patterns can be any expression; NR is the current line number
head -n 10 file
awk 'NR <= 10' file
Lines 11 through 20
more complex expression
tail -n +11 file | head -n 10
awk 'NR >= 11 && NR <= 20' file
Multiple positive and negative regexes
the Awk version is a single process
grep foo file | grep -v bar | grep baz
awk '/foo/ && !/bar/ && /baz/' file
Last 10 lines
Awk must buffer 10 lines to be able to print them at the end (assuming the file has at least 10 lines); $0 is the entire line; the special pattern END runs after all input
tail -n 10 file
awk '{ buf[NR%10] = $0 } END { for (i = NR-9; i <= NR; i++) { print buf[i%10] } }' file
Numeric condition in a single field
expressions can use any valid variable; $x is field number x; by default fields are separated by runs of whitespace and leading whitespace is ignored (here -F: splits on colons instead)
(not possible in a single shell line)
awk -F: '$4 > 5 && $4 < 100' /etc/passwd
Match a regex in a single field
(complex regex for grep)
awk -F: '$5 ~ /foo/' /etc/passwd
Select lines with at least 5 fields
NF is the number of fields in the current line
(not possible in a single shell line)
awk 'NF >= 5'
Choose a random line
srand() is needed to seed the random number generator, otherwise the same line is chosen every run; the special BEGIN action runs before any input; each line replaces the saved one with probability 1/NR, so every line is equally likely to be the one printed
shuf -n 1 file
awk 'BEGIN { srand() } rand() < 1/NR { l = $0 } END { print l }' file

Modifying output

Extract one column
an action without a pattern is run for every line; -F separates by the given single character or regex, runs are not compressed
cut -d: -f6 /etc/passwd
awk -F: '{ print $6 }' /etc/passwd
Extract multiple columns
cut -d: -f4,6 /etc/passwd
awk -F: '{ print $4 ":" $6 }' /etc/passwd
Replace a regex with another value
gsub replaces every match on the line (like sed's g flag); use sub to replace only the first
sed 's/foo/bar/g' file
awk '{ gsub(/foo/, "bar"); print }' file
Replace a regex with another value in a single field
(needs a complex regex in sed)
awk '{ gsub(/foo/, "bar", $5); print }' file
Print file with line numbers
print always adds a newline but printf needs it explicitly
cat -n file
awk '{ printf("%6d %s\n", NR, $0) }' file
Coerce a field to numeric
need to set the output field separator to preserve the file format
(not possible in a single shell line)
awk -F: 'BEGIN { OFS = FS } { $5 = 0 + $5; print }' file
Write lines in reverse order (caution: memory)
tac file
awk '{ buf[++l] = $0 } END { for (i = l; i > 0; --i) { print buf[i] } }' file
Show unique lines in sorted input
sort file | uniq or sort -u file
sort file | awk 'prev != $0 { print; prev = $0 }'
Show unique lines in unsorted input (caution: memory)
(not possible in a single shell line)
awk '!seen[$0]++' file
This needs some explanation: seen is an associative array with lines as keys. The first time a line appears, seen[$0] is null and thus false, the preceding ! makes the condition true, so the line is printed by the default action; the post-increment then sets the count to 1. On every later appearance seen[$0] is greater than zero and thus true, the ! makes the condition false, and nothing is printed. See below to show the counts.
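
Written out long-hand (purely for illustration), the one-liner is equivalent to:

awk '{ if (!seen[$0]) print; seen[$0]++ }' file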

Counting and summing

Count all lines
wc -l file
awk 'END { print NR }' file
Count words
these are only approximately the same
wc -w file
awk '{ t += NF } END { print t }' file
Count bytes
wc -c file
(not possible - GNU Awk works with encoded characters, not bytes)
Count all lines matching a regex
variables spring into existence as required, and take on numeric values in a numeric context
grep foo *.c | wc -l
awk '/foo/ { t++ } END { print t }' *.c
Count unique lines (caution: memory)
all arrays are associative; the Awk version does not need to sort
sort -u file | wc -l
awk '{ seen[$0]++ } END { print length(seen) }' file
Count each unique line
the Awk version does not need to sort
sort file | uniq -c
awk '{ count[$0]++ } END { for (line in count) { printf("%6d %s\n", count[line], line) } }' file
Count lines by file
special ENDFILE action runs after each distinct input file, FNR is the line number within each file
wc -l *.c
awk 'ENDFILE { printf("%4d %s\n", FNR, FILENAME) } END { printf("%4d total\n", NR) }' *.c
Sum the fifth field, treating non-numbers as zero
(needs a complex loop in the shell)
awk '{ t += 0 + $5 } END { print t }' file
Sum all numbers in a file
patsplit puts all values matching a regex into an array, similar to Python’s re.findall or Ruby’s String#scan
(not possible in a single shell line)
awk '{ patsplit($0, n, /[[:digit:]]+/); for (i in n) { t += n[i] } } END { print t }' file

I/O

Split a file into chunks of 100 lines with a numeric suffix for each output file
line numbers start at 1; the > construct takes an expression used as a string file name; close the previous file to avoid running out of file descriptors
split -d -l 100 file foo
awk 'NR % 100 == 1 { close(fname); fname = sprintf("foo%02d", NR/100) } { print >fname }' file

Running external commands

Run a command for a column
strings are concatenated simply by writing them next to each other; system runs its argument as a shell command
cut -d: -f6 /etc/passwd | xargs ls -ld
awk -F: '{ system("ls -ld \"" $6 "\"") }' /etc/passwd

See also