Awk by Example


Here are some examples of GNU Awk, re-implementing common command line tools as Awk one-liners to illustrate how Awk works. Each example gives the original tool on one line and the Awk equivalent on the next, sometimes with a short note on the Awk features involved. Obviously you would not do these tasks in Awk; rather, you would combine these techniques into more complex actions that are difficult or impossible in the shell.

Awk is a text processing language and tool that sits somewhere between common Unix command line tools and full-blown languages like Python and Ruby. It’s ideal for quick hacks that are difficult to achieve without complex shell pipelines or full scripts, and for many tasks it’s extremely fast.

The language structure is a series of statements of the form pattern { action }; the action is executed for every line its pattern matches. A null pattern matches every line, and a null action prints the line.
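
For instance, the three forms look like this (a sketch assuming a whitespace-separated input file whose third field is numeric):

awk '$3 > 100 { print $1 }' file
awk '$3 > 100' file
awk '{ print $1 }' file

The first prints the first field of lines whose third field exceeds 100; the second prints those lines in full (null action); the third prints the first field of every line (null pattern).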

Matching certain lines

Search for a regexp
a pattern without an action means the default action: print
grep foo file
awk '/foo/' file
First 10 lines
patterns can be any expression; NR is the current line number
head -n 10 file
awk 'NR <= 10' file
Lines 11 through 20
more complex expression
tail -n +11 file | head -n 10
awk 'NR >= 11 && NR <= 20' file
Multiple positive and negative regexes
the Awk version is a single process
grep foo file | grep -v bar | grep baz
awk '/foo/ && !/bar/ && /baz/' file
Last 10 lines
Awk must buffer 10 lines to be able to print them at the end (assuming the file has at least 10 lines); $0 is the entire line; the special pattern END runs after all input
tail -n 10 file
awk '{ buf[NR%10] = $0 } END { for (i = NR-9; i <= NR; i++) { print buf[i%10] } }' file
Numeric condition in a single field
expressions can use any valid variable; $x is field number x; by default fields are separated by runs of whitespace and leading whitespace is ignored (here -F: splits on colons instead)
(not possible in a single shell line)
awk -F: '$4 > 5 && $4 < 100' /etc/passwd
Match a regex in a single field
(complex regex for grep)
awk -F: '$5 ~ /foo/' /etc/passwd
Select lines with at least 5 fields
NF is the number of fields in the current line
(not possible in a single shell line)
awk 'NF >= 5'
Choose a random line
srand() is needed to seed the random number generator, otherwise the same line is chosen every run; the special BEGIN action runs before any input; each line replaces the saved one with probability 1/NR, so every line is equally likely to be the one printed
shuf -n 1 file
awk 'BEGIN { srand() } rand() < 1/NR { l = $0 } END { print l }' file

Modifying output

Extract one column
an action without a pattern is run for every line; -F separates by the given single character or regex, runs are not compressed
cut -d: -f6 /etc/passwd
awk -F: '{ print $6 }' /etc/passwd
Extract multiple columns
cut -d: -f4,6 /etc/passwd
awk -F: '{ print $4 ":" $6 }' /etc/passwd
Replace a regex with another value
gsub replaces every match on the line (like sed's g flag); use sub to replace only the first
sed 's/foo/bar/g' file
awk '{ gsub(/foo/, "bar"); print }' file
Replace a regex with another value in a single field
(needs a complex regex in sed)
awk '{ gsub(/foo/, "bar", $5); print }' file
Print file with line numbers
print always adds a newline but printf needs it explicitly
cat -n file
awk '{ printf("%6d %s\n", NR, $0) }' file
Coerce a field to numeric
need to set the output field separator to preserve the file format
(not possible in a single shell line)
awk -F: 'BEGIN { OFS = FS } { $5 = 0 + $5; print }' file
Write lines in reverse order (caution: memory)
tac file
awk '{ buf[++l] = $0 } END { for (i = l; i > 0; --i) { print buf[i] } }' file
Show unique lines in sorted input
sort file | uniq or sort -u file
sort file | awk 'prev != $0 { print; prev = $0 }'
Show unique lines in unsorted input (caution: memory)
(not possible in a single shell line)
awk '!seen[$0]++' file
This needs some explanation: seen is an associative array with lines as keys. The first time a line appears, seen[$0] is null and thus false, the preceding ! makes the condition true, so the line is printed by the default action; the post-increment then sets the count to 1. On every later appearance seen[$0] is greater than zero and thus true, the ! makes the condition false, and nothing is printed. See below to show the counts.
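
Written out long-hand (purely for illustration), the one-liner is equivalent to:

awk '{ if (!seen[$0]) print; seen[$0]++ }' file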

Counting and summing

Count all lines
wc -l file
awk 'END { print NR }' file
Count words
these are only approximately the same
wc -w file
awk '{ t += NF } END { print t }' file
Count bytes
wc -c file
(not possible - GNU Awk works with encoded characters, not bytes)
Count all lines matching a regex
variables spring into existence as required, and take on numeric values in a numeric context
grep foo *.c | wc -l
awk '/foo/ { t++ } END { print t }' *.c
Count unique lines (caution: memory)
all arrays are associative; the Awk version does not need to sort
sort -u file | wc -l
awk '{ seen[$0]++ } END { print length(seen) }' file
Count each unique line
the Awk version does not need to sort
sort file | uniq -c
awk '{ count[$0]++ } END { for (line in count) { printf("%6d %s\n", count[line], line) } }' file
Count lines by file
special ENDFILE action runs after each distinct input file, FNR is the line number within each file
wc -l *.c
awk 'ENDFILE { printf("%4d %s\n", FNR, FILENAME) } END { printf("%4d total\n", NR) }' *.c
Sum the fifth field, treating non-numbers as zero
(needs a complex loop in the shell)
awk '{ t += 0 + $5 } END { print t }' file
Sum all numbers in a file
patsplit puts all values matching a regex into an array, similar to Python’s re.findall or Ruby’s String#scan
(not possible in a single shell line)
awk '{ patsplit($0, n, /[[:digit:]]+/); for (i in n) { t += n[i] } } END { print t }' file

I/O

Split a file into chunks of 100 lines with a numeric suffix for each output file
line numbers start at 1; the > construct takes an expression used as a string file name; close the previous file to avoid running out of file descriptors
split -d -l 100 file foo
awk 'NR % 100 == 1 { close(fname); fname = sprintf("foo%02d", NR/100) } { print >fname }' file

Running external commands

Run a command for a column
strings are concatenated simply by writing them next to each other; system runs its argument as a shell command
cut -d: -f6 /etc/passwd | xargs ls -ld
awk -F: '{ system("ls -ld \"" $6 "\"") }' /etc/passwd

See also