Here are some examples of GNU Awk, re-implementing common command-line tools as Awk one-liners to illustrate how Awk works. The first line of each example is the original tool; the second is the Awk equivalent. Obviously you would not use Awk for these exact tasks; rather, you would combine these techniques into more complex actions that are difficult or impossible in the shell.
Awk is a text processing language and tool that sits somewhere between common Unix command line tools and full-blown languages like Python and Ruby. It’s ideal for quick hacks that are difficult to achieve without complex shell pipelines or full scripts, and for many tasks it’s extremely fast.
An Awk program is a series of statements of the form `pattern { action }`; the action is executed for each line the pattern matches. A null pattern matches every line, and a null action prints the line.
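As a minimal sketch of this structure (the three-line sample input is made up for illustration), the same filter can be written three ways, showing the default pattern and the default action:

```shell
printf 'apple\nbanana\ncherry\n' | awk '/an/'                # pattern only: default action prints the line
printf 'apple\nbanana\ncherry\n' | awk '/an/ { print }'      # explicit pattern and action
printf 'apple\nbanana\ncherry\n' | awk '{ if (/an/) print }' # null pattern: action runs for every line
```

All three print `banana`.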
Matching certain lines
- Search for a regexp
- a pattern without an action means the default action: print
grep foo file
awk '/foo/' file
- First 10 lines
- patterns can be any expression; `NR` is the current line number
head -n 10 file
awk 'NR <= 10' file
- Lines 11 through 20
- more complex expression
tail -n +11 file | head -n 10
awk 'NR >= 11 && NR <= 20' file
- Multiple positive and negative regexes
- the Awk version is a single process
grep foo file | grep -v bar | grep baz
awk '/foo/ && !/bar/ && /baz/' file
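To see all three conditions at work on a small made-up input: only the line containing both "foo" and "baz" but not "bar" survives.

```shell
printf 'foo bar baz\nfoo baz\nbar baz\n' | awk '/foo/ && !/bar/ && /baz/'
# prints: foo baz
```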
- Last lines
- Awk must buffer 10 lines to be able to print them at the end; `$0` is the entire line; the special pattern `END` runs after all input
tail -n 10 file
awk '{ buf[NR%10] = $0 } END { for (i = NR-9; i <= NR; i++) { print buf[i%10] } }' file
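The same ring-buffer idea is easier to follow at a smaller scale. This sketch keeps the last 3 lines of a made-up 7-line input (it assumes the input has at least as many lines as the buffer size):

```shell
# buf holds the last 3 lines, indexed by NR modulo 3; the END loop
# replays them in order
seq 1 7 | awk '{ buf[NR%3] = $0 } END { for (i = NR-2; i <= NR; i++) print buf[i%3] }'
# prints: 5 6 7, one per line
```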
- Numeric condition in a single field
- expressions can use any valid variable; `$x` is field number x; by default, fields are separated by runs of whitespace and leading whitespace is ignored
(not possible in a single shell line)
awk -F: '$4 > 5 && $4 < 100' /etc/passwd
- Match a regex in a single field
(needs a complex regex in grep)
awk -F: '$5 ~ /foo/' /etc/passwd
- Select lines with at least 5 fields
- `NF` is the number of fields in the current line
(not possible in a single shell line)
awk 'NF >= 5'
- Choose a random line
- `srand()` is needed to properly initialize the random number generator; the special `BEGIN` action runs before any input
shuf -n 1 file
awk 'BEGIN { srand() } rand() < 1/NR { l = $0 } END { print l }' file
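This is reservoir sampling: line n replaces the currently kept line with probability 1/n, which works out to a uniform choice over all lines. One property worth checking (a sketch, with made-up input): whatever is picked must be one of the input lines, since the first line is always kept (`rand() < 1` is always true).

```shell
# pick one line at random from the numbers 1..5
pick=$(seq 1 5 | awk 'BEGIN { srand() } rand() < 1/NR { l = $0 } END { print l }')
# the pick is always one of the five input lines
echo "$pick" | grep -qE '^[1-5]$' && echo ok
```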
Modifying output
- Extract one column
- an action without a pattern is run for every line; `-F` separates fields by the given single character or regex; runs are not compressed
cut -d: -f6 /etc/passwd
awk -F: '{ print $6 }' /etc/passwd
- Extract multiple columns
cut -d: -f4,6 /etc/passwd
awk -F: '{ print $4 ":" $6 }' /etc/passwd
- Replace a regex with another value
sed 's/foo/bar/g' file
awk '{ gsub(/foo/, "bar"); print }' file
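Note that `gsub` replaces every match on the line, while its sibling `sub` replaces only the first (shown here with a made-up input):

```shell
printf 'foo and foo\n' | awk '{ gsub(/foo/, "bar"); print }'  # prints: bar and bar
printf 'foo and foo\n' | awk '{ sub(/foo/, "bar"); print }'   # prints: bar and foo
```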
- Replace a regex with another value in a single field
(needs a complex regex in sed)
awk '{ gsub(/foo/, "bar", $5); print }' file
- Print file with line numbers
- `print` always adds a newline but `printf` needs it explicitly
cat -n file
awk '{ printf("%6d %s\n", NR, $0)}' file
- Coerce a field to numeric
- need to set the output field separator `OFS` to preserve the file format
(not possible in a single shell line)
awk -F: 'BEGIN { OFS = FS } { $5 = 0 + $5; print }' file
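For instance, with a made-up colon-separated line: Awk's string-to-number conversion takes the leading numeric prefix (skipping whitespace), so `" 7abc"` becomes `7`, and assigning to the field rebuilds the line using `OFS`:

```shell
printf 'x:y:z: 7abc:w\n' | awk -F: 'BEGIN { OFS = FS } { $4 = 0 + $4; print }'
# prints: x:y:z:7:w
```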
- Write lines in reverse order (caution: memory)
tac file
awk '{ buf[++l] = $0 } END { for (i = l; i > 0; --i) { print buf[i] } }' file
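A quick check of the reversal on a made-up three-line input:

```shell
# every line is buffered, then replayed from last to first
seq 1 3 | awk '{ buf[++l] = $0 } END { for (i = l; i > 0; --i) print buf[i] }'
# prints: 3 2 1, one per line
```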
- Show unique lines in sorted input
sort file | uniq
or sort -u file
sort file | awk 'prev != $0 { print; prev = $0 }'
- Remove duplicate lines in unsorted input, keeping the first occurrence (caution: memory)
(not possible in a single shell line)
awk '!seen[$0]++' file
- This needs some explanation: it uses `seen` as an associative array with lines as keys. `seen[$0]++` is greater than zero, and thus true, if the line has been seen before, and the preceding `!` inverts the condition so nothing is printed. The first time a line is seen, its count is null (and thus false) and is incremented to 1 after the test; the `!` inverts the condition, causing the line to be printed since that's the default action. See below to show the counts.
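To see the behaviour concretely (made-up input with duplicates): each line is printed the first time it appears, and suppressed afterwards.

```shell
printf 'a\nb\na\nc\nb\n' | awk '!seen[$0]++'
# prints: a b c, one per line
```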
Counting and summing
- Count all lines
wc -l file
awk 'END { print NR }' file
- Count words
- these are only approximately the same
wc -w file
awk '{ t += NF } END { print t }' file
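A quick check with a made-up input: the sum of `NF` over all lines counts whitespace-separated fields, which is why this is only approximately the same as `wc -w` (the two differ on what counts as a word boundary in edge cases).

```shell
printf 'one two three\nfour\n' | awk '{ t += NF } END { print t }'
# prints: 4
```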
- Count bytes
wc -c file
(not possible - GNU Awk works with encoded characters, not bytes)
- Count all lines matching a regex
- variables spring into existence as required, and take on numeric values in a numeric context
grep foo *.c | wc -l
awk '/foo/ { t++ } END { print t }' *.c
- Count unique lines (caution: memory)
- all arrays are associative; the Awk version does not need to sort
sort -u file | wc -l
awk '{ seen[$0]++ } END { print length(seen) }' file
- Count each unique line
- the Awk version does not need to sort
sort file | uniq -c
awk '{ count[$0]++ } END { for (line in count) { printf("%6d %s\n", count[line], line) } }' file
- Count lines by file
- the special `ENDFILE` action runs after each distinct input file; `FNR` is the line number within the current file
wc -l *.c
awk 'ENDFILE { printf("%4d %s\n", FNR, FILENAME) } END { printf("%4d total\n", NR) }' *.c
- Sum the fifth field, treating non-numbers as zero
(needs a complex loop in the shell)
awk '{ t += 0 + $5 } END { print t }' file
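With a made-up input, the `0 +` coercion turns the non-numeric field `x` into zero, so it simply drops out of the sum:

```shell
printf 'a b c d 10\na b c d x\na b c d 2.5\n' | awk '{ t += 0 + $5 } END { print t }'
# prints: 12.5
```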
- Sum all numbers in a file
- `patsplit` puts all substrings matching a regex into an array, similar to Python's `re.findall` or Ruby's `String#scan`
(not possible in a single shell line)
awk '{ patsplit($0, n, /[[:digit:]]+/); for (i in n) { t += n[i] } } END { print t }' file
I/O
- Split a file into chunks of 100 lines with a numeric suffix for each chunk
- line numbers start at 1; the `>` redirection takes an expression used as a file name; `close` is needed to avoid running out of file descriptors
split -d -l 100 file foo
awk 'NR % 100 == 1 { close(fname); fname = sprintf("foo%02d", NR/100) } { print >fname }' file
Running external commands
- Run a command for a column
cut -d: -f6 /etc/passwd | xargs ls -ld
awk -F: '{ system("ls -ld \"" $6 "\"") }' /etc/passwd
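The quoting trick is easier to see with `echo` standing in for `ls` (a sketch with made-up input): string concatenation in Awk is juxtaposition, so the three pieces join into one shell command.

```shell
# $2 is "b c"; the pieces concatenate to the command: echo "b c"
printf 'a:b c\n' | awk -F: '{ system("echo \"" $2 "\"") }'
# prints: b c
```

Note the surrounding double quotes protect field values containing spaces, but a field containing a literal `"` would still break the command.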