Basic awk: An interactive introduction to awk
awk is a language that reads whitespace-separated input (columns), matches each line against patterns, and executes code for each match.
awk is available on almost every Linux system.
# For every line execute code if the pattern matches that line
pattern { code }
# Run code for every line
{ code }
Here’s an example of an awk command that just returns its input ($0 refers to the full source line). Click into the terminal and press enter.
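If you want to try this outside the browser, here’s a minimal sketch for a regular shell. The rows are a made-up stand-in for the tutorial’s data (the layout of name, phone, and one extra column is an assumption), piped in with printf instead of read from a file.

```shell
# Hypothetical sample rows: name, phone, one extra column (not the real data file).
data='Amelia 555-5553 F
Bill 555-1675 A
Sam 555-3430 A'

# No pattern, so the block runs on every line; $0 is the whole line.
printf '%s\n' "$data" | awk '{ print $0 }'
```

This should simply echo the three rows back unchanged.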
Here’s an example of data ready for awk to process: ./mail_list. You can edit this data and the terminals below will use the new data.
Let’s try an easy example with no pattern: printing the first column ($1). (Press enter to run.)
Next let’s print columns $1 and $2 separated by a space " ".
That looks like this: $1 " " $2
print concatenates values that are written next to each other, separated by spaces (no plus signs here).
You’ll need to modify the code this time, adding " ".
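Here’s one way that can look in a regular shell. The sample rows are hypothetical (name, phone, and an extra third column are assumed), so treat this as a sketch rather than the tutorial’s exact data.

```shell
# Hypothetical sample rows: name, phone, one extra column.
data='Amelia 555-5553 F
Bill 555-1675 A
Sam 555-3430 A'

# Writing $1, " ", and $2 next to each other concatenates them into one string.
printf '%s\n' "$data" | awk '{ print $1 " " $2 }'
```

Each output line should hold just the name and the phone number, with the third column dropped. Writing print $1, $2 (with a comma) gives the same result here, because awk joins comma-separated print arguments with the output field separator, which defaults to a single space.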
Okay, how about a pattern? You saw that $1 means column one. How about printing the phone number for every Bill?
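One possible answer, sketched against made-up rows. It assumes the name is in column 1 and the phone number in column 2, which may not match your copy of the data.

```shell
# Hypothetical sample rows: name in $1, phone in $2 (assumed layout).
data='Amelia 555-5553 F
Bill 555-1675 A
Sam 555-3430 A'

# The pattern $1 == "Bill" selects lines; the block prints their second column.
printf '%s\n' "$data" | awk '$1 == "Bill" { print $2 }'
```

With these rows only the Bill line matches, so a single phone number is printed.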
Next let’s try multiple patterns. In addition to printing every Bill’s phone number, let’s print the name of the person with the phone number 555-3430.
pattern1 { code1 } pattern2 { code2 }
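As a sketch of the two-pattern form, again with made-up rows (name in $1 and phone in $2 are assumptions, and "Sam" is an invented owner of the number):

```shell
# Hypothetical sample rows: name in $1, phone in $2 (assumed layout).
data='Amelia 555-5553 F
Bill 555-1675 A
Sam 555-3430 A'

# Every line is tested against both patterns, in the order they are written.
printf '%s\n' "$data" | awk '
  $1 == "Bill"     { print $2 }   # phone number of every Bill
  $2 == "555-3430" { print $1 }   # name of whoever has this number
'
```

A line that matched both patterns would run both blocks. Here each pattern matches a different line, so the output is one phone number followed by one name.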
awk variables can be initialized in a BEGIN { x = 0 } pattern, or left uninitialized, in which case they act as 0 in arithmetic.
Similarly, the END pattern matches once after all rows are complete. Thus far we’ve used plain { code } with neither BEGIN nor END preceding it.
These blocks run on every line.
Try running these two examples to get an idea of how BEGIN and END work.
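If you’re following along in a regular shell, a pair of small sketches like these shows the idea. The sample rows and the variable name rows are made up for illustration.

```shell
# Hypothetical sample rows.
data='Amelia 555-5553 F
Bill 555-1675 A
Sam 555-3430 A'

# BEGIN runs once before any input is read, then the plain block runs per line.
printf '%s\n' "$data" | awk 'BEGIN { print "start" } { print $1 }'

# END runs once after the last line; rows was never initialized, so it started at 0.
printf '%s\n' "$data" | awk '{ rows++ } END { print rows }'
```

The first command prints "start" followed by the three names. The second prints the number of input lines.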
Here’s an example where we add 5 to s for each line. awk also supplies a length() function that can accept a column.
Can you sum the length of everyone’s name?
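A sketch of both ideas, assuming the name sits in column 1 of some made-up rows:

```shell
# Hypothetical sample rows: name in $1 (assumed layout).
data='Amelia 555-5553 F
Bill 555-1675 A
Sam 555-3430 A'

# Add 5 to s on every line, then print the total once at the end.
printf '%s\n' "$data" | awk '{ s += 5 } END { print s }'

# Add the length of each name instead of a fixed 5.
printf '%s\n' "$data" | awk '{ s += length($1) } END { print s }'
```

With three rows the first total is 15. The names here are 6, 4, and 3 characters long, so the second total is 13.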
awk can also use regular expressions as patterns. You can match your regex against the entire line /regex/ { code } or against a column $1 ~ /regex/ { code }.
Here’s a regex that matches any word containing only vowels: /^[AEIOUYaeiouy]+$/
Can you use it to match names with only vowels and print them?
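Here’s a sketch of the column form. The names below are invented so that a couple of them actually consist only of vowels, and the name is assumed to be in column 1.

```shell
# Hypothetical sample rows, with two all-vowel names added for the demo.
data='Amelia 555-5553
Aoi 555-0101
Bill 555-1675
Yue 555-0199'

# ~ tests one column against the regex; ^ and $ force the whole name to match.
printf '%s\n' "$data" | awk '$1 ~ /^[AEIOUYaeiouy]+$/ { print $1 }'
```

Amelia and Bill contain consonants and are skipped, so only Aoi and Yue are printed. Dropping the $1 ~ part would test the regex against the entire line instead, which would match nothing here because every line also contains digits and spaces.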
Control flow! awk has if and else like other languages. Here we have a dataset of names, ages, and countries.
Let’s try using if and else to print (senior) + the name of everyone whose age is over 65.
optionalPattern { if (something >= somethingElse) { do this } else { do that } }
# Output format:
(senior) Frances Spence
Nate
DojaCat
...
(senior) Jean-Bartik
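One way to write it, sketched against a made-up people dataset. The names, ages, and countries below are invented, and the name, age, country column order is an assumption.

```shell
# Hypothetical people rows: name, age, country (assumed layout).
people='Ada 72 USA
Nate 30 USA
Chidi 68 NG
Mina 41 NG
Jean 70 FR'

# No pattern, so every line reaches the if; seniors get a "(senior) " prefix.
printf '%s\n' "$people" | awk '{
  if ($2 > 65) { print "(senior) " $1 } else { print $1 }
}'
```

Ada, Chidi, and Jean are over 65 in this sample, so their lines carry the prefix while Nate and Mina are printed as plain names.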
Let’s try some logic! awk supports logical and: && as well as logical or: ||
Try using && and || to write a pattern that matches only seniors in the USA.
Next try seniors OR people in Nigeria (NG).
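A possible pair of patterns, sketched with invented rows (name, age, country is the assumed column order, and "senior" is taken to mean an age over 65):

```shell
# Hypothetical people rows: name, age, country (assumed layout).
people='Ada 72 USA
Nate 30 USA
Chidi 68 NG
Mina 41 NG
Jean 70 FR'

# Seniors in the USA: both conditions must hold.
printf '%s\n' "$people" | awk '$2 > 65 && $3 == "USA" { print $1 }'

# Seniors OR people in Nigeria: either condition is enough.
printf '%s\n' "$people" | awk '$2 > 65 || $3 == "NG" { print $1 }'
```

The first command prints only Ada. The second prints Ada, Chidi, Mina, and Jean: Mina is not a senior but is in NG, and Jean is a senior outside NG.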
How about summing up the number of seniors inside and outside of the USA? Just like we implicitly created variables using { s += length($2) } earlier, we can create two new variables to count seniors in/out of the USA.
Try doing this two ways:
- Matching every line with a senior and then using if/else on $3
- Using two patterns: one that matches seniors in the USA and one that matches seniors not in the USA
Multiple patterns look like this:
awk 'pattern1 { code1 } pattern2 { code2 } END { finalCode }' people
Your solution should be two numbers separated by a space: 4 2
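Both approaches can be sketched like this. The rows and the variable names in_usa and elsewhere are made up, so the counts differ from the 4 2 you should get on the real dataset.

```shell
# Hypothetical people rows: name, age, country (assumed layout).
people='Ada 72 USA
Nate 30 USA
Chidi 68 NG
Mina 41 NG
Jean 70 FR'

# Way 1: one pattern selects seniors, then if/else splits them by country.
printf '%s\n' "$people" | awk '
  $2 > 65 { if ($3 == "USA") { in_usa++ } else { elsewhere++ } }
  END     { printf "%d %d\n", in_usa, elsewhere }
'

# Way 2: two patterns, one per counter.
printf '%s\n' "$people" | awk '
  $2 > 65 && $3 == "USA" { in_usa++ }
  $2 > 65 && $3 != "USA" { elsewhere++ }
  END                    { printf "%d %d\n", in_usa, elsewhere }
'
```

Both commands print 1 2 for this sample. Using printf with %d is a small design choice: a counter that was never incremented still prints as 0 rather than as an empty string.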
awk has a few builtins: variables defined for you. Here are a few:
name | value |
---|---|
FS | Field separator (space in our examples) |
RS | Record separator (newline here) |
NF | Number of columns (fields) |
NR | Index of current row (record) |
$0 | Full Line (all columns) |
See if you can use this to pull out only the odd rows from the people dataset. (awk supports % and /.)
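One possible solution uses NR with the remainder operator. The rows below are invented stand-ins for the people dataset.

```shell
# Hypothetical people rows: name, age, country (assumed layout).
people='Ada 72 USA
Nate 30 USA
Chidi 68 NG
Mina 41 NG
Jean 70 FR'

# NR counts rows starting at 1, so NR % 2 == 1 is true for rows 1, 3, 5, ...
printf '%s\n' "$people" | awk 'NR % 2 == 1 { print $0 }'
```

For this sample the Ada, Chidi, and Jean rows are printed.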
When you’re using awk from the command line you’ll also have access to flags (we can’t use them easily here on the web). A few flags worth knowing are
flag | example | purpose |
---|---|---|
-F | awk -F: | Columns are separated by a colon `:` |
-f | awk -f script.awk | Load awk script from a file instead of the command line |
-v | awk -v init=1 | The variable init begins as 1 instead of the default 0. Equivalent to awk 'BEGIN { init = 1 } ...' |
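To see -F and -v in action from a real command line, here’s a small sketch. The colon-separated rows are made up for the demo.

```shell
# Hypothetical colon-separated rows: name:age:country.
records='ada:72:USA
nate:30:USA'

# -F: splits columns on colons instead of whitespace.
printf '%s\n' "$records" | awk -F: '{ print $1 }'

# -v init=1 sets init before any input is read; two lines bump it to 3.
printf '%s\n' "$records" | awk -v init=1 '{ init++ } END { print init }'
```

The first command prints the two names. The second prints 3, where it would print 2 without the -v flag.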
That’s all I have for you today! If you have ideas for what you’d like to see in an intermediate interactive awk guide, shoot me an email (on homepage).
-Nate
Licensing notes:
Some examples are pulled from the GNU Awk User’s Guide under the GNU Free Documentation License
awkjs is used under the MIT license