Intermediate awk: An interactive guide to awk’s arrays and loops
awk is a language that reads whitespace-separated input (columns), matches each line against patterns, and executes code for each match. awk is available on almost every Linux system.
But you already knew that, because you’ve already done the “Basic awk: an interactive introduction to awk” tutorial. We’re diving right in and I won’t be re-explaining things from basic awk. Fair warning.
Here’s some earnings data similar to last time. People are listed multiple times.
earnings.txt
This time we’ll be writing longer awk programs, so we’ll run our awk from .awk files (think .c, .py, .js, .rs) with awk -f file.awk input.txt.
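Here is a minimal sketch of that workflow outside the interactive page. The file names, the temporary directory, and the two sample rows (ada, bob) are made up for illustration and are not the tutorial's earnings.txt.

```sh
# Write a tiny input file and a one-rule .awk program, then run them together.
run_demo() {
  dir=$(mktemp -d) || return 1
  # Hypothetical sample rows in "name amount" form.
  printf '%s\n' 'ada 10' 'bob 5' > "$dir/input.txt"
  # Single quotes keep the shell from expanding $1 and $0.
  printf '%s\n' '$1 == "bob" { print $0 }' > "$dir/file.awk"
  awk -f "$dir/file.awk" "$dir/input.txt"
  rm -r "$dir"
}

run_demo   # prints: bob 5
```

Keeping the program in its own file avoids shell quoting headaches once the program grows past a line or two.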
Edit the .awk file and click the command in the textbox when you’re ready (or press cmd+enter on Mac, ctrl+enter on Windows). You can also view my solution.
The first challenge is a tiny review. Print the entire row ($0) if the name is “Frances-Spence”.
awk arrays are dictionaries where keys can be anything (though they are stringified) and values can be any number or string. Like all awk variables, arrays require no initialization.
If you wanted to add the number in column 2 to a running total stored under the name in column 1, you could write sums[$1] += $2.
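To see the accumulation in action, here is a small sketch. The rows and names (ada, bob) are hypothetical sample data, and the program is inlined in a shell function only so the example is self-contained; in the tutorial it would live in a .awk file.

```sh
# Hypothetical sample rows; ada appears twice, like the repeated names in earnings.txt.
sample() {
  printf '%s\n' 'ada 10' 'bob 5' 'ada 20'
}

# sums[$1] starts out empty (treated as 0), so += works on the very first row.
ada_total() {
  awk '{ sums[$1] += $2 } END { print sums["ada"] }'
}

sample | ada_total   # prints: 30
```

No declaration or zeroing step is needed: the first += on a new key creates the entry.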
Try it out. Sum the earnings (column 2) of each person. At the end, print the total earnings of Moondog: arr["Moondog"]. We’ll go over how to loop over everyone’s earnings next. (Note: you might want to use an END pattern here.)
Okay fine. You summed them. Let’s print them all. awk has for-each syntax. It looks like this:
for (key in arr) { print key " " arr[key] }
Now let’s have you print everyone’s name and their total using the for syntax (separated by a single space).
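A sketch of the for-each loop on hypothetical sample rows (ada and bob are made-up names). One thing worth knowing: awk does not promise any particular order for for (key in arr), so this example pipes the result through sort to get stable output.

```sh
# Hypothetical sample rows in "name amount" form.
sample() {
  printf '%s\n' 'ada 10' 'bob 5' 'ada 20'
}

# Accumulate per name, then visit every key in the END block.
totals() {
  awk '{ sums[$1] += $2 } END { for (name in sums) print name " " sums[name] }'
}

sample | totals | sort
# ada 30
# bob 5
```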
Good good. Okay, now can you use a temporary variable to find the person with the highest total? This will require combining for (key in arr) and if statements like if (val > max) { max = val }.
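One possible shape for that loop, sketched on hypothetical sample rows (ada, bob, cy are made up). The variable names best and max are illustrative choices. Seeding from the first entry is an extra precaution that is not required by the challenge: an uninitialized max compares as 0, which would give the wrong answer if every total were negative.

```sh
# Hypothetical sample rows in "name amount" form.
sample() {
  printf '%s\n' 'ada 10' 'bob 5' 'ada 20' 'cy 25'
}

top_earner() {
  awk '
    { sums[$1] += $2 }
    END {
      for (name in sums) {
        # best is empty until the first entry is seen, so that entry seeds max.
        if (best == "" || sums[name] > max) { max = sums[name]; best = name }
      }
      print best " " max
    }'
}

sample | top_earner   # prints: ada 30
```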
Arrays can of course also use numbers as indices.
I’m going to skip over explaining the for loop syntax because it’s just like many other languages, except with no type on i.
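For reference, a minimal sketch of a counting loop over numeric indices. The squares array and the range 1 to 5 are made up for illustration.

```sh
# A BEGIN-only program reads no input, so no file is needed.
squares_demo() {
  awk 'BEGIN {
    # Fill the array; i needs no declaration or type.
    for (i = 1; i <= 5; i++) squares[i] = i * i
    # Walk the same indices in order and print index and value.
    for (i = 1; i <= 5; i++) print i " " squares[i]
  }'
}

squares_demo
# 1 1
# 2 4
# 3 9
# 4 16
# 5 25
```

Unlike for (key in arr), a counting loop visits the indices in a guaranteed order.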
Next up, I’m going to give you an array. Your job is to loop through it and at each index print the index, a space, and the running total (inclusive) thus far.
There are two more important things we can do with arrays in awk: ask whether they contain a key with if (key in arr) {} else {}, and delete a key/value pair with delete arr[key].
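A small sketch of both operations, using a made-up seen array. It also shows a common pitfall: merely reading arr[key] creates an empty entry for that key, which is why in is the safer membership test.

```sh
membership_demo() {
  awk 'BEGIN {
    seen["a"] = 1
    if ("a" in seen) { print "a present" }

    delete seen["a"]
    if ("a" in seen) { print "a present" } else { print "a gone" }

    # Pitfall: reading seen["c"] in a comparison creates an empty entry.
    if (seen["c"] == "") { print "c looks empty" }
    if ("c" in seen) { print "c now exists" }
  }'
}

membership_demo
# a present
# a gone
# c looks empty
# c now exists
```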
Let’s use delete and in to calculate the primes from 1 to 100. We’ll use the prime sieve method. If you don’t know what that is, go read the Wikipedia page and come back.
….
Okay, welcome back. Use delete to remove every non-prime. After removing all the non-primes, loop from 0 to 100 and use something like if (number in primes) to print only the remaining numbers.
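If you get stuck, here is one way the sieve could be sketched, run on a smaller range so the challenge stays yours. The function name primes_up_to, the limit variable, and the choice to start crossing off at n * n are illustrative assumptions, not necessarily how the tutorial's own solution does it.

```sh
# Print the primes from 2 up to the given limit on one line.
primes_up_to() {
  awk -v limit="$1" 'BEGIN {
    # Start by assuming every number from 2 to limit is prime.
    for (n = 2; n <= limit; n++) primes[n] = 1

    for (n = 2; n * n <= limit; n++) {
      # Already crossed off, so its multiples are gone too.
      if (!(n in primes)) continue
      # Cross off multiples; deleting a missing key is harmless.
      for (m = n * n; m <= limit; m += n) delete primes[m]
    }

    # Whatever is still a key is prime.
    for (n = 2; n <= limit; n++) {
      if (n in primes) { printf "%s%d", sep, n; sep = " " }
    }
    print ""
  }'
}

primes_up_to 30   # prints: 2 3 5 7 11 13 17 19 23 29
```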
You may have noticed we’re not even using the source files anymore. awk is a full language that can be used independently of tabular data. Though it definitely shines on tabular data, and I don’t suggest writing too complex a program in awk.