Intermediate awk: An interactive guide to awk’s arrays and loops

awk is a language that reads whitespace-separated input (columns) line by line, matches each line against patterns, and executes code for each match. awk is available on almost every Linux system.

But you already knew that, because you’ve done the “Basic awk: an interactive introduction to awk” tutorial already. We’re diving right in and I won’t be re-explaining things from basic awk. Fair warning.

Here’s some earnings data similar to last time’s. People are listed multiple times.


earnings.txt

This time we’ll be writing longer awk programs, so we’ll run our awk from .awk files (think .c, .py, .js, .rs) with awk -f file.awk input.txt.
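If you want to try the awk -f workflow outside this page, here is a minimal sketch. The file names sample.txt and big.awk and the data in them are made up for illustration; they are not the tutorial's earnings.txt.

```sh
# Hypothetical sample data (not the tutorial's earnings.txt).
cat > sample.txt <<'EOF'
Ada 100
Grace 250
Ada 50
EOF

# The awk program lives in its own file: one pattern, one action.
cat > big.awk <<'EOF'
# Print the whole row when column 2 is at least 100.
$2 >= 100 { print $0 }
EOF

# Run the program file against the data file.
awk -f big.awk sample.txt
```

The quoted 'EOF' keeps the shell from expanding $2 and $0 before awk sees them.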

Edit the .awk file and click the command in the textbox when you’re ready (or press Cmd+Enter on Mac, Ctrl+Enter on Windows).


The first challenge is a tiny review. Print the entire row ($0) if the name is “Frances-Spence”.

exercise-1.awk
BLAH

awk arrays are dictionaries: keys can be anything (though they are stringified), and values can be any number or string. Like all awk variables, arrays require no initialization.

If you wanted to add the number in column 2 to a running total stored under the name in column 1, you could write sums[$1] += $2.
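To see that accumulation pattern on a tiny input, here is a sketch. The data, the name Ada, and the file names sample.txt and sums.awk are assumptions for illustration only.

```sh
# Hypothetical sample data (not the tutorial's earnings.txt).
cat > sample.txt <<'EOF'
Ada 100
Grace 250
Ada 50
EOF

cat > sums.awk <<'EOF'
# Runs for every line: add column 2 to the total keyed by column 1.
# A missing entry counts as 0 in arithmetic, so no setup is needed.
{ sums[$1] += $2 }

# Runs once, after the last line has been read.
END { print sums["Ada"] }
EOF

awk -f sums.awk sample.txt
```

Ada appears twice in the sample, so the printed total is 100 + 50 = 150.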

Try it out. Sum the earnings (column 2) of each person. At the end, print the total earnings of Moondog: arr["Moondog"]. We’ll go over how to loop over everyone’s earnings next. (Note: you might want to use an END pattern here.)

exercise_2.awk
BLAH

Okay fine. You summed them. Let’s print them all. awk has for-each syntax. It looks like this:

for (key in arr) { print key " " arr[key] }
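Here is a small sketch of that loop on a hand-built array. The array name stock, its contents, and the file name foreach.awk are made up. One thing to keep in mind: awk does not guarantee the order in which for (key in arr) visits keys, so the sketch pipes the output through sort to make it predictable.

```sh
cat > foreach.awk <<'EOF'
BEGIN {
    # Hypothetical data, built by hand instead of read from a file.
    stock["apples"] = 3
    stock["pears"] = 7

    # key takes each index of stock in turn; the order is unspecified.
    for (key in stock) {
        print key " " stock[key]
    }
}
EOF

# A program with only a BEGIN block needs no input file.
awk -f foreach.awk | sort
```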

Now let’s have you print everyone’s name and their total using the for syntax (separated by a single space).

exercise_3.awk
BLAH

Good good. Okay, now can you use a temporary variable to find the person with the highest total? This will require combining for (key in arr) with if statements like if (val > max) { max = val }.
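As a rough sketch of the loop-plus-if idea on unrelated data: the array stock, the variables max and best, and the file name max.awk are all hypothetical. It also assumes every value is positive, because an unset max compares as 0.

```sh
cat > max.awk <<'EOF'
BEGIN {
    # Hypothetical data, all positive numbers.
    stock["apples"] = 3
    stock["pears"] = 7
    stock["plums"] = 5

    # max starts out unset, which compares as 0 against a number.
    for (key in stock) {
        if (stock[key] > max) {
            max = stock[key]   # largest value seen so far
            best = key         # the key that value belongs to
        }
    }
    print best " " max
}
EOF

awk -f max.awk
```

If your values could be zero or negative, you would need to seed max from the first element instead of relying on the unset default.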

exercise_4.awk
BLAH

Arrays can of course also use numbers as indices. I’m going to skip over explaining the for loop syntax because it’s just like many other languages, except with no type on i.
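In case a quick reminder helps, here is a minimal sketch of a counting loop over numeric indices. The array squares and the file name count.awk are made up for illustration.

```sh
cat > count.awk <<'EOF'
BEGIN {
    # Hypothetical values stored under the numeric indices 1..3.
    squares[1] = 1
    squares[2] = 4
    squares[3] = 9

    # Classic three-part for loop; i is just a variable, no type needed.
    for (i = 1; i <= 3; i++) {
        print i " " squares[i]
    }
}
EOF

awk -f count.awk
```

Unlike for (key in arr), this loop visits the indices in the order you count them.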

loop_example.awk
BLAH

Next up, I’m going to give you an array. Your job is to loop through it and at each index print the index, a space, and the running total (inclusive) thus far.

exercise_5.awk
BLAH

There are two more important things we can do with arrays in awk. We can ask whether they contain a key with if (key in arr) {} else {}, and we can delete a key/value pair with delete arr[key].
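Here is a small sketch of both operations together. The array seen and the file name member.awk are hypothetical.

```sh
cat > member.awk <<'EOF'
BEGIN {
    seen["a"] = 1
    seen["b"] = 1

    # Remove the key "a" and its value.
    delete seen["a"]

    # "in" tests for a key without creating it;
    # reading seen["a"] directly would create an empty entry.
    if ("a" in seen) { print "a present" } else { print "a absent" }
    if ("b" in seen) { print "b present" } else { print "b absent" }
}
EOF

awk -f member.awk
```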

Let’s use delete and in to calculate the primes from 1 to 100. We’ll use the prime sieve method. If you don’t know what that is, go read the Wikipedia page and come back.

….

Okay, welcome back. Use delete to remove every non-prime. After removing all the non-primes, loop from 0 to 100 and use something like if (number in primes) to print only the remaining numbers.
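If you want a starting point, here is one possible sketch of the sieve for a smaller range (up to 30), so the full exercise is still yours to do. The names limit, n, m, and sieve.awk are assumptions, and this is only one way to structure it.

```sh
cat > sieve.awk <<'EOF'
BEGIN {
    limit = 30

    # Start by treating every number from 2 to limit as a candidate.
    for (n = 2; n <= limit; n++) {
        primes[n] = 1
    }

    # For each surviving n, delete its multiples starting at n * n.
    for (n = 2; n * n <= limit; n++) {
        if (n in primes) {
            for (m = n * n; m <= limit; m += n) {
                delete primes[m]   # deleting a missing key is harmless
            }
        }
    }

    # Whatever is still a key is prime.
    for (n = 2; n <= limit; n++) {
        if (n in primes) {
            print n
        }
    }
}
EOF

awk -f sieve.awk
```

Starting the inner loop at n * n works because smaller multiples of n were already deleted while handling smaller primes.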

exercise_6.awk
BLAH

You may have noticed we’re not even using the source files anymore. awk is a full language that can be used independently of tabular data. That said, it definitely shines on tabular data, and I don’t suggest writing too complex a program in awk.