awk, Your Programmable Report Generator

As the world of Linux gets ever more sophisticated, I occasionally like to remind myself of the importance of the fundamentals: the early principles and concepts that let humans bend those mighty computing machines to their will. One such early idea was the command line and all the helpful little programs you'd type in, before the days of window managers and GUIs. One of my favorite command line programs is awk, a powerful text manipulation program. Henry McGilton and Rachel Morgan, in Introducing The Unix System (McGraw-Hill, 1983), referred to it as “a programmable report-generator”. With it you can search for patterns in text, perform relational tests on fields, or both. Input is either a text file or a text stream, possibly originating from another command such as ls.

Using awk

In its simplest form, awk works its way through a file line by line and prints the lines or fields that you specify. For example, say I save a listing of my current directory to a file with the following command:

rob$  ls -l > lines.txt

The contents of the file might look like this:

-rw-r--r--  1 rob  rob       24735 2013-02-18 16:37 0001036647.PDF
drwxr-xr-x  6 rob  rob        4096 2013-02-18 16:46 Calibre Library
-rw-r--r--  1 rob  rob     4047331 2012-06-20 21:04 capt0000.jpg
-rw-------  1 rob  rob     8064327 2011-07-02 04:12 capt0000.nef
drwxr-xr-x  2 rob  rob        4096 2013-06-06 19:14 captivate-06062012
drwxr-xr-x  2 rob  rob        4096 2012-06-06 19:14 captivate-06062013
-rw-r--r--  1 rob  rob        5729 2011-06-13 12:12 writing.tjp~
-rw-r--r--  1 rob  rob      151552 2011-12-23 16:49 x264_2pass.log.temp
drwxr-xr-x  2 rob  rob        4096 2012-12-29 15:09 xformerroot
-rw-r--r--  1 rob  rob        8871 2013-01-28 17:01 X-Plane Installer Log.txt
drwxr-xr-x  2 rob  rob        4096 2012-02-19 11:29 youtube

We could use awk with the most permissive of search patterns, a single space, with the following command:

rob$  awk '/ /' lines.txt

Every line in the file contains a space, so the result is simply all the lines and fields in the file:

-rw-r--r--  1 rob  rob       24735 2013-02-18 16:37 0001036647.PDF
drwxr-xr-x  6 rob  rob        4096 2013-02-18 16:46 Calibre Library
-rw-r--r--  1 rob  rob     4047331 2012-06-20 21:04 capt0000.jpg
-rw-------  1 rob  rob     8064327 2011-07-02 04:12 capt0000.nef
drwxr-xr-x  2 rob  rob        4096 2013-06-06 19:14 captivate-06062012
drwxr-xr-x  2 rob  rob        4096 2012-06-06 19:14 captivate-06062013
-rw-r--r--  1 rob  rob        5729 2011-06-13 12:12 writing.tjp~
-rw-r--r--  1 rob  rob      151552 2011-12-23 16:49 x264_2pass.log.temp
drwxr-xr-x  2 rob  rob        4096 2012-12-29 15:09 xformerroot
-rw-r--r--  1 rob  rob        8871 2013-01-28 17:01 X-Plane Installer Log.txt
drwxr-xr-x  2 rob  rob        4096 2012-02-19 11:29 youtube
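It may help to see the two halves of an awk program side by side. An awk program is made of pattern { action } pairs: leave out the action and awk prints each matching line, leave out the pattern and the action runs on every line. The sketch below is only an illustration. It uses a hypothetical two-line sample file in the temporary directory (the path and the variable names are my own, not part of lines.txt), and it also shows the same program reading from a pipe instead of a file.

```shell
# Hypothetical sample file: two lines copied from the listing above.
sample="${TMPDIR:-/tmp}/awk_sample.txt"
cat > "$sample" <<'EOF'
-rw-r--r--  1 rob  rob       24735 2013-02-18 16:37 0001036647.PDF
drwxr-xr-x  2 rob  rob        4096 2012-02-19 11:29 youtube
EOF

# Pattern only: print every line that contains a space (here, both lines).
pattern_only=$(awk '/ /' "$sample")

# Action only: print every line, with no test at all.
action_only=$(awk '{ print }' "$sample")

# The same program reading a text stream instead of a named file.
from_pipe=$(cat "$sample" | awk '{ print }')

printf '%s\n' "$action_only"
```

On this sample all three commands produce the same two lines, which is why '/ /' behaved like "print everything" on the directory listing.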

Searching with awk

Let's get a little more complex. This time, add a pattern to find a string anywhere in the lines.txt file.

rob$  awk '/2012/' lines.txt

The output looks like this:

-rw-r--r--  1 rob  rob     4047331 2012-06-20 21:04 capt0000.jpg
drwxr-xr-x  2 rob  rob        4096 2013-06-06 19:14 captivate-06062012
drwxr-xr-x  2 rob  rob        4096 2012-06-06 19:14 captivate-06062013
drwxr-xr-x  2 rob  rob        4096 2012-12-29 15:09 xformerroot
drwxr-xr-x  2 rob  rob        4096 2012-02-19 11:29 youtube

For the moment, let's switch gears and print out a couple of specific fields. Suppose we just want the dates and their associated file names. In the lines.txt file, those are the sixth and eighth fields. Use the built-in field variables, $6 and $8, and print them with a space in double quotes between them for clarity.

rob$  awk '{print $6 " " $8}' lines.txt

Here are the corresponding lines:

2013-02-18 0001036647.PDF
2013-02-18 Calibre
2012-06-20 capt0000.jpg
2011-07-02 capt0000.nef
2013-06-06 captivate-06062012
2012-06-06 captivate-06062013
2011-06-13 writing.tjp~
2011-12-23 x264_2pass.log.temp
2012-12-29 xformerroot
2013-01-28 X-Plane
2012-02-19 youtube

awk splits fields on whitespace, so names that contain spaces, such as Calibre Library, are cut off at the first space. Note that you can put any characters or string between the double quotes, not just a space.

Remember that I'm using a pretty small lines.txt file. Yours could be 10 MB in size, and awk would handle it without a problem. It just starts at the beginning and chugs through until it reaches the end, pattern matching and printing as it goes.

Next, combine the pattern search and field selection into one command. This time just select the file name, field 8.

awk '/2012/ {print $8}' lines.txt

capt0000.jpg
captivate-06062012
captivate-06062013
xformerroot
youtube

What the heck? We have a 2013 in the output! Don't forget that /2012/ matches anywhere in a line, in any field. Take a look when we print both field 6 and field 8.

awk '/2012/ {print $6 " " $8}' lines.txt

And the output:

2012-06-20 capt0000.jpg
2013-06-06 captivate-06062012
2012-06-06 captivate-06062013
2012-12-29 xformerroot
2012-02-19 youtube

There's the 2012: captivate-06062013 matched on its 2012 date, while captivate-06062012 matched on its name even though its date is 2013. I only mention it because this kind of confusing situation is easy to create but sometimes tough to spot.
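One way to avoid that kind of surprise is to test a single field rather than the whole line. The sketch below assumes the same layout as lines.txt (date in field 6, name in field 8) and uses awk's ~ match operator with a pattern anchored to the start of the field. The three sample lines are copied from the listing; the shell variable names are purely illustrative.

```shell
# Three lines from the listing, held in a shell variable for the example.
listing='-rw-r--r--  1 rob  rob     4047331 2012-06-20 21:04 capt0000.jpg
drwxr-xr-x  2 rob  rob        4096 2013-06-06 19:14 captivate-06062012
drwxr-xr-x  2 rob  rob        4096 2012-06-06 19:14 captivate-06062013'

# /2012/ alone matches "2012" anywhere on the line, including the file name.
whole_line=$(printf '%s\n' "$listing" | awk '/2012/ { print $6 " " $8 }')

# $6 ~ /^2012-/ matches only when the date field itself starts with "2012-".
date_only=$(printf '%s\n' "$listing" | awk '$6 ~ /^2012-/ { print $6 " " $8 }')

printf '%s\n' "$date_only"
```

The whole-line pattern selects all three sample lines, while the field test drops captivate-06062012, whose date is in 2013.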

Searching by Relationship

A simple example of a relational test might be the following.

awk '$6 > "2012-06-06" {print $6 "  " $8}' lines.txt

2013-02-18  0001036647.PDF
2013-02-18  Calibre
2012-06-20  capt0000.jpg
2013-06-06  captivate-06062012
2012-12-29  xformerroot
2013-01-28  X-Plane

This is a string comparison, which works here because dates written as year-month-day sort in chronological order. Meanwhile, a less-than comparison yields a different result.

awk '$6 < "2012-06-06" {print $6 "  " $8}' lines.txt

2011-07-02  capt0000.nef
2011-06-13  writing.tjp~
2011-12-23  x264_2pass.log.temp
2012-02-19  youtube
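Relational tests can also be combined, and they are not limited to strings. As a rough sketch, assuming the same field layout as lines.txt, the first command below joins two date comparisons with && to select a range, and the second compares the size in field 5 against a number, which awk treats as a numeric comparison. The four sample lines come from the listing; the variable names and the one-million-byte threshold are arbitrary choices for illustration.

```shell
# Four lines from the listing, held in a shell variable for the example.
listing='-rw-r--r--  1 rob  rob     4047331 2012-06-20 21:04 capt0000.jpg
-rw-------  1 rob  rob     8064327 2011-07-02 04:12 capt0000.nef
-rw-r--r--  1 rob  rob        5729 2011-06-13 12:12 writing.tjp~
drwxr-xr-x  2 rob  rob        4096 2012-12-29 15:09 xformerroot'

# String comparisons on the date field, joined with && to select a range.
in_2012=$(printf '%s\n' "$listing" |
  awk '$6 >= "2012-01-01" && $6 <= "2012-12-31" { print $6 "  " $8 }')

# Numeric comparison on the size field (field 5), in bytes.
large=$(printf '%s\n' "$listing" | awk '$5 > 1000000 { print $5 "  " $8 }')

printf '%s\n' "$in_2012" "$large"
```

On this sample, the date range keeps capt0000.jpg and xformerroot, and the size test keeps the two capt0000 image files.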

Conclusion

All those crazy database tools, word processors and such are great, but sometimes you just need something simple and fast. Command line tools like awk are the answer. awk is a great program for quick reports, especially reports drawn from long text files. It has a bunch of options and many programmable features, a few of which we'll discuss in future stories. Take a look at awk and I'm sure you'll see many opportunities to use this powerful tool on your Linux command line.