awk command in unix

What is AWK?

AWK is a scripting language, often regarded as one of the most powerful commands in the Linux environment. It requires no compilation and generates reports by processing and analyzing files. It is used to manipulate data and is well suited to pattern searching and processing.

What Operations Can AWK Perform?

  • Scans files line by line.
  • Splits each input record into fields.
  • Compares the fields to patterns.
  • Performs actions on matched patterns.

How is AWK Command Useful in Linux and Unix?

  • It helps in manipulating data files as per the user requirements.
  • It generates formatted outputs.

What are the various Program Constructs that AWK Offers?

Various programming concepts offered by AWK are:

  • Output formatting
  • Inbuilt variables
  • Pattern matching
  • String operations
  • Arithmetic operations

How awk Works

There are several different implementations of Awk. We’ll use the GNU implementation of Awk, which is named gawk. On many Linux systems, the awk interpreter is simply a symlink to gawk, although some distributions ship a different implementation (such as mawk) by default.

Records and fields

Awk can process textual data files and streams. The input is split into records and fields, and Awk operates on one record at a time until the end of the input is reached. Records are separated by a character called the record separator. The default record separator is the newline character, which means that each line in the text data is a record. A new record separator can be set using the RS variable.

Records consist of fields that are separated by the field separator. By default, fields are separated by whitespace, that is, one or more tab, space, and newline characters.

The fields in each record are referenced by the dollar sign ($) followed by the field number, beginning with 1. The first field is represented by $1, the second by $2, and so on. The last field can also be referenced as $NF, where NF is the built-in variable holding the number of fields. The whole record can be referenced with $0.

Here is a visual representation showing how to reference records and fields:

tmpfs      788M  1.8M  786M   1% /run/lock 
/dev/sda1  234G  191G   31G  87% /
|-------|  |--|  |--|   |--| |-| |--------| 
   $1       $2    $3     $4   $5  $6 ($NF) --> fields
|-----------------------------------------| 
                    $0                     --> record
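As a quick check of the field references described above, the following sketch feeds one line of the sample disk-usage output to awk and prints a few of its fields. The shell variable names (line, summary) are only illustrative.

```shell
# One line of df-style output, taken from the diagram above.
line='/dev/sda1  234G  191G   31G  87% /'

# NF holds the number of fields, so $NF refers to the last field.
summary=$(printf '%s\n' "$line" |
  awk '{ print NF ": first=" $1 ", fifth=" $5 ", last=" $NF }')

echo "$summary"
```

With the default field separator, the runs of spaces between columns count as a single separator, so the line has six fields.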

Awk program

To process text with Awk, you write a program that tells the command what to do. The program consists of a series of rules and user-defined functions. Each rule contains one pattern and action pair. Rules are separated by newlines or semicolons (;). Typically, an awk program looks like this:

pattern { action }
pattern { action }
...

When awk processes data, if the pattern matches the record, it performs the specified action on that record. When the rule has no pattern, all records (lines) are matched.

An awk action is enclosed in braces ({}) and consists of statements. Each statement specifies the operation to be performed. An action can have more than one statement, separated by newlines or semicolons (;). If the rule has no action, it defaults to printing the whole record.
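The three rule shapes described above can be tried with a small sketch. It recreates the teams.txt sample used in this guide so that it is self-contained; the shell variable names are only illustrative.

```shell
# Recreate the sample file so the example can run on its own.
cat > teams.txt <<'EOF'
Bucks Milwaukee    60 22 0.732
Raptors Toronto    58 24 0.707
76ers Philadelphia 51 31 0.622
Celtics Boston     49 33 0.598
Pacers Indiana     48 34 0.585
EOF

# Pattern only: there is no action, so matching records are printed whole.
pattern_only=$(awk '$3 >= 58' teams.txt)

# Action only: there is no pattern, so the action runs for every record.
action_only=$(awk '{ print $2 }' teams.txt)

# Two rules in one program, separated by a semicolon.
two_rules=$(awk '/Bucks/ { print $2 }; /Pacers/ { print $2 }' teams.txt)

printf '%s\n---\n%s\n---\n%s\n' "$pattern_only" "$action_only" "$two_rules"
```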

Awk supports different statements, including expressions, conditionals, input statements, output statements, and more. The most common awk statements are:

  • exit – Stops the execution of the whole program and exits.
  • next – Stops processing the current record and moves to the next record in the input.
  • print – Prints records, fields, variables, and custom text.
  • printf – Gives you more control over the output format, similar to C and bash printf.

When writing awk programs, everything after the hash mark (#) until the end of the line is considered a comment. Long lines can be broken into multiple lines using the continuation character, backslash (\).
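The statements and comment syntax described above can be combined in one short program. This is only a sketch against the teams.txt sample from this guide (recreated here so the example is self-contained); the variable name report is illustrative.

```shell
# Recreate the sample file so the example can run on its own.
cat > teams.txt <<'EOF'
Bucks Milwaukee    60 22 0.732
Raptors Toronto    58 24 0.707
76ers Philadelphia 51 31 0.622
Celtics Boston     49 33 0.598
Pacers Indiana     48 34 0.585
EOF

report=$(awk '
# Skip any record whose first field starts with a digit.
$1 ~ /^[0-9]/ { next }

# Print the team name and the third field with printf.
{ printf "%s has %d wins\n", $1, $3 }

# Stop reading input once the fourth record has been handled.
NR == 4 { exit }
' teams.txt)

echo "$report"
```

The 76ers record is skipped by next before the printf rule sees it, and exit stops the program before the fifth record is read.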

Executing awk programs

An awk program can be run in several ways. If the program is short and simple, it can be passed directly to the awk interpreter on the command line:

awk 'program' input-file...

When running the program on the command line, it should be enclosed in single quotes ('') so that the shell doesn’t interpret the program.

If the program is large and complex, it’s best to put it in a file and use the -f option to pass the file to the awk command:

awk -f program-file input-file...

In the examples below, we’ll use a file named “teams.txt” that looks like the one below:

Bucks Milwaukee    60 22 0.732 
Raptors Toronto    58 24 0.707 
76ers Philadelphia 51 31 0.622
Celtics Boston     49 33 0.598
Pacers Indiana     48 34 0.585

Awk Patterns

Patterns in awk control whether the associated action should be executed or not.

Awk supports different patterns, including regular expression, relational expression, range, and special expression patterns.

When the rule has no pattern, each input record is matched. Here is an example of a rule containing only an action:

$ awk '{ print $3 }' teams.txt

The program will print the third field of every record:

Output
60
58
51
49
48

Regular expression patterns

A regular expression, or regex, is a pattern that matches a set of strings. Awk regular expression patterns are enclosed in slashes (//):

/regex pattern/ { action }

The most basic example is a literal character or string match. For example, to display the first field of each record that contains “0.5”, you’d run the following command. Strictly speaking, the unescaped dot matches any single character; write /0\.5/ to match only a literal dot:

$ awk '/0.5/ { print $1 }' teams.txt
Output

Celtics
Pacers

The pattern can be any extended regular expression. Here is an example that prints the first field if the record starts with two or more digits:

$ awk '/^[0-9][0-9]/ { print $1 }' teams.txt
Output

76ers
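A few more regular expression patterns, as a sketch. The first two commands use made-up two-line input to show the difference between an unescaped and an escaped dot; the third uses the teams.txt sample (recreated here) with an anchor and alternation. All shell variable names are illustrative.

```shell
# Recreate the sample file so the example can run on its own.
cat > teams.txt <<'EOF'
Bucks Milwaukee    60 22 0.732
Raptors Toronto    58 24 0.707
76ers Philadelphia 51 31 0.622
Celtics Boston     49 33 0.598
Pacers Indiana     48 34 0.585
EOF

# An unescaped dot matches any character, so "0x5" matches as well.
loose=$(printf '0x5\n0.5\n' | awk '/0.5/ { n++ } END { print n + 0 }')

# An escaped dot matches only a literal ".".
strict=$(printf '0x5\n0.5\n' | awk '/0\.5/ { n++ } END { print n + 0 }')

# ^ anchors the match at the start of the record; (a|b) is alternation.
picked=$(awk '/^(Bucks|Pacers) / { print $2 }' teams.txt)

echo "loose=$loose strict=$strict"
echo "$picked"
```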

Relational expressions patterns

Relational expression patterns are generally used to match the content of a specific field or variable.

By default, regular expression patterns are matched against the whole record. To match a regex against a field, specify the field and use the match operator (~) with the pattern.

For example, to print the first field of each record whose second field contains “ia”, you’d type:

$ awk '$2 ~ /ia/ { print $1 }' teams.txt
Output

76ers
Pacers

To match fields that don’t contain a given pattern use the !~ operator:

$ awk '$2 !~ /ia/ { print $1 }' teams.txt
Output

Bucks
Raptors
Celtics

You can compare strings or numbers for relationships such as greater than, less than, equal, and so on. The following command prints the first field of all records whose third field is greater than 50:

$ awk '$3 > 50 { print $1 }' teams.txt
Output

Bucks
Raptors
76ers
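Relational patterns are not limited to a single numeric field. The sketch below, run against the teams.txt sample (recreated here so it is self-contained), compares a field with a string and uses an arithmetic expression on two fields. The variable names are illustrative.

```shell
# Recreate the sample file so the example can run on its own.
cat > teams.txt <<'EOF'
Bucks Milwaukee    60 22 0.732
Raptors Toronto    58 24 0.707
76ers Philadelphia 51 31 0.622
Celtics Boston     49 33 0.598
Pacers Indiana     48 34 0.585
EOF

# String comparison: print the second field where the first is exactly "Celtics".
exact=$(awk '$1 == "Celtics" { print $2 }' teams.txt)

# Arithmetic in a pattern: third field minus fourth field greater than 30.
margin=$(awk '$3 - $4 > 30 { print $1 }' teams.txt)

echo "$exact"
echo "$margin"
```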

Range patterns

Range patterns contain two patterns separated by a comma:

pattern1, pattern2

A range pattern matches all records starting with a record that matches the first pattern, up to and including a record that matches the second pattern.

Here is an example that prints the first field of all records, starting from the record containing “Raptors” until the record containing “Celtics”:

$ awk '/Raptors/,/Celtics/ { print $1 }' teams.txt
Output

Raptors
76ers
Celtics

The patterns can also be relational expressions. The command below prints all records starting from the one whose fourth field is equal to 31 until the one whose fourth field is equal to 33:

$ awk '$4 == 31, $4 == 33 { print $0 }' teams.txt
Output

76ers Philadelphia 51 31 0.622
Celtics Boston     49 33 0.598

Range patterns can’t be combined with other pattern expressions.
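Two further range examples, as a sketch on the teams.txt sample (recreated here). The first selects records by number using NR; the second shows that when the end pattern never matches, the range runs to the end of the input. "NoSuchTeam" is a made-up string that does not occur in the file.

```shell
# Recreate the sample file so the example can run on its own.
cat > teams.txt <<'EOF'
Bucks Milwaukee    60 22 0.732
Raptors Toronto    58 24 0.707
76ers Philadelphia 51 31 0.622
Celtics Boston     49 33 0.598
Pacers Indiana     48 34 0.585
EOF

# Records 2 through 4, selected by record number.
by_number=$(awk 'NR == 2, NR == 4 { print NR, $1 }' teams.txt)

# The end pattern never matches, so the range stays open until the last record.
open_ended=$(awk '/Celtics/, /NoSuchTeam/ { print $1 }' teams.txt)

echo "$by_number"
echo "$open_ended"
```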

Special expression patterns

Awk includes the following special patterns:

  • BEGIN – Used to perform actions before records are processed.
  • END – Used to perform actions after records are processed.

The BEGIN pattern is generally used to set variables, and the END pattern to process data from the records, such as calculations.

The following example will print “Start Processing.”, then print the third field of each record, and finally “End Processing.”:

$ awk 'BEGIN { print "Start Processing." }; { print $3 }; END { print "End Processing." }' teams.txt
Output

Start Processing.
60
58
51
49
48
End Processing.

If a program has only a BEGIN pattern, the actions are executed and the input isn’t processed. If a program has only an END pattern, the input is processed before the rule’s actions are performed.

The GNU version of Awk also includes two more special patterns, BEGINFILE and ENDFILE, which allow you to perform actions when processing files.
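The behaviour of BEGIN and END described above can be sketched as follows. The first command has only a BEGIN rule and therefore needs no input file; the second initialises a variable in BEGIN and reports a result in END. It uses the teams.txt sample (recreated here), and the awk variable names max and team are illustrative.

```shell
# Recreate the sample file so the example can run on its own.
cat > teams.txt <<'EOF'
Bucks Milwaukee    60 22 0.732
Raptors Toronto    58 24 0.707
76ers Philadelphia 51 31 0.622
Celtics Boston     49 33 0.598
Pacers Indiana     48 34 0.585
EOF

# Only a BEGIN rule: awk runs the action and exits without reading input.
begin_only=$(awk 'BEGIN { print 2 + 3 }')

# BEGIN sets a starting value, the main rule tracks the largest third
# field, and END prints the result after all records are processed.
best=$(awk '
BEGIN { max = -1 }
$3 > max { max = $3; team = $1 }
END { print team, max }
' teams.txt)

echo "$begin_only"
echo "$best"
```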

Combining patterns

Awk allows you to combine two or more patterns using the logical AND operator (&&) and the logical OR operator (||).

Here is an example that uses the && operator to print the first field of those records whose third field is greater than 50 and whose fourth field is less than 30:

$ awk '$3 > 50 && $4 < 30 { print $1 }' teams.txt
Output

Bucks
Raptors
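For comparison, here is a sketch of the || operator and of negation with ! on the same teams.txt sample (recreated here so the example is self-contained). The shell variable names are illustrative.

```shell
# Recreate the sample file so the example can run on its own.
cat > teams.txt <<'EOF'
Bucks Milwaukee    60 22 0.732
Raptors Toronto    58 24 0.707
76ers Philadelphia 51 31 0.622
Celtics Boston     49 33 0.598
Pacers Indiana     48 34 0.585
EOF

# OR: third field greater than 55, or fourth field greater than 33.
either=$(awk '$3 > 55 || $4 > 33 { print $1 }' teams.txt)

# NOT: records whose third field is not greater than 50.
negated=$(awk '!($3 > 50) { print $1 }' teams.txt)

echo "$either"
echo "$negated"
```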

Built-in Variables

Awk has several built-in variables that contain useful information and allow you to control how the program is processed. Below are some of the most common built-in variables:

  • NF – The number of fields in the record.
  • NR – The number of the current record.
  • FILENAME – The name of the input file that is currently being processed.
  • FS – Field separator.
  • RS – Record separator.
  • OFS – Output field separator.
  • ORS – Output record separator.

Here is an example showing how to print the file name and the number of lines (records):

$ awk 'END { print "File", FILENAME, "contains", NR, "lines." }' teams.txt
Output

File teams.txt contains 5 lines.

Variables in AWK can be set on any line in the program. To define a variable for the whole program, set it in a BEGIN rule.
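As a sketch of setting a built-in variable for the whole program, the command below assigns OFS in a BEGIN rule and then prints NR, the first field, and NF for every record of the teams.txt sample (recreated here). The " | " separator is an arbitrary choice for illustration.

```shell
# Recreate the sample file so the example can run on its own.
cat > teams.txt <<'EOF'
Bucks Milwaukee    60 22 0.732
Raptors Toronto    58 24 0.707
76ers Philadelphia 51 31 0.622
Celtics Boston     49 33 0.598
Pacers Indiana     48 34 0.585
EOF

# OFS is placed between the comma-separated items of print.
listing=$(awk 'BEGIN { OFS = " | " } { print NR, $1, NF }' teams.txt)

echo "$listing"
```

Because OFS is set in BEGIN, it is already in effect when the first record is printed.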

Changing the Field and Record Separator

The default value of the field separator is any number of space or tab characters. It can be changed by setting the FS variable.

For example, to set the field separator to . you’d use:

$ awk 'BEGIN { FS = "." } { print $1 }' teams.txt
Output

Bucks Milwaukee    60 22 0
Raptors Toronto    58 24 0
76ers Philadelphia 51 31 0
Celtics Boston     49 33 0
Pacers Indiana     48 34 0

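The field separator can also be set to more than one character. A separator longer than one character is treated as a regular expression, so each dot is placed in a bracket expression here to match it literally:

$ awk 'BEGIN { FS = "[.][.]" } { print $1 }' teams.txt

When running awk one-liners on the command line, you can also use the -F option to change the field separator: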

$ awk -F "." '{ print $1 }' teams.txt

By default, the record separator is a newline character, and it can be changed using the RS variable.

Here is an example showing how to change the record separator to .:

$ awk 'BEGIN { RS = "." } { print $0 }' teams.txt
Output

Bucks Milwaukee    60 22 0
732 
Raptors Toronto    58 24 0
707 
76ers Philadelphia 51 31 0
622
Celtics Boston     49 33 0
598
Pacers Indiana     48 34 0
585
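The separator variables can be explored with small inline inputs. The data below (alice, bob, and so on) is made up for illustration and is not part of teams.txt; the shell variable names are illustrative as well.

```shell
# -F sets the input field separator from the command line.
names=$(printf 'alice,30,NYC\nbob,25,LA\n' | awk -F ',' '{ print $1 }')

# Assigning to a field ($1 = $1) rebuilds the record using OFS.
tsv=$(printf 'alice,30,NYC\n' |
  awk 'BEGIN { FS = ","; OFS = "\t" } { $1 = $1; print }')

# RS changes what counts as a record; here records end at semicolons.
parts=$(printf 'a;b;c' | awk 'BEGIN { RS = ";" } { print NR ": " $0 }')

# An FS longer than one character is a regular expression, so a literal
# two-dot separator is written with bracket expressions.
halves=$(printf 'a..b.c\n' | awk 'BEGIN { FS = "[.][.]" } { print $1 "|" $2 }')

printf '%s\n%s\n%s\n%s\n' "$names" "$tsv" "$parts" "$halves"
```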

Awk Actions

Awk actions are enclosed in braces ({}) and executed when the pattern matches. An action can have zero or more statements. Multiple statements are executed in the order they appear and must be separated by newlines or semicolons (;).

There are several sorts of action statements that are supported in awk:

  • Expressions, such as variable assignment, arithmetic operators, and increment and decrement operators.
  • Control statements, used to control the flow of the program (if, for, while, switch, and more).
  • Output statements, such as print and printf.
  • Compound statements, to group other statements.
  • Input statements, to control the processing of the input.
  • Deletion statements, to remove array elements.
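Several of these statement types can be seen together in one short program. The sketch below groups the teams.txt records (recreated here) into two made-up tiers, "top" and "mid", using an if/else statement and an array, then removes one array element with delete. The 0.6 threshold and all names are assumptions chosen for illustration.

```shell
# Recreate the sample file so the example can run on its own.
cat > teams.txt <<'EOF'
Bucks Milwaukee    60 22 0.732
Raptors Toronto    58 24 0.707
76ers Philadelphia 51 31 0.622
Celtics Boston     49 33 0.598
Pacers Indiana     48 34 0.585
EOF

tiers=$(awk '
{
    # Control statement: choose a tier from the fifth field.
    if ($5 >= 0.6) {
        tier = "top"
    } else {
        tier = "mid"
    }
    # Expression: increment the counter stored in an array element.
    count[tier]++
}
END {
    print "top:", count["top"]
    print "mid:", count["mid"]

    # Deletion statement: remove one element, then count what is left.
    delete count["mid"]
    left = 0
    for (key in count) left++
    print "tiers left:", left
}
' teams.txt)

echo "$tiers"
```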

The print statement is probably the most used awk statement. It prints a formatted output of text, records, fields, and variables.

When printing multiple items, you need to separate them with commas. Here is an example:

$ awk '{ print $1, $3, $5 }' teams.txt

Single spaces separate the printed items:

Output

Bucks 60 0.732
Raptors 58 0.707
76ers 51 0.622
Celtics 49 0.598
Pacers 48 0.585

If you don’t use commas, there’ll be no space between the items:

$ awk '{ print $1 $3 $5 }' teams.txt

The printed items are concatenated:

Output

Bucks600.732
Raptors580.707
76ers510.622
Celtics490.598
Pacers480.585

When print is used without an argument, it defaults to print $0, so the current record is printed.

To print custom text, you must quote the text with double-quote characters:

$ awk '{ print "The first field:", $1}' teams.txt
Output

The first field: Bucks
The first field: Raptors
The first field: 76ers
The first field: Celtics
The first field: Pacers

You can also print special characters like newline:

$ awk 'BEGIN { print "First line\nSecond line\nThird line" }'
Output

First line
Second line
Third line

The printf statement gives you more control over the output format. Here is an example that inserts line numbers:

$ awk '{ printf "%3d. %s\n", NR, $0 }' teams.txt

printf doesn’t create a newline after each record, so we are using \n:

Output

  1. Bucks Milwaukee    60 22 0.732
  2. Raptors Toronto    58 24 0.707
  3. 76ers Philadelphia 51 31 0.622
  4. Celtics Boston     49 33 0.598
  5. Pacers Indiana     48 34 0.585
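printf can also align columns and format numbers. The sketch below, on the teams.txt sample (recreated here), left-aligns the team name in an 8-character column and prints the fifth field as a percentage with one decimal place. The "|" character is only there to make the column boundary visible.

```shell
# Recreate the sample file so the example can run on its own.
cat > teams.txt <<'EOF'
Bucks Milwaukee    60 22 0.732
Raptors Toronto    58 24 0.707
76ers Philadelphia 51 31 0.622
Celtics Boston     49 33 0.598
Pacers Indiana     48 34 0.585
EOF

# %-8s  : string, left-aligned, padded to 8 characters
# %5.1f : number, 5 characters wide, 1 digit after the decimal point
# %%    : a literal percent sign
table=$(awk '{ printf "%-8s|%5.1f%%\n", $1, $5 * 100 }' teams.txt)

echo "$table"
```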

The following command calculates the sum of the values stored in the third field of each line:

$ awk '{ sum += $3 } END { printf "%d\n", sum }' teams.txt
Output

266

Here is another example showing how to use expressions and control statements to print the squares of numbers from 1 to 5:

$ awk 'BEGIN { i = 1; while (i < 6) { print "Square of", i, "is", i*i; ++i } }'
Output

Square of 1 is 1
Square of 2 is 4
Square of 3 is 9
Square of 4 is 16
Square of 5 is 25

One-line commands like the one above are harder to understand and maintain. When writing longer programs, you should create a separate program file:

prg.awk
BEGIN { 
  i = 1
  while (i < 6) { 
    print "Square of", i, "is", i*i; 
    ++i 
  } 
}

Run the program by passing the file name to the awk interpreter:

$ awk -f prg.awk
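The same workflow can be scripted end to end. This sketch writes a program file with a here-document and runs it with -f; the file name squares.awk is hypothetical, and a for loop is used in place of the while loop shown above.

```shell
# Write the awk program to a file (squares.awk is a made-up name).
cat > squares.awk <<'EOF'
# Print the squares of the numbers 1 through 5.
BEGIN {
    for (i = 1; i <= 5; i++) {
        print "Square of", i, "is", i * i
    }
}
EOF

# Run the program file; no input file is needed for a BEGIN-only program.
squares=$(awk -f squares.awk)

echo "$squares"
```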

You can also run an awk program as an executable by using the shebang directive and setting the awk interpreter:

prg.awk
#!/usr/bin/awk -f
BEGIN { 
  i = 1
  while (i < 6) { 
    print "Square of", i, "is", i*i; 
    ++i 
  } 
}

Save the file and make it executable:

$ chmod +x prg.awk

You can now run the program by entering:

$ ./prg.awk

Using Shell Variables in Awk Programs

If you’re using the awk command in shell scripts, chances are you’ll need to pass a shell variable to the awk program. One option is to enclose the program in double quotes instead of single quotes and substitute the variable in the program. However, this option makes your awk program more complex, because you’ll need to escape the awk variables.

The recommended way to use shell variables in awk programs is to assign the shell variable to an awk variable. Here is an example:

$ num=51
$ awk -v n="$num" 'BEGIN {print n}'
Output

51
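A variable passed with -v can be used anywhere in the program, including in patterns. The sketch below filters the teams.txt sample (recreated here) by a threshold held in a shell variable, and also shows the ENVIRON array as another way to read a value from the environment. The names min_wins, min, and TEAM are illustrative.

```shell
# Recreate the sample file so the example can run on its own.
cat > teams.txt <<'EOF'
Bucks Milwaukee    60 22 0.732
Raptors Toronto    58 24 0.707
76ers Philadelphia 51 31 0.622
Celtics Boston     49 33 0.598
Pacers Indiana     48 34 0.585
EOF

# Pass a shell variable into awk with -v and use it in a pattern.
min_wins=50
qualified=$(awk -v min="$min_wins" '$3 >= min { print $1 }' teams.txt)

# Environment variables are available through the ENVIRON array.
city=$(TEAM=Celtics awk '$1 == ENVIRON["TEAM"] { print $2 }' teams.txt)

echo "$qualified"
echo "$city"
```

Both approaches keep the awk program in single quotes, so no awk variables need escaping.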

Conclusion 

I hope you found this guide useful. If you have any questions or comments, don’t hesitate to use the form below.

User Questions:

  1. What does awk mean in Unix?

You can write awk scripts for complex operations, or you can use awk from the command line. The name stands for Aho, Weinberger, and Kernighan (yes, Brian Kernighan), the authors of the language, which was started in 1977; hence it shares the same Unix spirit as the other classic *nix utilities.

  2. Is AWK still used?

AWK is a text-processing language with a history spanning more than 40 years. It has a POSIX standard, several conforming implementations, and remains surprisingly relevant in 2020, both for simple text processing tasks and for wrangling “big data.” AWK reads the input a line at a time.

  3. Is AWK written in C?

The AWK interpreter is a C program originally written in 1977 and much modified since then. For most people, the interpreter is AWK. The first step was to translate the interpreter into the C subset of C++ and then make some minor implementation changes to use C++ better. These are written in C++.

  4. Nothing prepares the college student for how powerful awk and sed are in the real world. How can I learn these tools? (from learnprogramming)

  5. AWK: better way to get 2nd level data (from commandline)