In this article, we will see how to use the awk command on Linux.
The awk command is fundamentally a scripting language and a strong text manipulation tool in Linux. It’s named after its authors Alfred Aho, Peter Weinberger, and Brian Kernighan. Awk is popular due to its ability to process text (strings) as easily as numbers.
It scans a sequence of input lines, or records, checking each one against a pattern. When a match is found, the corresponding action is performed. It’s a pattern-action language.
Input to awk can come from files, redirection, pipes, or directly from standard input.
Terminology
Let’s cover some basic terms before we dive into the tutorial. This will make the concepts easier to follow.
1. Records
awk perceives each line as a record.
- RS is the record separator. By default, RS is set to a newline.
- NR is the variable that tracks the record number; its value equals the number of the record currently being processed. In the default scenario, NR corresponds to the line number.
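A quick, self-contained illustration of NR (input is piped in here, and the sample lines are made up for this sketch):

```shell
# Print each record prefixed with its record number (NR).
printf 'alpha\nbeta\n' | awk '{print NR, $0}'
# prints:
# 1 alpha
# 2 beta
```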
2. Fields
Each record is split into fields, meaning each line is broken into fields.
- FS is the field separator. By default, FS is set to whitespace, which means each word is a field.
- NF is the number of fields in a particular record.
Fields are numbered as:
- $0 for the entire line.
- $1 for the first field.
- $2 for the second field.
- $n for the nth field.
- $NF for the last field.
- $(NF-1) for the second-to-last field (note the parentheses: $NF-1 would instead subtract 1 from the value of the last field).
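To illustrate the field numbering, a minimal example with a made-up three-word line:

```shell
# $1 is the first field, $NF the last, $(NF-1) the second-to-last.
echo 'one two three' | awk '{print $1, $NF, $(NF-1)}'
# prints: one three two
```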
Standard format of awk
The standard format of the awk command is:
$ awk 'BEGIN{/instructions/} /pattern/ {ACTIONS} END{/instructions/}' file_name
- The pattern-action pair is to be enclosed within single quotes (‘).
- BEGIN and END are optional and specify actions to be performed before and after processing the input.
- The pattern represents the condition that, if fulfilled, results in the execution of the action.
- The action specifies the set of commands to be performed when there’s a successful match.
- file_name is to be specified if the input is coming from a file.
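Putting the format together, a minimal sketch with all three sections (the sample input lines are arbitrary):

```shell
# BEGIN runs before any input, the pattern/action pair runs once per
# matching line, and END runs after all input has been processed.
printf '1\n2\n3\n' | awk 'BEGIN{print "start"} /2/{print "matched", $0} END{print "done"}'
# prints:
# start
# matched 2
# done
```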
Basic usage of the awk command
awk can be used to print a message to the terminal based on some pattern in the text. If you run the awk command with no pattern and just a single print command, awk prints the message every time you hit enter. This happens because awk is expecting input from the command line.
$ awk '{print "This is how awk command is used for printing"}'
Processing input from the command line using awk
We saw in the previous example that if no input source is mentioned, awk takes input from the command line.
Input to awk is seen as a set of records, and each record is in turn a set of fields. We can use this to process input in real time.
$ awk '$3=="linux" {print "That is amazing!", $1}'
This code looks for lines whose third word is ‘linux’. When a match is found, it prints the message along with the first field of the same line. Before moving forward, let’s create a file to be used as input.
This can be done using the cat command in Linux.
The text of the file is:
First 200
Second 300
Third 150
Fourth 300
Fifth 250
Sixth 500
Seventh 100
Eight 50
Ninth 70
Tenth 270
These could be the dues in rupees for different customers named First, Second, and so on.
Printing from a file using fields
Input from a file can be printed using awk. We can refer to different fields to print the output in a fancy manner.
$ awk '{print $1, "owes", $2}' rec.txt
$1 and $2 refer to fields one and two respectively, which in our input data are the first and second words of each line. We haven’t mentioned any pattern in this command, so awk runs the action on every record; an empty pattern matches every line.
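Using the first two lines of the sample data, piped in so the example is self-contained:

```shell
printf 'First 200\nSecond 300\n' | awk '{print $1, "owes", $2}'
# prints:
# First owes 200
# Second owes 300
```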
Playing with awk separators
There are three types of separators in awk.
- OFS: output field separator
- FS: field separator
- RS: record separator
1. Output field separator (OFS)
You can notice that, by default, the print command separates output fields with whitespace. This can be changed by setting OFS.
$ awk 'OFS=" owes " {print $1,$2}' rec.txt
The same output is achieved as in the previous case: the default output field separator has been changed from whitespace to ” owes “. This, however, isn’t the best way to change OFS. All separators should be changed in the BEGIN section of the awk command.
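The recommended form sets OFS in the BEGIN section instead, shown here with a one-line sample piped in:

```shell
# OFS is inserted between $1 and $2 because they are separated by a
# comma in the print statement.
printf 'First 200\n' | awk 'BEGIN{OFS=" owes "} {print $1, $2}'
# prints: First owes 200
```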
2. Field Separator (FS)
The field separator can be changed by changing the value of FS. By default, FS is set to whitespace. We created another file with the following data, where the name and the amount are separated by ‘-‘:
First-200
Second-300
Third-150
Fourth-300
Fifth-250
Sixth-500
Seventh-100
Eight-50
Ninth-70
Tenth-270
$ awk 'FS="-" {print $1}' rec-sep.txt
You can notice that the first line of the output is wrong: awk wasn’t able to separate the fields of the first record. This is because the statement that changes the field separator runs once per record, after the record has already been read and split. So the first record, First-200, is read and processed with the field separator still set to whitespace.
Correct way:
$ awk 'BEGIN {FS="-"} {print $1}' rec_1.txt
Now we get the right output, and the first record has been separated successfully. Any statement placed in the BEGIN section runs before any input is processed. The BEGIN section is also often used to print a message before the processing of input.
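The difference between the two placements can be seen side by side with a two-line sample piped in:

```shell
# FS changed in the pattern section: takes effect only after record one
# has already been split on whitespace.
printf 'First-200\nSecond-300\n' | awk 'FS="-" {print $1}'
# prints:
# First-200
# Second

# FS changed in BEGIN: takes effect before any record is read.
printf 'First-200\nSecond-300\n' | awk 'BEGIN{FS="-"} {print $1}'
# prints:
# First
# Second
```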
3. Record separator (RS)
The third type of separator is the record separator. By default, the record separator is set to a newline. It can be changed by changing the value of RS. Changing RS is useful when, for example, the input comes as a single comma-separated line.
For example, if the input is:
First-200,Second-300,Third-150,Fourth-300,Fifth-250,Sixth-500,Seventh-100,Eight-50,Ninth-70,Tenth-270
This is the same input as above, but in a comma-separated format.
We can process such a file by changing the RS field.
$ awk 'BEGIN {FS="-"; RS=","; OFS=" owes Rs. "} {print $1,$2}' rec_2.txt
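With a short slice of the comma-separated data piped in, the command behaves like this:

```shell
# RS="," splits the input into records at commas; FS="-" then splits
# each record into a name and an amount.
printf 'First-200,Second-300' | awk 'BEGIN{FS="-"; RS=","; OFS=" owes Rs. "} {print $1, $2}'
# prints:
# First owes Rs. 200
# Second owes Rs. 300
```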
Boolean operations in awk
Boolean expressions can be used as patterns. Field values can be used in comparisons, making awk work like an if-then command. In our data, we can find the customers with more than Rs. 200 due.
$ awk '$2>200 {print $1, "owes Rs.",$2}' rec.txt
This gives us the list by comparing the second field of every record with 200 and printing if the condition is true.
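For example, with three sample records piped in:

```shell
# Only records whose second field exceeds 200 trigger the action.
printf 'First 200\nSecond 300\nThird 150\n' | awk '$2>200 {print $1, "owes Rs.", $2}'
# prints: Second owes Rs. 300
```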
Matching string literals using the awk command
Since awk works with fields, we can use this to our advantage. Running the ls -l command lists all the files in the current directory with additional information.
The awk command can be used alongside ls -l to find out which files were modified in May. $6 is the field showing the month in which the file was last modified. We can match this field against the string ‘May’.
$ ls -l | awk '$6=="May" {print $9}'
User-defined variables in awk
To perform additional operations, variables can be defined in awk. For instance, to calculate the total dues of the people owing more than Rs. 200, we can define a sum variable.
$ awk 'BEGIN {sum=0} $2>200 {sum=sum+$2; print $1} END{print sum}' rec.txt
The sum variable is initialized in the BEGIN section, updated in the action section, and printed in the END section. The action section runs only when the condition in the pattern section is true. Since the pattern is checked for every line, the structure works like a loop, with an update performed whenever the condition is met.
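A self-contained run with three sample records shows the accumulation:

```shell
# 300 and 500 exceed 200, so both names are printed and summed.
printf 'First 200\nSecond 300\nThird 500\n' | awk 'BEGIN{sum=0} $2>200 {sum=sum+$2; print $1} END{print sum}'
# prints:
# Second
# Third
# 800
```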
Counting with the awk command
The awk command can also count the number of lines, the number of words, and even the number of characters. Let’s start with counting the number of lines with the awk command.
Count the number of lines
The number of lines can be printed by printing the NR variable in the END section. NR stores the current record number. Since the END section runs after all the records have been processed, NR in the END section holds the total number of records.
$ awk 'END { print NR }' rec.txt
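For instance, piping in three lines:

```shell
# NR in END holds the total record count.
printf 'a\nb\nc\n' | awk 'END { print NR }'
# prints: 3
```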
Count number of words
To get the number of words, NF can be used. NF is the number of fields in each record, so totalling NF over all the records gives the number of words. In the command, c counts the words: for every line, the number of fields in that line is added to c, and printing c in the END section gives the total word count.
$ awk 'BEGIN {c=0} {c=c+NF} END{print c}' rec.txt
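With a small sample piped in:

```shell
# 2 fields on line one + 1 field on line two = 3 words.
printf 'one two\nthree\n' | awk 'BEGIN {c=0} {c=c+NF} END{print c}'
# prints: 3
```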
Count number of characters
The number of characters in each line can be obtained using awk’s built-in length function. $0 refers to the whole record, so length($0) gives the number of characters in that record.
$ awk '{ print "number of characters in line", NR, "=" length($0) }' rec.txt
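For example, with a single piped-in word:

```shell
# length($0) counts the characters in the whole record.
echo 'hello' | awk '{print "number of characters in line", NR, "=", length($0)}'
# prints: number of characters in line 1 = 5
```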
Conclusion
The awk command can be used to perform very powerful text manipulations. The convenience of directly accessing fields gives awk a major advantage over sed. As mentioned, awk isn’t just a command-line tool but also a powerful scripting language.
User Questions:
- What is the use of awk command in Linux?
Awk is a scripting language used for manipulating data and generating reports. The awk programming language requires no compiling and allows the user to use variables, numeric functions, string functions, and logical operators. Awk is most often used for pattern scanning and processing.
- What is the awk option?
awk can be used on the command line to process and format the data from one or more input files or from the output of another program.
- What is WC in Linux command?
wc (short for word count) is a command in Unix, Plan 9, Inferno, and Unix-like operating systems. The program reads either standard input or a list of files and generates one or more of the following statistics: newline count, word count, and byte count.
- Are sed/awk still viable?
- I even wrote a native TUI for bib.awk.