The “awk” command is a powerful text processing tool in Unix and Unix-like operating systems, including Linux. It is used for pattern scanning and processing.
“awk” takes input text, processes it line by line, and performs specified actions based on patterns defined in the AWK script.
The name “awk” comes from the initials of its authors, Alfred V. Aho, Peter J. Weinberger, and Brian W. Kernighan, who developed it at Bell Labs in the 1970s.
Here is the basic syntax of the “awk” command:
awk 'pattern { action }' file
– pattern: Specifies a pattern or condition that matches lines in the input. If the pattern is omitted, the action is applied to every line.
– action: Specifies the action to be performed on lines that match the pattern. If the action is omitted, matching lines are printed.
– file: Specifies the input file(s) to be processed. If no file is given, “awk” reads from standard input.
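To see how the three parts fit together, including what awk does when the pattern or the action is left out, here is a small sketch for a POSIX shell. The sample file name data.txt and its contents are assumptions for illustration, and the commands run inside a scratch directory created with mktemp -d so nothing existing is overwritten.

```shell
# Work in a scratch directory so the sample file doesn't overwrite anything.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample input: a name and an age on each line.
printf 'John 25\nAlice 30\nBob 22\n' > data.txt

# Pattern omitted: the action runs on every line (prints John, Alice, Bob).
awk '{ print $1 }' data.txt

# Action omitted: matching lines are printed whole ("John 25" and "Alice 30").
awk '$2 > 24' data.txt

# File omitted: awk reads standard input instead (prints 42).
printf 'x 42\n' | awk '{ print $2 }'
```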
Here’s a simple example. Let’s say you have a file called “data.txt” with the following content:
John 25
Alice 30
Bob 22
You can use “awk” to print the names of people who are 30 years old:
awk '$2 == 30 { print $1 }' data.txt
In this example:
– “$2” refers to the second column (age).
– “==” is a comparison operator.
– “{ print $1 }” specifies the action to print the first column (name) if the second column is equal to 30.
So, the output will be:
Alice
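Because 30 is written as a number in this program, awk compares the second field numerically when that field looks like a number. The sketch below contrasts that with a string comparison; the file ages.txt and its "30.0" entry are hypothetical, chosen only to make the difference visible.

```shell
# Scratch directory for the sample file.
cd "$(mktemp -d)" || exit 1

# Hypothetical variant of data.txt where one age is written as 30.0.
printf 'John 25\nAlice 30\nBob 30.0\n' > ages.txt

# Numeric comparison: "30" and "30.0" both equal the number 30 (prints Alice, Bob).
awk '$2 == 30 { print $1 }' ages.txt

# String comparison: only the exact text "30" matches (prints Alice).
awk '$2 == "30" { print $1 }' ages.txt
```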
“awk” is quite versatile and can be used for more complex text processing tasks, such as field manipulation, calculations, and more. It’s widely used in shell scripting and automation tasks. If you’re interested in learning more, you might want to explore the various features and functions provided by “awk”.
Here are 21 practical, real-world examples of using “awk”:
1. Print specific columns from a file:
awk '{print $1, $3}' filename
The above command prints columns 1 and 3 of each line of the input file (in this example, a file named “etc_output”) to the terminal.
2. Calculate and print the average of a column:
awk '{sum += $2} END {print "Average:", sum/NR}' filename
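The command above divides by NR, so on an empty file it attempts a division by zero, which some awk implementations treat as a fatal error. The sketch below adds a guard; the sample files and the "No data" message are assumptions for illustration.

```shell
# Scratch directory for the sample files.
cd "$(mktemp -d)" || exit 1
printf 'John 25\nAlice 30\nBob 22\n' > data.txt
: > empty.txt   # a zero-length file to exercise the guard

# Sum column 2; at END, divide by the record count only if there were records.
# For data.txt this prints: Average: 25.6667
awk '{ sum += $2 } END { if (NR > 0) { print "Average:", sum / NR } else { print "No data" } }' data.txt

# For the empty file this prints: No data
awk '{ sum += $2 } END { if (NR > 0) { print "Average:", sum / NR } else { print "No data" } }' empty.txt
```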
3. Filter lines based on a condition:
awk '$5 > 300 {print $1, $5, $9}' filename
The above command checks column 5 of each line in the file “etc_output” and, for lines where that value is greater than 300 bytes, prints columns 1, 5, and 9 to the terminal.
4. Print lines matching a specific pattern:
awk '/pattern/ {print}' filename
5. Print the number of lines in a file:
awk 'END {print NR}' filename
6. Print lines with more than a certain number of fields:
awk 'NF > 4 {print}' filename
7. Print only duplicate lines in a file:
awk 'seen[$0]++ == 1 {print}' filename
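Here is a small sketch of how the counter behaves, using a hypothetical file sample.log in which "a" appears three times and "b" twice. Each duplicated line is printed once, at the point where its second copy appears.

```shell
# Scratch directory for the sample file.
cd "$(mktemp -d)" || exit 1
printf 'a\nb\na\nc\na\nb\n' > sample.log

# seen[$0]++ yields the count *before* incrementing, so "== 1" is true only
# on a line's second occurrence: prints "a" then "b"; "c" is never printed.
awk 'seen[$0]++ == 1 {print}' sample.log
```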
8. Print lines between two patterns:
awk '/start_pattern/, /end_pattern/ {print}' filename
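The range pattern is inclusive: both the line matching the start pattern and the line matching the end pattern are printed. The sketch below shows that on a hypothetical sample.txt with START and END marker lines, followed by a commonly used flag-based variant that prints only the lines between the markers.

```shell
# Scratch directory for the sample file.
cd "$(mktemp -d)" || exit 1
printf 'header\nSTART\none\ntwo\nEND\nfooter\n' > sample.txt

# Range pattern: prints START, one, two, END (markers included).
awk '/START/, /END/ {print}' sample.txt

# Variant that drops the marker lines (prints one, two): the flag f is
# cleared on END, tested as a bare pattern, then set on START.
awk '/END/ {f = 0} f; /START/ {f = 1}' sample.txt
```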
9. Print the length of each line:
awk '{print length, $0}' filename
10. Sum values in a column:
awk '{sum += $2} END {print "Sum:", sum}' filename
11. Format output with printf:
awk '{printf "Name: %-10s Age: %s Gender: %s\n", $1, $2, $3}' data.txt
printf: This is a formatted printing function in AWK that allows you to specify the format of the output.
“Name: %-10s Age: %s Gender: %s\n”: This is the format string. It specifies that you want to print “Name:” followed by the first column ($1) left-aligned with a width of 10 characters (%-10s), then “Age:” followed by the second column ($2), then “Gender:” followed by the third column ($3) and a newline character (\n) to move to the next line.
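$1, $2, and $3: These represent the first, second, and third columns in the input file, respectively. With the two-column data.txt shown earlier, $3 is empty, so the file needs a third (gender) column for “Gender:” to show a value.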
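To try the printf command end to end, the sketch below assumes a hypothetical three-column data.txt (name, age, gender). The %-10s conversion pads each name on the right to 10 characters, so the “Age:” column lines up.

```shell
# Scratch directory for the sample file.
cd "$(mktemp -d)" || exit 1

# Hypothetical three-column data.txt: name, age, gender.
printf 'John 25 Male\nAlice 30 Female\nBob 22 Male\n' > data.txt

# Left-align the name in a 10-character field, then print age and gender.
awk '{printf "Name: %-10s Age: %s Gender: %s\n", $1, $2, $3}' data.txt
```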
12. Print lines longer than a certain length:
awk 'length($0) > 70 {print}' filename
– “awk”: Invokes the AWK programming language.
– “length($0) > 70 {print}”: This is the AWK program, enclosed in single quotes on the command line. The program consists of a pattern-action pair. The pattern is “length($0) > 70”, and the action is “{print}”.
– Pattern (“length($0) > 70”): This specifies a condition that must be true for a line to be processed. In this case, it checks if the length of the entire line (“$0”) is greater than 70 characters.
– Action (“{print}”): If the pattern is true, this action is executed. In this case, it simply prints the entire line.
– “filename”: Specifies the input file for “awk” to process. Replace “filename” with the actual name of your file.
So, when you run this “awk” command, it will read each line from the specified file and print only those lines where the length of the line is greater than 70 characters.
13. Print specific lines using a range of line numbers:
awk 'NR >= 10 && NR <= 20 {print}' filename
The above command prints lines 10 through 20 (inclusive). To check which lines those are, you can number the file with the “-n” option of the “cat” command:
cat -n filename
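A quick way to confirm the range is to run the command on a file whose lines are their own line numbers. The sketch below uses seq to build a hypothetical 30-line numbers.txt and also shows a variant that stops reading once line 20 has been passed, which can save time on large files.

```shell
# Scratch directory for the sample file.
cd "$(mktemp -d)" || exit 1

# A 30-line file containing the numbers 1..30, one per line.
seq 1 30 > numbers.txt

# Prints lines 10 through 20 inclusive (the numbers 10..20 here).
awk 'NR >= 10 && NR <= 20 {print}' numbers.txt

# Variant: exit as soon as line 20 has been passed; the bare
# pattern NR >= 10 prints each line from line 10 onward.
awk 'NR > 20 {exit} NR >= 10' numbers.txt
```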
14. Remove duplicate lines from a file:
awk '!seen[$0]++' filename
awk: Invokes the AWK programming language.
!seen[$0]++: This is the AWK program, enclosed in single quotes. It consists of the pattern !seen[$0]++ with no explicit action, so the default action {print} is used.
Pattern (!seen[$0]++): This uses an associative array seen to keep track of lines encountered. The array is indexed by the entire line ($0). The ! negates the condition.
Action (implicit {print}): If the pattern is true (i.e., if the line has not been seen before), the default action is executed, which is to print the line.
filename: Specifies the input file for awk to process. Replace “filename” with the actual name of your file.
Here’s how this command works:
“seen” is an associative array in AWK that is used to keep track of unique lines encountered.
$0 represents the entire line.
!seen[$0]++ checks whether the current line has been seen before. The first time a line appears, seen[$0] is uninitialized (treated as 0), so !seen[$0] is true and the line is printed. The ++ post-increment then raises the count after the check, so every later copy of that line evaluates to false and is skipped.
So, when you run this awk command, it reads each line from the specified file, and it only prints lines that have not been seen before, effectively removing duplicate lines. This is a common idiom in AWK for filtering unique lines from input.
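One practical difference from sort -u is that this idiom keeps the first copy of each line in its original position. The sketch below compares the two on a hypothetical items.txt.

```shell
# Scratch directory for the sample file.
cd "$(mktemp -d)" || exit 1
printf 'b\na\nb\nc\na\n' > items.txt

# Keeps the first copy of each line, in input order (prints b, a, c).
awk '!seen[$0]++' items.txt

# For comparison, sort -u also removes duplicates but sorts the output (a, b, c).
sort -u items.txt
```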
15. Extract and print specific fields using a delimiter:
awk -F, '{print $1, $3}' filename
awk: Invokes the AWK programming language.
-F,: Specifies the field separator. In this case, the comma (,) is used as the field separator. It tells AWK to treat commas as the delimiter between fields.
‘{print $1, $3}’: This is the AWK program enclosed in single quotes. The program consists of a single action, which is to print the first and third fields of each line.
$1: Represents the first field (column) in the input line.
,: Separates the first and third fields in the output with the output field separator (OFS), which is a single space by default.
$3: Represents the third field (column) in the input line.
filename: Specifies the input file for awk to process. Replace “filename” with the actual name of your CSV file.
Here’s an example to illustrate how this command works. Suppose you have a CSV file named data.csv with the following content:
Name,Age,Gender
John,25,Male
Alice,30,Female
Bob,22,Male
Running the awk command on data.csv will output:
Name Gender
John Male
Alice Female
Bob Male
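Two small adjustments are often useful with CSV input: skipping the header row and keeping the output comma-separated. The sketch below recreates data.csv in a scratch directory and does both. Note that splitting on -F, is a simplification: it does not understand quoted fields that themselves contain commas.

```shell
# Scratch directory for the sample file.
cd "$(mktemp -d)" || exit 1
printf 'Name,Age,Gender\nJohn,25,Male\nAlice,30,Female\nBob,22,Male\n' > data.csv

# NR > 1 skips the header row; -v OFS=, makes print join fields with commas.
# Prints John,Male / Alice,Female / Bob,Male
awk -F, -v OFS=, 'NR > 1 {print $1, $3}' data.csv
```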
16. Print the last field of each line:
awk '{print $NF}' filename
17. Print lines with a specific field length:
awk 'length($2) == 5 {print}' filename
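The above command prints only the lines whose second field is exactly 5 characters long.
Alternatively, to test the length of the whole line rather than a single field:
awk 'length($0) == 61 {print}' filename
The above command prints only the lines that are exactly 61 characters long.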
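The sketch below runs the field-length test on a hypothetical fruit.txt and then prints each line's length in front of it, which is a handy way to choose a threshold such as 61 for your own file.

```shell
# Scratch directory for the sample file.
cd "$(mktemp -d)" || exit 1
printf 'John apple\nAlice mango\nBob kiwi\n' > fruit.txt

# Second field exactly 5 characters: prints "John apple" and "Alice mango".
awk 'length($2) == 5 {print}' fruit.txt

# Show each line's length before the line itself (10, 11, and 8 here).
awk '{print length($0), $0}' fruit.txt
```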
18. Combine multiple files horizontally:
This awk command is used to process two files (file1 and file2) and combine information from both files based on a common field. Let’s break down the command:
awk 'FNR==NR {a[$1]=$2; next} {print $0, a[$1]}' file1 file2
awk: Invokes the AWK programming language.
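FNR==NR {a[$1]=$2; next}: This is a pattern-action pair. The pattern FNR==NR is true only while reading the first file (file1), because that is the only time the per-file record number equals the overall record number. The action {a[$1]=$2; next} is executed for lines in the first file. (If file1 is empty, FNR==NR is also true throughout file2, so this idiom assumes file1 has at least one line.)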
FNR: Represents the record number in the current file.
NR: Represents the overall record number (across all files).
a[$1]=$2: Creates an associative array a where the key is the value of the first field ($1) in the first file, and the value is the second field ($2) in the first file.
next: Stops processing the current line and moves on to the next one, so the print action that follows is never applied to lines from file1.
{print $0, a[$1]}: This is another action that is applied while reading the second file (file2). It prints the entire line ($0) from the second file followed by the value associated with the first field ($1) from the first file.
file1 file2: Specifies the input files for awk to process. Replace “file1” and “file2” with the actual names of your files.
Here’s an example to illustrate how this command works. Suppose you have two files:
file1:
John 25
Alice 30
Bob 22
file2:
John Smith
Alice Johnson
Bob Brown
Running the awk command will output:
John Smith 25
Alice Johnson 30
Bob Brown 22
This example demonstrates how the command combines information from both files based on the common field (the first field). The values from file1 are associated with the corresponding keys in file2.
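The sketch below recreates file1 and file2 and runs the command. It also adds a hypothetical file3 containing a name that is missing from file1, to show a variant where the ($1 in a) test skips unmatched lines instead of printing them with an empty value.

```shell
# Scratch directory for the sample files.
cd "$(mktemp -d)" || exit 1
printf 'John 25\nAlice 30\nBob 22\n' > file1
printf 'John Smith\nAlice Johnson\nBob Brown\n' > file2

# Load file1 into a[name] = age, then append the age to each file2 line.
awk 'FNR==NR {a[$1]=$2; next} {print $0, a[$1]}' file1 file2

# Variant with a hypothetical file3: "Carol" has no entry in file1,
# so ($1 in a) is false for that line and it is not printed.
printf 'John Smith\nCarol White\n' > file3
awk 'FNR==NR {a[$1]=$2; next} ($1 in a) {print $0, a[$1]}' file1 file3
```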
19. Print lines with a specific field matching a pattern:
This “awk” command is used to print lines from a file where the content of the second field (“$2”) matches a specified pattern. Let’s break down the command:
awk '$2 ~ /pattern/ {print}' filename
– “awk”: Invokes the AWK programming language.
– “$2 ~ /pattern/ {print}”: This is a pattern-action pair. The pattern is “$2 ~ /pattern/”, and the action is “{print}”.
– “$2”: Represents the second field (column) in the input line.
– “~”: The regular-expression match operator in AWK. It checks whether the content of the second field matches the specified pattern anywhere within the field.
– “/pattern/”: The regular expression to match. Replace “pattern” with the actual pattern you are looking for.
– “{print}”: If the pattern is true (i.e., if the content of the second field matches the specified pattern), the action is executed, which is to print the entire line.
– “filename”: Specifies the input file for “awk” to process. Replace “filename” with the actual name of your file.
Here’s an example to illustrate how this command works. Suppose you have a file named “data.txt” with the following content:
John apple
Alice banana
Bob orange
Charlie apple
Running the “awk” command with the pattern “/apple/” will output:
John apple
Charlie apple
This example demonstrates how the command prints lines where the content of the second field (“$2”) contains the specified pattern (“apple”).
Lines with “banana” and “orange” are not printed because they do not match the pattern.
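Because “~” matches anywhere inside the field, /apple/ would also match a value such as “pineapple”. The sketch below adds a hypothetical “Dana pineapple” line to show the difference and uses the ^ and $ anchors to require an exact match.

```shell
# Scratch directory for the sample file.
cd "$(mktemp -d)" || exit 1

# The sample data.txt, plus a hypothetical "pineapple" line.
printf 'John apple\nAlice banana\nBob orange\nCharlie apple\nDana pineapple\n' > data.txt

# Unanchored: "Dana pineapple" is printed too, because it contains "apple".
awk '$2 ~ /apple/ {print}' data.txt

# Anchored with ^ and $: the whole field must be exactly "apple".
awk '$2 ~ /^apple$/ {print}' data.txt
```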
20. Print lines with specific field values:
This awk command is used to print lines from a file where the content of the second field ($2) is equal to a specified value. Let’s break down the command:
awk '$2 == "value" {print}' filename
awk: Invokes the AWK programming language.
$2 == "value" {print}: This is a pattern-action pair. The pattern is $2 == "value", and the action is {print}.
$2: Represents the second field (column) in the input line.
==: The comparison operator in AWK. It checks whether the content of the second field is equal to the specified value.
"value": The value to compare against. Replace “value” with the actual value you are looking for.
{print}: If the pattern is true (i.e., if the content of the second field is equal to the specified value), the action is executed, which is to print the entire line.
filename: Specifies the input file for awk to process. Replace “filename” with the actual name of your file.
Here’s an example to illustrate how this command works. Suppose you have a file named data.txt with the following content:
John apple
Alice banana
Bob orange
Charlie apple
Running the awk command with the value “apple” will output:
John apple
Charlie apple
This example demonstrates how the command prints lines where the content of the second field ($2) is equal to the specified value (“apple”).
Lines with “banana” and “orange” are not printed because they do not match the specified value.
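In scripts, the value often comes from a shell variable rather than being typed into the program. A minimal sketch, assuming the same data.txt contents and a hypothetical shell variable named wanted, passes it in with awk's -v option:

```shell
# Scratch directory for the sample file.
cd "$(mktemp -d)" || exit 1
printf 'John apple\nAlice banana\nBob orange\nCharlie apple\n' > data.txt

# -v assigns the shell variable's value to the awk variable val
# before any input is read. Prints "John apple" and "Charlie apple".
wanted=apple
awk -v val="$wanted" '$2 == val {print}' data.txt
```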
21. Print unique values in a specific column:
awk '{print $2}' filename | sort | uniq
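The sketch below runs that pipeline on a sample data.txt and adds two awk-only alternatives: one that lists each value once in the order it first appears, and one that counts occurrences. The order of a "for (v in count)" loop is unspecified in awk, so the counts are piped through sort.

```shell
# Scratch directory for the sample file.
cd "$(mktemp -d)" || exit 1
printf 'John apple\nAlice banana\nBob orange\nCharlie apple\n' > data.txt

# The pipeline from above: prints apple, banana, orange.
awk '{print $2}' data.txt | sort | uniq

# awk-only version: first occurrence of each value, in input order.
awk '!seen[$2]++ {print $2}' data.txt

# Count occurrences of each value (apple 2, banana 1, orange 1).
awk '{count[$2]++} END {for (v in count) print v, count[v]}' data.txt | sort
```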
Remember to replace “filename” with the actual name of your file. These examples cover a range of “awk” functionalities, and you can combine and modify them to suit your specific needs.