How to Use the awk Command Effectively in Linux

The “awk” command is a powerful text processing tool in Unix and Unix-like operating systems, including Linux. It is used for pattern scanning and processing.

“awk” takes input text, processes it line by line, and performs the specified actions on the lines that match the patterns defined in the AWK script.

The name “awk” comes from the initials of its authors: Alfred V. Aho, Peter J. Weinberger, and Brian W. Kernighan. It was developed at Bell Labs in the 1970s.

Here is the basic syntax of the “awk” command:

awk 'pattern { action }' file

– pattern: Specifies a pattern or condition that matches lines in the input.

– action: Specifies the action to be performed on lines that match the pattern.

– file: Specifies the input file(s) to be processed. If not provided, “awk” reads from the standard input.
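As a rough illustration of these three parts, the sketch below pipes a few made-up lines into “awk” (so no file argument is given and it reads standard input). It also shows that either the pattern or the action can be left out. The sample lines and variable names are assumptions chosen for illustration.

```shell
# Hypothetical sample input, kept in a shell variable for reuse.
input=$(printf 'John 25\nAlice 30\nBob 22\n')

# Pattern only: the default action prints the whole matching line.
pattern_only=$(printf '%s\n' "$input" | awk '$2 > 24')

# Action only: with no pattern, the action runs for every line.
action_only=$(printf '%s\n' "$input" | awk '{ print $1 }')

printf '%s\n' "$pattern_only"   # the John and Alice lines
printf '%s\n' "$action_only"    # all three names, one per line
```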

Here’s a simple example. Let’s say you have a file called “data.txt” with the following content:

John 25

Alice 30

Bob 22

You can use “awk” to print the names of people who are 30 years old:

awk '$2 == 30 { print $1 }' data.txt

In this example:

– “$2” refers to the second column (age).

– “==” is a comparison operator.

– “{ print $1 }” specifies the action to print the first column (name) if the second column is equal to 30.

So, the output will be:

Alice

“awk” is quite versatile and can be used for more complex text processing tasks, such as field manipulation, calculations, and more. It’s widely used in shell scripting and automation tasks. If you’re interested in learning more, you might want to explore the various features and functions provided by “awk”.

Here are 21 examples of using “awk” for real-world use cases:
1. Print specific columns from a file:
awk '{print $1, $3}' filename

With “etc_output” as the input file, the above command prints columns 1 and 3 of each line in the terminal.

2. Calculate and print the average of a column:
awk '{sum += $2} END {print "Average:", sum/NR}' filename
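To make the arithmetic concrete, here is a small sketch that applies the same idea to the three-line “data.txt” sample from the introduction, recreated in a temporary file. The NR check is an extra safeguard against division by zero on empty input, and the variable names are assumptions.

```shell
# Recreate the sample data from the introduction in a temporary file.
data=$(mktemp)
printf 'John 25\nAlice 30\nBob 22\n' > "$data"

# Sum column 2, then divide by the record count once all input is read.
average=$(awk '{ sum += $2 } END { if (NR > 0) print "Average:", sum / NR }' "$data")
echo "$average"   # Average: 25.6667 (77 / 3 in awk's default %.6g format)

rm -f "$data"
```

Note that NR counts every line, so a header row or blank line in the file would skew the result.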

3. Filter lines based on a condition:
awk '$5 > 300 {print $1, $5, $9}' filename

With “etc_output” as the input file, the above command selects the lines whose column 5 (the size in bytes) is greater than 300 and prints columns 1, 5, and 9 in the terminal.
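The following sketch shows the filter on a few made-up lines in “ls -l” format, where column 5 is the size and column 9 is the name. The sample lines are hypothetical, not the contents of any real directory.

```shell
# Hypothetical lines in `ls -l` format: column 5 is the size, column 9 the name.
listing=$(mktemp)
cat > "$listing" <<'EOF'
-rw-r--r-- 1 root root  158 Jan 10 09:00 hosts
-rw-r--r-- 1 root root 2319 Jan 10 09:00 passwd
drwxr-xr-x 2 root root 4096 Jan 10 09:00 ssh
EOF

# Keep only the entries whose size (column 5) exceeds 300.
large=$(awk '$5 > 300 { print $1, $5, $9 }' "$listing")
echo "$large"   # the passwd and ssh entries; hosts (158) is filtered out

rm -f "$listing"
```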

4. Print lines matching a specific pattern:
awk '/pattern/ {print}' filename

5. Print the number of lines in a file:
awk 'END {print NR}' filename

6. Print lines with more than a certain number of fields:
awk 'NF > 4 {print}' filename

7. Print only duplicate lines in a file:
awk 'seen[$0]++ == 1 {print}' filename
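A quick way to see what this does: the counter for a line is compared before it is incremented, so the condition is true only on the second occurrence. Each duplicated line is therefore printed once, however many times it repeats. The input below is made up for illustration.

```shell
# Hypothetical input: "a" appears three times, "b" twice, "c" once.
dupes=$(printf 'a\nb\na\nc\nb\na\n' | awk 'seen[$0]++ == 1')
echo "$dupes"   # a and b, each printed once; c is never printed
```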

8. Print lines between two patterns:
awk '/start_pattern/, /end_pattern/ {print}' filename
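As a small sketch of the range pattern, the example below uses the made-up markers START and STOP. The range is inclusive, so both marker lines are printed along with everything between them.

```shell
# Hypothetical input with a marked section in the middle.
section=$(printf 'header\nSTART\nalpha\nbeta\nSTOP\nfooter\n' | awk '/START/, /STOP/')
echo "$section"   # START, alpha, beta, STOP (one per line)
```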

9. Print the length of each line:
awk '{print length, $0}' filename

10. Sum values in a column:
awk '{sum += $2} END {print "Sum:", sum}' filename

11. Format output with printf:
awk '{printf "Name: %-10s Age: %s Gender: %s\n", $1, $2, $3}' data.txt

printf: This is a formatted printing function in AWK that allows you to specify the format of the output.

“Name: %-10s Age: %s Gender: %s\n”: This is the format string. It specifies that you want to print “Name:” followed by the first column ($1) left-aligned with a width of 10 characters (%-10s), then “Age:” followed by the second column ($2), then “Gender:” followed by the third column ($3) and a newline character (\n) to move to the next line.

$1, $2, and $3: These represent the first, second, and third columns in the input file, respectively.
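To see the padding in action, the sketch below feeds one made-up three-field record to the same format string. The brackets around the result are only there to make the spacing visible.

```shell
# One hypothetical record with three fields: name, age, gender.
formatted=$(printf 'John 25 Male\n' |
  awk '{ printf "Name: %-10s Age: %s Gender: %s\n", $1, $2, $3 }')

# "John" is padded on the right to 10 characters before " Age:".
echo "[$formatted]"
```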

12. Print lines longer than a certain length:
awk 'length($0) > 70 {print}' filename

– “awk”: Invokes the AWK programming language.

– “'length($0) > 70 {print}'”: This is an AWK program enclosed in single quotes. The program consists of a pattern-action pair. The pattern is “length($0) > 70”, and the action is “{print}”.

– Pattern (“length($0) > 70”): This specifies a condition that must be true for a line to be processed. In this case, it checks if the length of the entire line (“$0”) is greater than 70 characters.

– Action (“{print}”): If the pattern is true, this action is executed. In this case, it simply prints the entire line.

– “filename”: Specifies the input file for “awk” to process. Replace “filename” with the actual name of your file.

So, when you run this “awk” command, it will read each line from the specified file and print only those lines where the length of the line is greater than 70 characters.

13. Print specific lines using a range of line numbers:
awk 'NR >= 10 && NR <= 20 {print}' filename

The above command prints lines 10 through 20, inclusive. To verify, number the lines with the “-n” option of the “cat” command:

cat -n filename
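Another way to check the range without a real file is to generate numbered input with “seq”, as sketched below. The second command is an optional variation that exits after line 20, which may save time on very large files.

```shell
# seq prints 1..30, one per line, so each line's text equals its line number.
picked=$(seq 1 30 | awk 'NR >= 10 && NR <= 20')
echo "$picked"   # 10 through 20, one per line

# Variation: stop reading as soon as line 20 has been passed.
picked_fast=$(seq 1 30 | awk 'NR > 20 { exit } NR >= 10')
```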

14. Remove duplicate lines from a file:
awk '!seen[$0]++' filename

awk: Invokes the AWK programming language.

!seen[$0]++: This is the AWK program, enclosed in single quotes on the command line. It consists of a pattern only, !seen[$0]++; no action is specified, so the default action {print} is assumed.

Pattern (!seen[$0]++): This uses an associative array seen to keep track of lines encountered. The array is indexed by the entire line ($0). The ! negates the condition.

Action (implicit {print}): If the pattern is true (i.e., if the line has not been seen before), the default action is executed, which is to print the line.

filename: Specifies the input file for awk to process. Replace “filename” with the actual name of your file.

Here’s how this command works:

“seen” is an associative array in AWK that is used to keep track of unique lines encountered.

$0 represents the entire line.

!seen[$0]++ checks if the current line has not been seen before. If it hasn’t, the condition is true, and the line is printed. The ++ at the end is a post-increment operator, which ensures that the value in the seen array is updated after the check.

So, when you run this awk command, it reads each line from the specified file, and it only prints lines that have not been seen before, effectively removing duplicate lines. This is a common idiom in AWK for filtering unique lines from input.
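The sketch below runs the idiom on a few made-up lines and compares it with “sort -u”. Both remove duplicates, but “awk” keeps the lines in their original order, while “sort -u” reorders them.

```shell
# Hypothetical input in which "c" and "a" each appear twice.
input=$(printf 'c\na\nc\nb\na\n')

# awk keeps the first occurrence of each line, in the original order.
unique=$(printf '%s\n' "$input" | awk '!seen[$0]++')

# sort -u also removes duplicates, but sorts the lines.
sorted_unique=$(printf '%s\n' "$input" | sort -u)

printf '%s\n' "$unique"          # c, a, b (one per line)
printf '%s\n' "$sorted_unique"   # a, b, c (one per line)
```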

15. Extract and print specific fields using a delimiter:
awk -F, '{print $1, $3}' filename

awk: Invokes the AWK programming language.

-F,: Specifies the field separator. In this case, the comma (,) is used as the field separator. It tells AWK to treat commas as the delimiter between fields.

'{print $1, $3}': This is the AWK program enclosed in single quotes. The program consists of a single action, which is to print the first and third fields of each line.

$1: Represents the first field (column) in the input line.

,: Separates the two items in the print statement. AWK prints the output field separator (a single space by default) between them.

$3: Represents the third field (column) in the input line.

filename: Specifies the input file for awk to process. Replace “filename” with the actual name of your CSV file.

Here’s an example to illustrate how this command works. Suppose you have a CSV file named data.csv with the following content:

Name,Age,Gender

John,25,Male

Alice,30,Female

Bob,22,Male

Running the awk command will output:

Name Gender

John Male

Alice Female

Bob Male

16. Print the last field of each line:
awk '{print $NF}' filename
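Because NF holds the number of fields on the current line, “$NF” works even when lines have different field counts. The sketch below shows this on made-up input, along with “$(NF-1)” for the second-to-last field.

```shell
# Hypothetical lines with three, two, and one field.
input=$(printf 'one two three\nalpha beta\nsolo\n')

# Last field of every line.
last=$(printf '%s\n' "$input" | awk '{ print $NF }')

# Second-to-last field, skipping lines that have only one field.
second_last=$(printf '%s\n' "$input" | awk 'NF >= 2 { print $(NF-1) }')

printf '%s\n' "$last"          # three, beta, solo
printf '%s\n' "$second_last"   # two, alpha
```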

17. Print lines with a specific field length:
awk 'length($2) == 5 {print}' filename

The above command checks whether the second field is exactly 5 characters long.

Or, to test the length of the entire line:

awk 'length($0) == 61 {print}' filename

The above command checks whether the entire line is exactly 61 characters long.

18. Combine multiple files horizontally:

This awk command is used to process two files (file1 and file2) and combine information from both files based on a common field. Let’s break down the command:

awk 'FNR==NR {a[$1]=$2; next} {print $0, a[$1]}' file1 file2

awk: Invokes the AWK programming language.

FNR==NR {a[$1]=$2; next}: This is a pattern-action pair. The pattern FNR==NR is true only while reading the first file (file1). The action {a[$1]=$2; next} is executed for lines in the first file.

FNR: Represents the record number in the current file.

NR: Represents the overall record number (across all files).

a[$1]=$2: Creates an associative array a where the key is the value of the first field ($1) in the first file, and the value is the second field ($2) in the first file.

next: Skips the rest of the AWK commands and moves to the next line.

{print $0, a[$1]}: This is another action that is applied while reading the second file (file2). It prints the entire line ($0) from the second file followed by the value associated with the first field ($1) from the first file.

file1 file2: Specifies the input files for awk to process. Replace “file1” and “file2” with the actual names of your files.

Here’s an example to illustrate how this command works. Suppose you have two files:

file1:

John 25

Alice 30

Bob 22

file2:

John Smith

Alice Johnson

Bob Brown

Running the awk command will output:

John Smith 25

Alice Johnson 30

Bob Brown 22

This example demonstrates how the command combines information from both files based on the common field (the first field). The values from file1 are associated with the corresponding keys in file2.
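One case worth sketching is a key that appears in file2 but not in file1. With the original command, such a line is printed with nothing after the trailing space. The variation below is only one possible way to handle it: the “N/A” placeholder and the temporary file layout are assumptions, not part of the original command.

```shell
# Recreate the two sample files in a scratch directory; Bob is left out of file1.
work=$(mktemp -d)
printf 'John 25\nAlice 30\n' > "$work/file1"
printf 'John Smith\nAlice Johnson\nBob Brown\n' > "$work/file2"

# First pass (file1): remember the age by name.
# Second pass (file2): append the age, or "N/A" when the name is unknown.
joined=$(awk 'FNR == NR { a[$1] = $2; next }
              { print $0, (($1 in a) ? a[$1] : "N/A") }' "$work/file1" "$work/file2")
echo "$joined"

rm -rf "$work"
```

Using “$1 in a” tests for the key without creating an empty array entry, which a plain reference to a[$1] would do.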

19. Print lines with a specific field matching a pattern:

This “awk” command is used to print lines from a file where the content of the second field (“$2”) matches a specified pattern. Let’s break down the command:

awk '$2 ~ /pattern/ {print}' filename

– “awk”: Invokes the AWK programming language.

– “$2 ~ /pattern/ {print}”: This is a pattern-action pair. The pattern is “$2 ~ /pattern/”, and the action is “{print}”.

– “$2”: Represents the second field (column) in the input line.

– “~”: Is a pattern matching operator in AWK. It checks if the content of the second field matches the specified pattern.

– “/pattern/”: Is the pattern to match. Replace “pattern” with the actual pattern you are looking for.

– “{print}”: If the pattern is true (i.e., if the content of the second field matches the specified pattern), the action is executed, which is to print the entire line.

– “filename”: Specifies the input file for “awk” to process. Replace “filename” with the actual name of your file.

Here’s an example to illustrate how this command works. Suppose you have a file named “data.txt” with the following content:

John apple

Alice banana

Bob orange

Charlie apple

Running the “awk” command with the pattern “/apple/” will output:

John apple

Charlie apple

This example demonstrates how the command prints lines where the content of the second field (“$2”) contains the specified pattern (“apple”).

Lines with “banana” and “orange” are not printed because they do not match the pattern.
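Keep in mind that “~” looks for the pattern anywhere inside the field. The sketch below uses made-up data to contrast that with an anchored pattern, which matches only when the whole field is “apple”.

```shell
# Hypothetical data where one value merely contains "apple".
fruits=$(printf 'John apple\nAlice pineapple\nBob orange\n')

# Unanchored: matches "apple" anywhere in the second field.
contains=$(printf '%s\n' "$fruits" | awk '$2 ~ /apple/')

# Anchored with ^ and $: the second field must be exactly "apple".
exact=$(printf '%s\n' "$fruits" | awk '$2 ~ /^apple$/')

printf '%s\n' "$contains"   # the John and Alice lines
printf '%s\n' "$exact"      # the John line only
```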

20. Print lines with specific field values:

This awk command is used to print lines from a file where the content of the second field ($2) is equal to a specified value. Let’s break down the command:

awk '$2 == "value" {print}' filename

awk: Invokes the AWK programming language.

$2 == "value" {print}: This is a pattern-action pair. The pattern is $2 == "value", and the action is {print}.

$2: Represents the second field (column) in the input line.

==: Is a comparison operator in AWK. It checks if the content of the second field is equal to the specified value.

"value": Is the string to compare against. Replace “value” with the actual value you are looking for, and keep the straight double quotes around it.

{print}: If the pattern is true (i.e., if the content of the second field is equal to the specified value), the action is executed, which is to print the entire line.

filename: Specifies the input file for awk to process. Replace “filename” with the actual name of your file.

Here’s an example to illustrate how this command works. Suppose you have a file named data.txt with the following content:

John apple

Alice banana

Bob orange

Charlie apple

Running the awk command with the value “apple” will output:

John apple

Charlie apple

This example demonstrates how the command prints lines where the content of the second field ($2) is equal to the specified value (“apple”).

Lines with “banana” and “orange” are not printed because they do not match the specified value.

21. Print unique values in a specific column:
awk '{print $2}' filename | sort | uniq
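As a possible alternative, the same result can be produced by “awk” alone, reusing the “seen” array idiom from example 14 on a single column. The sketch below runs both forms on made-up data. The variable names are assumptions.

```shell
# Hypothetical two-column data; column 2 holds the values of interest.
input=$(printf 'John apple\nAlice banana\nBob orange\nCharlie apple\n')

# Pipeline from the example above: sorted unique values.
via_sort=$(printf '%s\n' "$input" | awk '{ print $2 }' | sort | uniq)

# awk-only variant: print a value the first time it appears in column 2.
via_awk=$(printf '%s\n' "$input" | awk '!seen[$2]++ { print $2 }')

printf '%s\n' "$via_sort"
printf '%s\n' "$via_awk"
```

The two outputs happen to be identical here. In general, the awk-only form lists values in order of first appearance rather than sorted order.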

Remember to replace “filename” with the actual name of your file. These examples cover a range of “awk” functionalities, and you can combine and modify them to suit your specific needs.
