Working with RegEx

Last Updated : 29 Jan, 2026

Regular expressions (RegEx) are "search and replace" on steroids. They let you define complex search patterns, such as "find every line that starts with a timestamp" or "extract all email addresses from this file."

In Linux, RegEx is the backbone of powerful command-line utilities like grep, sed, and awk. This guide will take you from basic matching to advanced pattern extraction.

Basics of Regular Expressions:

A regular expression is a sequence of characters that forms a search pattern. It can include literal characters, metacharacters, and quantifiers to define a specific pattern.

  • Literal Characters: Match themselves exactly (e.g., apple matches "apple").
  • Metacharacters: Special characters that define rules (e.g., ^, $, .).
  • Quantifiers: Specify how many times the preceding element may repeat (e.g., *, +, ?).

2. Metacharacters:

Metacharacters have special meanings in regular expressions:

| Symbol | Name | Function | Example | Matches |
|--------|------|----------|---------|---------|
| . | Dot | Matches any single character except newline. | b.t | bat, bet, bit, b@t |
| ^ | Caret | Anchors to the start of the line. | ^Error | Lines starting with "Error" |
| $ | Dollar | Anchors to the end of the line. | done$ | Lines ending with "done" |
| [] | Character Class | Matches one character from a specified set or range. | [Rr]ead | Read, read |
| [^...] | Negation | Matches one character NOT in the specified set. | [^0-9] | Any non-digit character |
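The rows of the table above can be tried out directly with grep. The following is a minimal sketch; the sample lines and variable names are made up for illustration and do not come from any real log.

```shell
# Hypothetical sample data: one entry per line.
sample=$(printf '%s\n' 'bat' 'b@t' 'boot' 'Error: disk full' 'no Error here' 'task done' 'Read' 'read' 'lead')

# . matches any single character between b and t
# (-x requires the whole line to match, so "boot" is excluded).
dot_matches=$(printf '%s\n' "$sample" | grep -x 'b.t')

# ^ anchors the pattern to the start of the line.
caret_matches=$(printf '%s\n' "$sample" | grep '^Error')

# $ anchors the pattern to the end of the line.
dollar_matches=$(printf '%s\n' "$sample" | grep 'done$')

# [Rr] matches either an uppercase or a lowercase r.
class_matches=$(printf '%s\n' "$sample" | grep '[Rr]ead')

# [^0-9] matches any line that contains at least one non-digit character.
non_digit=$(printf '%s\n' '2026' 'a1' | grep '[^0-9]')

printf '%s\n' "$dot_matches" "$caret_matches" "$dollar_matches" "$class_matches" "$non_digit"
```

Note that "no Error here" is not matched by ^Error, because the word does not appear at the start of the line.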

3. Quantifiers:

Quantifiers specify the number of occurrences of a character or group:

  • *: Matches 0 or more occurrences.
  • +: Matches 1 or more occurrences.
  • ?: Matches 0 or 1 occurrence.
  • {n}: Matches exactly n occurrences.
  • {n,}: Matches n or more occurrences.
  • {n,m}: Matches between n and m occurrences.
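To see how each quantifier behaves, the sketch below runs all six against the same small word list. The words and variable names are illustrative assumptions; grep -E is used so that +, ?, and {} work without escaping.

```shell
# Hypothetical test strings, one per line.
words=$(printf '%s\n' 'ct' 'cat' 'caat' 'caaat' 'caaaat')

# -E enables extended syntax; -x requires the whole line to match.
star=$(printf '%s\n' "$words" | grep -Ex 'ca*t')         # zero or more a
plus=$(printf '%s\n' "$words" | grep -Ex 'ca+t')         # one or more a
optional=$(printf '%s\n' "$words" | grep -Ex 'ca?t')     # zero or one a
exact=$(printf '%s\n' "$words" | grep -Ex 'ca{2}t')      # exactly two a
at_least=$(printf '%s\n' "$words" | grep -Ex 'ca{3,}t')  # three or more a
range=$(printf '%s\n' "$words" | grep -Ex 'ca{1,2}t')    # one or two a

printf 'optional: %s\n' $optional
printf 'range: %s\n' $range
```

A quantifier always applies to the element immediately before it, which is the single letter a in every pattern here.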

Using RegEx with Linux Commands:

1. Using grep:

The grep command is a powerful tool for searching text using regular expressions.

  • Basic Usage:
grep "pattern" filename
  • Example:

Assume a file named regextut.txt contains the following text:

apple is made by apple is a company by apple

iphone is made by apple

faaskndjdfnksdjappleaskldjfsl

grep "apple" regextut.txt

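This command prints every line of regextut.txt that contains the string "apple". The match is on substrings, not whole words, so the third line (faaskndjdfnksdjappleaskldjfsl) is printed too; add -w to match "apple" only as a whole word.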
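A few common grep options change what gets reported for the same pattern. The sketch below recreates the sample text in a temporary file (the temporary path stands in for regextut.txt) and compares line counts, whole-word matches, and total occurrences.

```shell
# Recreate the sample file from the text in a temporary location.
tmpfile=$(mktemp)
printf '%s\n' \
  'apple is made by apple is a company by apple' \
  'iphone is made by apple' \
  'faaskndjdfnksdjappleaskldjfsl' > "$tmpfile"

# -c counts matching lines: all three lines contain "apple" somewhere.
substring_count=$(grep -c 'apple' "$tmpfile")

# -w matches "apple" only as a whole word, so the third line is skipped.
word_count=$(grep -cw 'apple' "$tmpfile")

# -o prints each match on its own line; wc -l then counts occurrences.
occurrences=$(grep -o 'apple' "$tmpfile" | wc -l | tr -d ' ')

echo "lines: $substring_count, whole-word lines: $word_count, occurrences: $occurrences"
rm -f "$tmpfile"
```

The -c option counts lines, not matches, which is why the first line contributes one to the line count but three to the occurrence count.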

Basic vs. Extended RegEx

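This is where most beginners get stuck. Linux commands like grep use Basic RegEx (BRE) by default, which treats symbols like +, ?, and | as literal characters. In BRE, interval braces must also be escaped, as in \{n,m\}.

To use these quantifiers without escaping, switch to Extended RegEx (ERE) by adding the -E flag. The older egrep command does the same thing, but recent GNU grep releases mark it as obsolescent, so prefer grep -E.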

  • Wrong: grep "error+" log.txt (Looks for the literal string "error+")
  • Right: grep -E "error+" log.txt (Looks for "error", "errorr", "errorrr")
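The difference is easiest to see by running the same pattern in both modes. This is a small sketch over made-up input lines; the variable names are illustrative only.

```shell
# Hypothetical input: one line per case.
lines=$(printf '%s\n' 'error' 'errorrr' 'error+ (literal plus)' 'warning')

# BRE: + is an ordinary character, so only the line containing "error+" matches.
bre=$(printf '%s\n' "$lines" | grep 'error+')

# ERE: + means "one or more of the preceding r".
ere=$(printf '%s\n' "$lines" | grep -E 'error+')

# ERE with an escaped plus matches the literal character again.
ere_literal=$(printf '%s\n' "$lines" | grep -E 'error\+')

# ERE alternation: | separates two whole-line alternatives.
either=$(printf '%s\n' "$lines" | grep -E '^(error|warning)$')

printf 'BRE matched: %s\n' "$bre"
```

In ERE mode the line "error+ (literal plus)" still matches error+, because it contains "error" followed by further text.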

2. Using sed:

sed is used for "Stream Editing." It can replace text on the fly using RegEx.

  • Substitution:
sed 's/pattern/replacement/g' filename
  • Example:
sed 's/apple/banana/g' regextut.txt

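This command prints the contents of regextut.txt with every occurrence of "apple" replaced by "banana". The file itself is not modified; sed writes the result to standard output. To edit the file in place, GNU sed provides the -i option.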
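The effect of the g flag, and of capture groups in the replacement, can be sketched as follows. The "key=value" input is a hypothetical example, and -E is assumed to be available (it is supported by both GNU and BSD sed).

```shell
text='apple is made by apple is a company by apple'

# Without the g flag, only the first match on each line is replaced.
first_only=$(printf '%s\n' "$text" | sed 's/apple/banana/')

# With g, every match on the line is replaced.
all=$(printf '%s\n' "$text" | sed 's/apple/banana/g')

# -E enables extended syntax; \1 and \2 refer back to the captured groups,
# here swapping the two sides of a hypothetical "key=value" pair.
swapped=$(printf '%s\n' 'color=red' | sed -E 's/^([a-z]+)=([a-z]+)$/\2=\1/')

printf '%s\n' "$first_only" "$all" "$swapped"
```

Because the input arrives on a pipe, nothing on disk is changed by any of these commands.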

Advanced Examples:

1. Matching IP Addresses:

grep -P '(\d{1,3}\.){3}\d{1,3}' input.txt

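This command uses Perl-compatible regular expressions (the -P flag, available in GNU grep) to print lines containing an IPv4-shaped dotted quad. The pattern checks only the shape: each group is one to three digits, so out-of-range values such as 999.999.999.999 also match.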
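If each octet should also be limited to 0-255, one possible approach is to spell out the valid ranges as an alternation. The sketch below uses grep -E rather than -P so that it does not depend on PCRE support; the variable names and candidate addresses are assumptions for illustration, and the anchors mean each line must be an address and nothing else.

```shell
# One alternative per valid octet range: 250-255, 200-249, 100-199, 10-99, 0-9.
octet='(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])'

# Four octets separated by literal dots, anchored to the whole line.
ipv4="^${octet}(\\.${octet}){3}\$"

# Hypothetical candidates, one per line.
candidates=$(printf '%s\n' '192.168.1.10' '10.0.0.255' '999.1.1.1' '256.0.0.1' '1.2.3')

valid=$(printf '%s\n' "$candidates" | grep -E "$ipv4")

printf '%s\n' "$valid"
```

This version also rejects octets with leading zeros (such as 01), which may or may not be what a given use case wants.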

2. Extracting Email Addresses:

grep -oP '\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b' emails.txt

This command extracts email addresses from a file. The -o flag prints only the matched text, one match per line, rather than the whole line.
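On systems where grep -P is unavailable, a similar extraction can be sketched with grep -E by dropping the \b word boundaries. The input text and addresses below are invented for the example, and sort -u is added to remove duplicates.

```shell
# Hypothetical input text containing two distinct addresses, one repeated.
log=$(printf '%s\n' \
  'Contact alice@example.com or bob.smith@mail.example.org for access.' \
  'No address on this line.' \
  'Duplicate: alice@example.com')

# -o prints each match on its own line; sort -u keeps one copy of each.
emails=$(printf '%s\n' "$log" \
  | grep -oE '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}' \
  | sort -u)

printf '%s\n' "$emails"
```

Like the original pattern, this is a practical approximation and does not cover every address that the email standards allow.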

Summary Cheatsheet

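| Task | Pattern | Explanation |
|------|---------|-------------|
| Match Email | [\w\.-]+@[\w\.-]+ | Simple email match. |
| Match Date | \d{4}-\d{2}-\d{2} | Format YYYY-MM-DD. |
| Match IP | \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3} | Simple IP match. |
| Blank Lines | ^$ | Start immediately followed by end. |
| Comments | ^# | Lines starting with a hash. |

The \d shorthand requires grep -P; with plain grep or grep -E, write [0-9] instead.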
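Two of the cheatsheet patterns combine naturally when cleaning up a configuration file. The following sketch uses an invented config snippet; it strips blank lines and comments with an inverted match, then pulls out a date using [0-9] in place of \d so that it runs under grep -E.

```shell
# Hypothetical configuration text with comments and a blank line.
config=$(printf '%s\n' '# server settings' '' 'port=8080' '# updated 2026-01-29' 'host=localhost')

# -v inverts the match: drop blank lines (^$) and comment lines (^#).
active=$(printf '%s\n' "$config" | grep -Ev '^$|^#')

# [0-9] is the portable equivalent of \d outside grep -P.
dates=$(printf '%s\n' "$config" | grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2}')

printf '%s\n' "$active" "$dates"
```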