Regular expressions (RegEx) are "search and replace" on steroids. They let you define complex search patterns, such as "find every line that starts with a timestamp" or "extract all email addresses from this file."
In Linux, RegEx is the backbone of powerful command-line utilities like grep, sed, and awk. This guide will take you from basic matching to advanced pattern extraction.
Basics of Regular Expressions:
A regular expression is a sequence of characters that forms a search pattern. It can include literal characters, metacharacters, and quantifiers to define a specific pattern.
- Literal Characters: Match themselves exactly (e.g., apple matches "apple").
- Metacharacters: Special characters that define rules (e.g., ^, $, .).
- Quantifiers: Specify how many times the preceding character or group may repeat (e.g., *, +, ?).
2. Metacharacters:
Metacharacters have special meanings in regular expressions:
| Symbol | Name | Function | Example | Matches |
|---|---|---|---|---|
| . | Dot | Matches any single character except newline | b.t | bat, bet, bit, b@t |
| ^ | Caret | Anchors to the start of the line. | ^Error | Lines starting with "Error" |
| $ | Dollar | Anchors to the end of the line. | done$ | Lines ending with "done" |
| [] | Character Class | Matches one character from a specified set or range. | [Rr]ead | Read, read |
| [^...] | Negation | Matches one character NOT in the specified set. | [^0-9] | Any non-digit character |
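To make the table concrete, here is a minimal sketch that runs each metacharacter against a small sample file. The file contents and variable names are hypothetical, invented for illustration; the -x flag (whole-line match) and -c flag (count matching lines) are used only to keep the results easy to read.

```shell
# Hypothetical sample data written to a temporary file.
sample=$(mktemp)
printf '%s\n' 'Error: disk full' 'read 42 blocks' 'Read config' \
  'bat' 'b@t' 'task done' 'no Error here' > "$sample"

# ^ anchors the match to the start of the line.
starts_with_error=$(grep '^Error' "$sample")

# $ anchors the match to the end of the line.
ends_with_done=$(grep 'done$' "$sample")

# . matches any single character; -x requires the whole line to match.
dot_matches=$(grep -x 'b.t' "$sample")

# [Rr] accepts either an upper- or lower-case first letter; -c counts lines.
read_count=$(grep -c '[Rr]ead' "$sample")

# [^0-9]* with -x: lines made up entirely of non-digit characters.
no_digit_count=$(grep -cx '[^0-9]*' "$sample")

echo "starts with Error: $starts_with_error"
echo "ends with done:    $ends_with_done"
echo "b.t matches:       $dot_matches"
echo "Read/read lines:   $read_count"
echo "lines w/o digits:  $no_digit_count"

rm -f "$sample"
```

Note that "no Error here" is not returned by the ^Error search, because the anchor ties the match to the first column.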
3. Quantifiers:
Quantifiers specify the number of occurrences of a character or group:
- *: Matches 0 or more occurrences.
- +: Matches 1 or more occurrences.
- ?: Matches 0 or 1 occurrence.
- {n}: Matches exactly n occurrences.
- {n,}: Matches n or more occurrences.
- {n,m}: Matches between n and m occurrences.
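The quantifiers above can be tried out with a short sketch like the following. The word list is made up for this example, and grep -E is used so that +, ?, and {} are treated as quantifiers rather than literal characters; -x forces a whole-line match and -c counts matching lines.

```shell
# Hypothetical word list, one word per line.
words=$(mktemp)
printf '%s\n' 'ct' 'cat' 'caat' 'caaat' 'color' 'colour' > "$words"

zero_or_more=$(grep -Ecx 'ca*t' "$words")     # ct, cat, caat, caaat
one_or_more=$(grep -Ecx 'ca+t' "$words")      # cat, caat, caaat
optional=$(grep -Ecx 'colou?r' "$words")      # color, colour
exactly_two=$(grep -Ex 'ca{2}t' "$words")     # caat
two_or_more=$(grep -Ecx 'ca{2,}t' "$words")   # caat, caaat
one_to_two=$(grep -Ecx 'ca{1,2}t' "$words")   # cat, caat

echo "ca*t:     $zero_or_more lines"
echo "ca+t:     $one_or_more lines"
echo "colou?r:  $optional lines"
echo "ca{2}t:   $exactly_two"
echo "ca{2,}t:  $two_or_more lines"
echo "ca{1,2}t: $one_to_two lines"

rm -f "$words"
```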
Using RegEx with Linux Commands:
1. Using grep:
The grep command is a powerful tool for searching text using regular expressions.
- Basic Usage:
grep "pattern" filename
- Example:
Assume a file named regextut.txt contains the following text:
apple is made by apple is a company by apple
iphone is made by apple
faaskndjdfnksdjappleaskldjfsl
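grep "apple" regextut.txt
This command prints every line of regextut.txt that contains the string "apple". All three lines match, including the third, where "apple" is embedded in a longer string. Add the -w flag to match "apple" only as a whole word.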
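As a small sketch of how common flags change what grep reports, the snippet below recreates the three sample lines in a temporary file (the mktemp path stands in for regextut.txt, and the variable names are illustrative). It compares counting matching lines (-c), matching whole words only (-w), and printing each individual match (-o).

```shell
# Recreate the sample text in a temporary file standing in for regextut.txt.
f=$(mktemp)
printf '%s\n' \
  'apple is made by apple is a company by apple' \
  'iphone is made by apple' \
  'faaskndjdfnksdjappleaskldjfsl' > "$f"

# -c counts lines that contain the pattern anywhere.
matching_lines=$(grep -c 'apple' "$f")

# -w only accepts "apple" as a whole word, so the third line is skipped.
whole_word_lines=$(grep -cw 'apple' "$f")

# -o prints each match on its own line, so wc -l counts occurrences.
occurrences=$(grep -o 'apple' "$f" | wc -l | tr -d ' ')

echo "lines containing apple: $matching_lines"
echo "lines with the word:    $whole_word_lines"
echo "total occurrences:      $occurrences"

rm -f "$f"
```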
Basic vs. Extended RegEx
This is where most beginners get stuck. Linux commands like grep and sed use Basic RegEx (BRE) by default, in which +, ?, and | are literal characters (as are unescaped braces and parentheses).
To use these operators, switch to Extended RegEx (ERE) by adding the -E flag. The older egrep command does the same thing, but it is deprecated in current versions of GNU grep.
- Wrong: grep "error+" log.txt (Looks for the literal string "error+")
- Right: grep -E "error+" log.txt (Matches "erro" followed by one or more "r": "error", "errorr", "errorrr")
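The difference is easiest to see by running the same pattern in both modes. The sketch below uses a made-up log file (a temporary file standing in for log.txt) and counts matching lines with -c; the last two commands show escaped interval braces in BRE and alternation in ERE.

```shell
# Hypothetical log contents.
log=$(mktemp)
printf '%s\n' 'error' 'errorrr' 'error+ in formula' 'warning' > "$log"

# BRE (default): + is literal, so only the line containing "error+" matches.
bre_count=$(grep -c 'error+' "$log")

# ERE (-E): + means "one or more of the preceding r".
ere_count=$(grep -Ec 'error+' "$log")

# BRE can still express intervals, but the braces must be escaped.
bre_interval=$(grep -c 'r\{3\}' "$log")

# ERE alternation: lines containing either word.
ere_alternation=$(grep -Ec 'error|warning' "$log")

echo "BRE  error+       : $bre_count"
echo "ERE  error+       : $ere_count"
echo "BRE  r\\{3\\}       : $bre_interval"
echo "ERE  error|warning: $ere_alternation"

rm -f "$log"
```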
2. Using sed:
sed is used for "Stream Editing." It can replace text on the fly using RegEx.
- Substitution:
sed 's/pattern/replacement/g' filename
- Example:
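sed 's/apple/banana/g' regextut.txt
This command replaces every occurrence of "apple" with "banana" and prints the result to standard output. The file itself is not modified. With GNU sed, add the -i flag to edit the file in place.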
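A short sketch of both points follows: the plain substitution leaves the input file untouched, and sed -E enables extended syntax with capture groups. The sample lines and the date-reordering task are assumptions added for illustration; # is used as the delimiter in the second command so the slashes in the replacement need no escaping.

```shell
# Hypothetical input file.
f=$(mktemp)
printf '%s\n' 'iphone is made by apple' '2024-03-15 backup done' > "$f"

# Plain substitution: the result goes to standard output.
replaced=$(sed 's/apple/banana/g' "$f")

# The original file still contains "apple" because -i was not used.
apple_lines_in_file=$(grep -c 'apple' "$f")

# -E enables ERE; \1, \2, \3 refer to the three parenthesised groups.
# Here YYYY-MM-DD is rewritten as DD/MM/YYYY.
reordered=$(sed -E 's#([0-9]{4})-([0-9]{2})-([0-9]{2})#\3/\2/\1#' "$f")

echo "$replaced"
echo "apple lines still in file: $apple_lines_in_file"
echo "$reordered"

rm -f "$f"
```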
Advanced Examples:
1. Matching IP Addresses:
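grep -P '(\d{1,3}\.){3}\d{1,3}' input.txt
This command uses Perl-compatible regular expressions (the -P flag, available in GNU grep) to match strings shaped like IPv4 addresses. It does not check that each octet falls in the range 0-255, so "999.999.999.999" also matches.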
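If range checking matters, one possible approach is to spell out the valid octet values explicitly. The sketch below is an illustration under assumptions, not the only way to do it: it uses grep -E instead of -P, expects one candidate address per line (hence -x), and the sample addresses and the octet variable are invented for the example.

```shell
# Hypothetical candidates, one per line.
f=$(mktemp)
printf '%s\n' '192.168.1.10' '10.0.0.256' '999.1.1.1' '8.8.8.8' '1.2.3' > "$f"

# Loose shape check: four groups of one to three digits.
loose_count=$(grep -Ecx '([0-9]{1,3}\.){3}[0-9]{1,3}' "$f")

# One octet in the range 0-255: 250-255, 200-249, 100-199, or 0-99.
octet='(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])'

# Strict check: three "octet." groups followed by a final octet.
strict=$(grep -Ex "(${octet}\.){3}${octet}" "$f")

echo "loose matches: $loose_count"
echo "strict matches:"
echo "$strict"

rm -f "$f"
```

The loose pattern accepts 10.0.0.256 and 999.1.1.1, while the strict one keeps only the two addresses whose octets are all in range.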
2. Extracting Email Addresses:
grep -oP '\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b' emails.txt
This command uses -o to print only the matched text, so each email address found in the file is printed on its own line.
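In practice the extracted list often contains repeats. A minimal sketch of one way to deduplicate it is shown below; the sample text is made up, and the pattern is the same one without the \b anchors so that it runs under grep -E as well.

```shell
# Hypothetical text with a repeated address and one non-address.
f=$(mktemp)
printf '%s\n' \
  'Contact: alice@example.com, bob@example.org' \
  'cc alice@example.com' \
  'not an address: user@localhost' > "$f"

# -o prints one match per line; sort -u removes the duplicates.
emails=$(grep -oE '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}' "$f" | sort -u)

echo "$emails"

rm -f "$f"
```

Here user@localhost is skipped because the pattern requires a dot followed by at least two letters in the domain part.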
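Summary Cheatsheet
The \d shorthand, and \w inside a bracket expression, are Perl-style syntax, so run the patterns that use them with grep -P.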
| Task | Pattern | Explanation |
|---|---|---|
| Match Email | [\w\.-]+@[\w\.-]+ | Simple email match. |
| Match Date | \d{4}-\d{2}-\d{2} | Format YYYY-MM-DD. |
| Match IP | \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3} | Simple IP match. |
| Blank Lines | ^$ | Start immediately followed by end. |
| Comments | ^# | Lines starting with a hash. |
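As a closing sketch, a few of the cheatsheet patterns can be combined on a small config-style file. The file contents are hypothetical, and the date pattern is written with [0-9] instead of \d so that it works with grep -E.

```shell
# Hypothetical config file with comments, a blank line, and a date.
conf=$(mktemp)
printf '%s\n' '# main settings' 'port=8080' '' 'started=2024-03-15' '# end' > "$conf"

# -v inverts the match: drop comment lines (^#) and blank lines (^$).
active=$(grep -Ev '^#|^$' "$conf")

# Count blank lines.
blank_count=$(grep -c '^$' "$conf")

# Extract dates in YYYY-MM-DD form.
dates=$(grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2}' "$conf")

echo "active settings:"
echo "$active"
echo "blank lines: $blank_count"
echo "dates found: $dates"

rm -f "$conf"
```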