Regular Expression Matching

Updated on 19 June, 2025
Regular Expression Matching header image

Problem Statement

The objective is to determine if a given input string s can be matched against a pattern p. The pattern incorporates two special characters: '.', which can represent any single character, and '*', which can represent zero or more of the character that directly precedes it. The challenge lies in ensuring that the pattern matches the entire string s, not just part of it. This task involves recognizing the sequence and potentially repetitive patterns within the string as dictated by p to ensure a full match.

Examples

Example 1

Input:

s = "aa", p = "a"

Output:

false

Explanation:

"a" does not match the entire string "aa".

Example 2

Input:

s = "aa", p = "a*"

Output:

true

Explanation:

'*' means zero or more of the preceding element, 'a'. Therefore, by repeating 'a' once, it becomes "aa".

Example 3

Input:

s = "ab", p = ".*"

Output:

true

Explanation:

".*" means "zero or more (*) of any character (.)".

Constraints

  • 1 <= s.length <= 20
  • 1 <= p.length <= 20
  • s contains only lowercase English letters.
  • p contains only lowercase English letters, '.', and '*'.
  • It is guaranteed for each appearance of the character '*', there will be a previous valid character to match.

Approach and Intuition

The problem can be approached by understanding the roles of the special characters and the string manipulation they imply:

  1. Understanding the Special Characters:

    • '.' matches any single character – This broadens the scope of matching any letter at a specific position where '.' appears in the pattern.
    • '*' matches zero or more of the immediately preceding element – This introduces repetition and requires careful attention as it can match an empty sequence or multiple sequences of the previous character.
  2. Steps to Approach the Problem:

    • Start Matching from the Beginning: Traverse both the string s and the pattern p from the beginning, checking for direct matches or special character conditions.
    • Handle '*' Character: When encountering a '*', decide whether to skip the preceding element (consider it zero times) or to include multiple instances of it. Adjust the traversal of s and p accordingly.
    • Handle '.' Character: When encountering a ., simply move to the next character in both the string s and the pattern p.
    • End Condition: Ensure that by the end of the traversal, all characters of s are matched and p is completely considered. If p ends in a '*', it can also end the match as it is capable of matching an empty sequence.

The shifting nuances of '*' necessitating different amounts of backtrack and lookahead in the string make this an intriguing problem, often solved using dynamic programming or recursive algorithms to consider all potential matches meticulously. The constraints provided ensure that the solutions are computationally feasible even with the more complex methods such as recursion with memoization.

Solutions

  • Java
  • Python
java
class Solution {
    public boolean stringMatches(String str, String pat) {
        boolean[][] result = new boolean[str.length() + 1][pat.length() + 1];
        result[str.length()][pat.length()] = true;

        for (int i = str.length(); i >= 0; i--) {
            for (int j = pat.length() - 1; j >= 0; j--) {
                boolean initMatch =
                    (i < str.length() &&
                        (pat.charAt(j) == str.charAt(i) ||
                            pat.charAt(j) == '.'));
                if (j + 1 < pat.length() && pat.charAt(j + 1) == '*') {
                    result[i][j] = result[i][j + 2] || (initMatch && result[i + 1][j]);
                } else {
                    result[i][j] = initMatch && result[i + 1][j + 1];
                }
            }
        }
        return result[0][0];
    }
}

The provided Java solution tackles the problem of determining whether a given string str matches a specified pattern pat, which may include regular expression characters like '.' and '*'. This solution utilizes dynamic programming to efficiently solve the problem:

  • A 2D boolean array result is initiated, with dimensions (str.length() + 1) x (pat.length() + 1). This storage is used to store results of subproblems, optimizing the solution for larger inputs by avoiding redundant calculations.

  • The algorithm initializes result[str.length()][pat.length()] to true, representing the base case where the end of both the string and the pattern are reached simultaneously, implying a successful match.

  • The solution iterates over the string and pattern from the end to the beginning. For each pair (i, j), the algorithm first checks if the current positions i on str and j on pat can match. This is stored in initMatch, considering:

    • The characters actually match (str.charAt(i) == pat.charAt(j)).
    • The pattern character is a wildcard dot (pat.charAt(j) == '.'), which matches any character.
  • For characters followed by '*', it evaluates two conditions:

    • Skip the '*' and the character before it (result[i][j + 2]).
    • Use the '*' to match multiple characters, requiring that the current characters match and the result at (i + 1, j) is true.
  • For characters not followed by '*', it sets result[i][j] to true if initMatch and the results from the next characters of both string and pattern are also true.

  • Finally, result[0][0] provides the overall match result, returning true if the entire string matches the pattern from beginning to end.

This dynamic programming approach drastically reduces the time complexity that a naive recursive solution would involve, especially with patterns containing multiple '*' characters.

python
class Matcher(object):
    def pattern_match(self, s: str, p: str) -> bool:
        match_table = [[False] * (len(p) + 1) for _ in range(len(s) + 1)]
        match_table[-1][-1] = True
        for x in range(len(s), -1, -1):
            for y in range(len(p) - 1, -1, -1):
                current_match = x < len(s) and p[y] in {s[x], "."}
                if y + 1 < len(p) and p[y + 1] == "*":
                    match_table[x][y] = match_table[x][y + 2] or current_match and match_table[x + 1][y]
                else:
                    match_table[x][y] = current_match and match_table[x + 1][y + 1]
        return match_table[0][0]

The solution provided solves the problem of regular expression matching, which checks if a string s matches a pattern p where the pattern can include the characters '.' and ''. The . matches any single character, and `` matches zero or more of the preceding element.

  • This is implemented in Python 3.
  • The code defines a class Matcher containing a method pattern_match that accepts the string s and the pattern p as parameters.
  • It uses dynamic programming to build a 2D list match_table where match_table[x][y] will be True if the substring of s starting at index x matches the subpattern of p starting at index y.
  • Initializes match_table[-1][-1] to True, which represents that an empty string matches an empty pattern.
  • Iterates through each character of the string s and pattern p in reverse to fill the match_table based on the current character match (current_match) and the structure of the pattern (handling cases for characters followed by '*' specifically).
  • current_match is derived from comparing each character of p (pattern) with s (string), considering . as a wildcard character.
  • For patterns with *, the logic checks two conditions:
    • Skipping over the '*' and the character before it (match_table[x][y + 2]).
    • Matching the current pattern character and a repeating sequence (current_match and match_table[x + 1][y]).
  • For other characters, it simply checks the match for the current character and moves to the next character in both s and p.
  • The method returns the value at match_table[0][0], representing whether the entire string s matches the pattern p.

This method efficiently solves the pattern matching problem by systematically testing all possibilities using dynamic programming, ensuring accurate checks even for complex patterns including repetitive sequences.

Comments

No comments yet.