Regular-Expression-Matching


Question

Implement regular Expression matching with support for ‘.’ and ‘*’ .

‘.’ Matches any single character .
‘*’ Matches zero or more of the preceding element .

The matching should cover the entire input string (not partial) .

The function prototype should be :
bool isMatch(const char s, const char p)

Some examples :
isMatch(“aa”, “a”) ? false
isMatch(“aa”, “aa”) ? true
isMatch(“aa”, “a*“) ? true
isMatch(“aa”, “.*“) ? true
isMatch(“ab”, “.*“) ? true
isMatch(“aab”, “c*a*b*“) ? true


Solution

Approach #1 Recursive function

General idea : try to match the string with pattern from left to right, the point is that you can recursively call yourself when you complete matching current si (point to current location of string) with current pi (point to current location of pattern), and return whether the left part of string and pattern matches . Next you must handle the ‘.’ situation and ‘*‘ situation carefully .

  • ‘.’ situation : easy to handle, matches any letter except end of a string .
  • ‘*‘ situation : two basic cases . (letter)* matches empty and mutiple (letter). Note that you should consider every situation for the second case .
  • terminal condition : si reaches end or pi reaches end .
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
/*
* c++ implementation
* time : 28 ms
*/
bool match(char a, char b) {
return b == '.' ? a!='\0' : a == b;
}
bool isMatch(string s, string p, int si, int pi) {
// terminal condition
if (si == s.length() && pi == p.length())return true;
if (pi == p.length())return si == s.length();
// cases
if (p[pi + 1] != '*') {
return match(s[si], p[pi]) && isMatch(s, p, si+1, pi+1);
}
else {
if (isMatch(s, p, si, pi + 2)) return true;
while (match(s[si++], p[pi]))
if (isMatch(s, p, si, pi + 2)) return true;
return false;
}
}

Approach #2 Dynamic Planning

Recursive function takes much time and space, we can use DP method to optimze the solution. Let $p(i, j)$ describle whether string $S[0, i)$ matches with pattern $P[0,j)$ . Then the question becomes getting $p(S.length(), P.length())$ . According the problem, we can analyze the updating formula of $p(i, j)$ :

  • if $S[i] == P[j] :\ p(i, j) = p(i-1, j-1)$ .
  • if $P[j] == ‘.’ :\ p(i, j) = p(i-1, j-1)$ .
  • if $P[j] == ‘*’ :$
    here are two sub conditions :
  1. if $P[j-1] != S[i] :\ p(i, j) = p(i, j-2)$ . a* counts as empty
  2. if $P[j-1] == S[i]\ or\ P[j-1]==’.’ :\ p(i, j) = p(i-1, j)\ or\ p(i, j-1)\ or\ p(i, j-2)$

As we have updating formula, the left problem is initializing $p(i, j)$ . Obviouly, $p(0,0)$ must be $true$ , $p(i>0, 0)$ must be $false$, and we have to initialize $p(0, j>0)$ accoding to pattern $P$ .

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
/*
* time : 12 ms (O(N*M))
*/
bool isMatch(string a, string b) {
bool r = false;
bool **P = (bool **)malloc((a.length() + 1) * sizeof(bool *));
for (int i = 0; i < a.length() + 1; i++) {
P[i] = (bool *)malloc((b.length() + 1) * sizeof(bool));
memset(P[i], 0, (b.length() + 1) * sizeof(bool));
}
P[0][0] = true;
for (int i = 1; i < b.length(); i++) {
if (b[i] == '*' && P[0][i - 1])
P[0][i + 1] = true;
}
for (int i = 0; i < a.length(); i++) {
for (int j = 0; j < b.length(); j++) {
if (a[i] == b[j] || b[j] == '.')
P[i + 1][j + 1] = P[i][j];
if (b[j] == '*') {
if (a[i] == b[j - 1] || b[j - 1] == '.')
P[i + 1][j + 1] = (P[i + 1][j] || P[i + 1][j - 1] || P[i][j + 1]);
else
P[i + 1][j + 1] = P[i + 1][j - 1];
}
}
}
r = P[a.length()][b.length()];
for (int i = 0; i < a.length(); i++) {
free(P[i]);
}
free(P);
return r;
}