Given a string s and a string t, check if s is a subsequence of t.
You may assume that there are only lowercase English letters in both s and t. t is potentially a very long (length ~= 500,000) string, and s is a short string (<= 100).
A subsequence of a string is a new string formed from the original string by deleting some (possibly none) of the characters without disturbing the relative positions of the remaining characters (i.e., "ace" is a subsequence of "abcde" while "aec" is not).
Example 1:
s = "abc", t = "ahbgdc"
Return true.
Example 2:
s = "axc", t = "ahbgdc"
Return false.
Follow up:
If there are lots of incoming strings S, say S1, S2, ..., Sk where k >= 1B, and you want to check them one by one to see whether each is a subsequence of T, how would you change your code?
Solution:
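1. First, the naive method: use two pointers.
Iterate over the characters cs of s with pointer i and over the characters ct of t with pointer j; compare cs with ct and record each matching position as prevPos = j.
If cs == ct, increase i by 1 and resume scanning t from prevPos + 1; otherwise increase j by 1. Stop when every cs has been matched or t is exhausted.
With m = len(s) and n = len(t), the worst-case time complexity is O(n): j never moves backward, so each character of t is examined at most once.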
    def is_subsequence(s, t):
        i = 0
        prevPos = -1  # index in t of the most recent match
        while i < len(s):
            flag = False
            j = prevPos + 1  # resume just after the previous match
            while j < len(t):
                if s[i] == t[j]:
                    flag = True
                    prevPos = j
                    break
                j += 1
            if not flag:
                return False  # s[i] does not occur in the rest of t
            else:
                i += 1
        return True
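As a more compact variant of the same single pass, Python's iterator protocol can play the role of pointer j. This is a sketch of an alternative formulation, not the code above, and the name is_subsequence_iter is made up for illustration. Each `c in it` test consumes the iterator up to and including the first match, so t is traversed at most once.

```python
def is_subsequence_iter(s, t):
    # `it` plays the role of pointer j: it only ever moves forward through t.
    it = iter(t)
    # `c in it` advances the iterator until it finds c (or runs out),
    # so later characters of s are searched only in the remainder of t.
    return all(c in it for c in s)


print(is_subsequence_iter("abc", "ahbgdc"))  # True
print(is_subsequence_iter("axc", "ahbgdc"))  # False
```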
2. Second method: use binary search (but it gets TIME LIMIT EXCEEDED).
(1) Build a new list indtLst containing a tuple of each character of t and its index.
(2) Sort indtLst.
(3) For every character cs of s, binary search for cs in indtLst, keeping track of the current search position pos and the previously matched position prevPos.
    from bisect import bisect_left

    def is_subsequence(s, t):
        # Sorted (character, index) pairs: equal characters are grouped, indices ascending.
        indtLst = sorted([(ct, i) for i, ct in enumerate(t)])
        i = 0
        prevPos = -1
        while i < len(s):
            # Position of the first entry for character s[i], if there is one.
            pos = bisect_left(indtLst, (s[i],))
            flag = False
            if pos == len(indtLst) or indtLst[pos][0] != s[i]:
                return False
            elif indtLst[pos][1] > prevPos:
                flag = True
                prevPos = indtLst[pos][1]
                i += 1
            else:
                # The first occurrence is at or before prevPos (for example when a
                # character repeats in s), so scan linearly for a later occurrence.
                k = pos + 1
                while k < len(indtLst) and indtLst[k][0] == s[i]:
                    if indtLst[k][1] > prevPos:
                        prevPos = indtLst[k][1]
                        flag = True
                        i += 1
                        break
                    k += 1
                if not flag:
                    return False
        return True
Note: the reasons for the TLE are (1) sorting the list costs O(n log n) before any search begins, on top of the binary searches themselves, and (2) when the first occurrence of cs is not after prevPos, the scan for a later occurrence takes linear time.
So we may instead set up a dictionary that stores every character ct of t together with its indices in t; the search for cs in t then takes O(1) time for a non-duplicated cs and O(n) time for a duplicated cs.
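The dictionary idea can be sketched as follows, with one assumed refinement that the note does not spell out: each character's indices are kept in an ascending list, so the next usable occurrence can be found by binary search in O(log n) even for a duplicated character. The names build_index and is_subsequence_indexed are hypothetical. Because the index depends only on t, it could be built once and reused for every incoming S1, S2, ..., Sk in the follow-up.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(t):
    # Map each character of t to the ascending list of positions where it occurs.
    index = defaultdict(list)
    for i, ct in enumerate(t):
        index[ct].append(i)
    return index


def is_subsequence_indexed(s, index):
    prev_pos = -1  # position in t of the most recent match
    for cs in s:
        positions = index.get(cs)
        if not positions:
            return False  # cs never occurs in t
        # First occurrence of cs strictly after prev_pos.
        k = bisect_right(positions, prev_pos)
        if k == len(positions):
            return False  # no occurrence of cs is left after prev_pos
        prev_pos = positions[k]
    return True


index = build_index("ahbgdc")  # built once, reused for every query
print(is_subsequence_indexed("abc", index))  # True
print(is_subsequence_indexed("axc", index))  # False
```

Under these assumptions, building the index takes O(n) time, and each query of length m then costs O(m log n).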