[LeetCode] 940. Distinct Subsequences II

Given a string S, count the number of distinct, non-empty subsequences of S.

Since the result may be large, return the answer modulo 10^9 + 7.

Example 1:
```
Input: "abc"
Output: 7
Explanation: The 7 distinct subsequences are "a", "b", "c", "ab", "ac", "bc", and "abc".
```
Example 2:
```
Input: "aba"
Output: 6
Explanation: The 6 distinct subsequences are "a", "b", "ab", "ba", "aa" and "aba".
```
Example 3:

Input: "aaa"
Output: 3
Explanation: The 3 distinct subsequences are "a", "aa" and "aaa".

Note:
1. S contains only lowercase letters.
2. 1 <= S.length <= 2000
Key Observation: If we ignore the distinctness requirement for now, each new character doubles the total count of subsequences (counting the empty one): every subsequence found so far can either be kept as is or have the new character appended to its end.

Based on this observation, ssCnt of s[0, i] = 2 * ssCnt of s[0, i - 1] when distinctness is ignored. If the newly added character has never appeared before, we are done. Otherwise, we need to remove the duplicated subsequences. Let j be the index of the character's previous occurrence. Every subsequence that ends at s[j] was built by appending s[j] to some subsequence of s[0, j - 1]; appending s[i] to that same subsequence produces the identical string again. So all subsequences that end at s[j] are duplicated by the character at index i, and we need to subtract this count.
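
As a sanity check on the observation above, here is a brute-force sketch that enumerates every non-empty set of indices of a short string and collects the resulting subsequences in a set. The class and method names are made up for illustration, and the enumeration is exponential, so it is only practical for strings of up to roughly 20 characters.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration; not part of the original solution.
class BruteForceSubseq {
    // Number of non-empty index picks: doubles with every character, minus the empty pick.
    static long countAllPicks(String s) {
        return (1L << s.length()) - 1;
    }

    // Number of distinct non-empty subsequences, by explicit enumeration.
    static int countDistinct(String s) {
        int n = s.length();
        Set<String> seen = new HashSet<>();
        for (int mask = 1; mask < (1 << n); mask++) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) {
                    sb.append(s.charAt(i));
                }
            }
            seen.add(sb.toString());
        }
        return seen.size();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"abc", "aba", "aaa"}) {
            System.out.println(s + ": picks=" + countAllPicks(s)
                    + ", distinct=" + countDistinct(s));
        }
    }
}
```

For "aba" this reports 7 picks but only 6 distinct results, because the single-character subsequence "a" is produced twice, once from each occurrence of the letter.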

Dynamic Programming Algorithm O(N) runtime and space

idx[c]: the 0-based index in s of the most recent previous occurrence of letter c; -1 means the letter has not appeared yet.

dp[i]: the total number of distinct subsequences of the prefix s[0, i - 1] (the first i characters), including the empty subsequence;

dp[0] = 1, representing the empty subsequence;

dp[i] = dp[i - 1] * 2 + (idx[s[i - 1]] >= 0 ? -dp[idx[s[i - 1]]] : 0);

Answer: dp[N] - 1, -1 to exclude the empty subsequence.

Key note: dp has length n + 1, so its index is the length of the prefix currently under consideration, which is one ahead of the 0-indexed string scan. Let j be the 0-based index of the previous occurrence of the current letter. Then dp[j + 1] is the total number of distinct subsequences of s[0, j], whereas dp[j] counts the distinct subsequences of s[0, j - 1], each of which yields exactly one subsequence that ENDS at s[j]. When removing duplicates, we only want to remove the ones contributed by s[j], i.e., the ones that END at s[j], so we subtract dp[j] and not dp[j + 1].
```
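import java.util.Arrays;

class Solution {
    public int distinctSubseqII(String S) {
        int mod = (int)1e9 + 7, n = S.length();
        // idx[c]: 0-based index of the previous occurrence of letter c, -1 if none
        int[] idx = new int[26];
        // dp[i]: distinct subsequences of the first i characters, including the empty one
        long[] dp = new long[n + 1];
        dp[0] = 1;
        Arrays.fill(idx, -1);
        for(int i = 1; i <= n; i++) {
            int c = S.charAt(i - 1) - 'a';
            dp[i] = dp[i - 1] * 2 % mod;
            if(idx[c] >= 0) {
                // remove the subsequences already produced by the previous occurrence
                dp[i] = (dp[i] + mod - dp[idx[c]]) % mod;
            }
            idx[c] = i - 1;
        }
        // drop the empty subsequence
        return (int)((dp[n] + mod - 1) % mod);
    }
}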
```
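
To watch the table fill in, the recurrence can be traced on a short input. The sketch below is a hypothetical helper (the names `DpTrace` and `table` are not from the original post) that mirrors the recurrence but leaves out the modulo, so it is only meant for short strings where the counts fit in a long.

```java
import java.util.Arrays;

// Hypothetical tracing helper; no modulo is applied, so use short inputs only.
class DpTrace {
    static long[] table(String s) {
        int n = s.length();
        int[] last = new int[26];   // last[c]: 0-based index of the previous occurrence of letter c, or -1
        Arrays.fill(last, -1);
        long[] dp = new long[n + 1];
        dp[0] = 1;                  // the empty subsequence
        for (int i = 1; i <= n; i++) {
            int c = s.charAt(i - 1) - 'a';
            dp[i] = 2 * dp[i - 1];
            if (last[c] >= 0) {
                dp[i] -= dp[last[c]];   // duplicates ending at the previous occurrence
            }
            last[c] = i - 1;
        }
        return dp;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(table("aba"))); // prints [1, 2, 4, 7]
    }
}
```

For "aba" the last step computes 2 * 4 - dp[0] = 7, since the previous 'a' is at index 0, and the final answer is 7 - 1 = 6, matching Example 2.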
O(1) space optimization
```
class Solution {
    public int distinctSubseqII(String S) {
        int mod = (int)1e9 + 7, n = S.length();
        //prevOccurCnt[c]: the number of distinct subsequences that end at the most recent occurrence of letter c (0 if c has not appeared yet)
        long[] prevOccurCnt = new long[26];
        long prevSum = 1, currSum = 0;
        for(int i = 0; i < n; i++) {
            currSum = prevSum * 2 % mod;            
            if(prevOccurCnt[S.charAt(i) - 'a'] > 0) {
                currSum = (currSum + mod - prevOccurCnt[S.charAt(i) - 'a']) % mod;
            }
            prevOccurCnt[S.charAt(i) - 'a'] = prevSum;
            prevSum = currSum;
        }
        return (int)((currSum + mod - 1) % mod);
    }
}
```
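
The same per-letter bookkeeping can also be written without tracking the empty subsequence at all. This is an alternative formulation, not the author's code, and the names `EndsWithCount`, `end` and `total` are assumptions chosen for this sketch: `end[c]` holds the number of distinct subsequences seen so far that end with letter c, and `total` is their sum.

```java
// Hypothetical alternative sketch; not taken from the original post.
class EndsWithCount {
    static int count(String s) {
        final int MOD = 1_000_000_007;
        long[] end = new long[26]; // end[c]: distinct subsequences so far that end with letter c
        long total = 0;            // sum of end[], i.e. distinct non-empty subsequences so far
        for (int i = 0; i < s.length(); i++) {
            int c = s.charAt(i) - 'a';
            // Append the letter to every existing subsequence, plus the letter on its own.
            long updated = (total + 1) % MOD;
            // Replace the old count for this letter with the new one.
            total = (total - end[c] + updated + MOD) % MOD;
            end[c] = updated;
        }
        return (int) total;
    }

    public static void main(String[] args) {
        System.out.println(count("abc")); // prints 7
        System.out.println(count("aba")); // prints 6
        System.out.println(count("aaa")); // prints 3
    }
}
```

Here `updated` equals the running total including the empty subsequence, which is the value the solution above stores into prevOccurCnt, so the two versions subtract the same quantity.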
Related Problems

[LeetCode 115] Distinct Subsequences
Original article: https://www.cnblogs.com/lz87/p/12515113.html
