LeetCode 30. Substring with Concatenation of All Words (Brute Force / Two Pointers)


Difficulty: Hard

Author: T-SHLoRk, 2019-08-10 13:48:20


Problem Description

You are given a string, s, and a list of words, words, that are all of the same length. Find all starting indices of substrings in s that are a concatenation of each word in words exactly once, without any intervening characters.

Example 1:

Input:
  s = "barfoothefoobarman",
  words = ["foo","bar"]
Output: [0,9]
Explanation: Substrings starting at index 0 and 9 are "barfoo" and "foobar" respectively.
The output order does not matter, returning [9,0] is fine too.

Example 2:

Input:
  s = "wordgoodgoodgoodbestword",
  words = ["word","good","best","word"]
Output: []

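In short: given a string s and a list of words that all have the same length, find every substring of s that is a concatenation of all the words (each used exactly once, in any order, with nothing in between), and return the starting indices of all such substrings.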

Algorithm 1

(Brute-force enumeration)

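Brute force: enumerate every possible starting position in s and check whether it satisfies the requirement. Even a brute force benefits from a few tricks. First, use a hash table to store how many times each dictionary word occurs, and call the common word length len. For each starting position, use a second hash table to count the words read from that position onward, taking the consecutive length-len chunks one at a time. If a chunk is not in the dictionary, the position fails. Otherwise, increment its count: if the count now exceeds the required count, the position fails; if it equals the required count, one more distinct word has been fully matched, so increment satisfy. Once satisfy equals the number of distinct words, every word has been matched exactly the right number of times, so record the position and move on to the next one.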

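Time complexity: there are $O(n)$ starting positions, each check reads at most $m$ words, and extracting and hashing each word costs $O(len)$, so the total is $O(n \cdot m \cdot len)$.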

C++ code

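    #include <string>
    #include <unordered_map>
    #include <vector>
    using namespace std;

    class Solution {
    public:
        vector<int> findSubstring(string s, vector<string>& words) {
            unordered_map<string, int> hash;  // word -> required count
            vector<int> res;
            int n = s.length(), m = words.size();
            if (n == 0 || m == 0) return res;
            int len = words[0].length();
            if (n < m * len) return res;
            int end = n - m * len;            // last possible starting index
            for (const auto& word : words)
                hash[word]++;
            for (int i = 0; i <= end; i++)
            {
                unordered_map<string, int> cur_hash;  // word counts read from i
                int satisfy = 0;                      // distinct words whose count is met
                for (int j = i; j <= n - len; j += len)
                {
                    string cur = s.substr(j, len);
                    auto it = hash.find(cur);
                    if (it == hash.end())
                        break;                        // not a dictionary word: start i fails
                    cur_hash[cur]++;
                    if (cur_hash[cur] > it->second)
                        break;                        // word used too many times: start i fails
                    if (cur_hash[cur] == it->second)
                        satisfy++;
                    if (satisfy == (int)hash.size())  // every word matched exactly
                    {
                        res.push_back(i);
                        break;
                    }
                }
            }
            return res;
        }
    };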
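A minimal sketch of an even plainer brute force, useful as a reference oracle when testing faster versions: for every start index it counts the m consecutive words and compares the whole count map with the dictionary counts using `unordered_map`'s `operator==`. Unlike the code above it never exits early, so it always does the full $O(m \cdot len)$ work per start. `findSubstringNaive` is a hypothetical name; this is an illustrative alternative, not the author's code.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical reference oracle: simple enough to trust, not tuned for speed.
std::vector<int> findSubstringNaive(const std::string& s,
                                    const std::vector<std::string>& words) {
    std::vector<int> res;
    if (s.empty() || words.empty()) return res;
    const int n = static_cast<int>(s.size());
    const int m = static_cast<int>(words.size());
    const int len = static_cast<int>(words[0].size());
    const int total = m * len;  // length of every candidate substring
    if (n < total) return res;

    // Target multiset: how many times each word must appear.
    std::unordered_map<std::string, int> need;
    for (const auto& w : words) ++need[w];

    for (int i = 0; i + total <= n; ++i) {
        // Cut the m consecutive words starting at i and count them.
        std::unordered_map<std::string, int> seen;
        for (int k = 0; k < m; ++k) ++seen[s.substr(i + k * len, len)];
        // operator== compares the full set of (word, count) pairs.
        if (seen == need) res.push_back(i);
    }
    return res;
}
```

Its output is in increasing index order, so results from other versions can be sorted and compared against it directly.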

Algorithm 2

(Two pointers) $O(n \cdot len)$

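Solution 2: two pointers. Because every word has the same length len, the possible starting positions can be split into len groups by their offset (start position mod len), and each group is matched separately: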

len = 3, s = abcdefghijkl
Partition 1: abc def ghi jkl
Partition 2: bcd efg hij
Partition 3: cde fgh ijk
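The grouping above can be reproduced with a small helper. This is an illustrative sketch; `partitions` is a hypothetical name, not part of the original solution. Partition k holds the length-len chunks starting at k, k + len, k + 2·len, and so on; a trailing fragment shorter than len is dropped because it cannot hold a whole word.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split s into the len interleaved partitions.
// Assumes len > 0.
std::vector<std::vector<std::string>> partitions(const std::string& s, int len) {
    std::vector<std::vector<std::string>> groups(len);
    for (int k = 0; k < len; ++k)
        // Take every full chunk that starts at offset k, k + len, ...
        for (int j = k; j + len <= static_cast<int>(s.size()); j += len)
            groups[k].push_back(s.substr(j, len));
    return groups;
}
```

For example, `partitions("abcdefghijkl", 3)` gives the three rows listed above. Every start index i of s falls in partition i % len, and a candidate answer starting at i is a run of m consecutive chunks in that partition, which is why len passes cover all starting positions.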

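Within one partition the problem becomes essentially LeetCode 76 (Minimum Window Substring), which asks for the shortest substring of S containing every character of T. The difference is that here each word plays the role of a character, and the window may not contain anything else, so as soon as we read a word that is not in the dictionary, the current window cannot be extended and we start over after it.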
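To make the analogy concrete, here is an illustrative sketch (hypothetical helper `encodePartition`, not from the original) that turns one partition into a sequence of symbols: each distinct dictionary word gets a small integer id, and any other chunk becomes -1. A valid answer is then a run of m consecutive symbols containing no -1 whose id counts match the dictionary counts, much like a window over characters in LeetCode 76.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: encode partition k of s as word ids, -1 for non-words.
std::vector<int> encodePartition(const std::string& s, int k, int len,
                                 const std::vector<std::string>& words) {
    // word -> id; repeated words keep the id of their first occurrence
    std::unordered_map<std::string, int> ids;
    for (const auto& w : words) ids.emplace(w, static_cast<int>(ids.size()));

    std::vector<int> symbols;
    for (int j = k; j + len <= static_cast<int>(s.size()); j += len) {
        auto it = ids.find(s.substr(j, len));
        // -1 acts as a wall: no valid window may contain it
        symbols.push_back(it == ids.end() ? -1 : it->second);
    }
    return symbols;
}
```

For Example 1, `encodePartition("barfoothefoobarman", 0, 3, {"foo", "bar"})` yields `{1, 0, -1, 0, 1, -1}` (foo → 0, bar → 1); the runs `{1, 0}` at chunk 0 and `{0, 1}` at chunk 3 correspond to the answers 0 and 9.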

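First enumerate every partition offset. Let i and j be the left and right pointers of the current window, let cur_hash record how many times each word occurs in the window, and let satisfy count the distinct words whose required count has been reached. Read one word at a time:

If the current word is not in the dictionary, matching has failed: move j past it, set i = j, clear cur_hash and satisfy, and start matching again.
If it is in the dictionary, increment its count in cur_hash:
If its count now equals the required count, one more word is satisfied, so increment satisfy.
If its count now exceeds the required count, move i forward until this word's count is back to the required count. While moving i, update cur_hash for every word removed, and decrement satisfy whenever a removed word drops from satisfied to unsatisfied.
Finally, check whether the window is now a valid concatenation (satisfy equals the number of distinct words). If so, record i as an answer, then move i forward by one word and update cur_hash and satisfy accordingly.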

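Time complexity: there are $O(len)$ partitions. Within each partition the two pointers only move forward, each by at most about n/len words, and each move extracts and hashes one word in $O(len)$ time, giving $O(n + m \cdot len)$ per partition. The total is $O((n + m \cdot len) \cdot len)$, which simplifies to $O(n \cdot len)$ since $n \ge m \cdot len$ whenever an answer can exist.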

C++ code

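    #include <string>
    #include <unordered_map>
    #include <vector>
    using namespace std;

    class Solution {
    public:
        vector<int> findSubstring(string s, vector<string>& words) {
            unordered_map<string, int> hash;  // word -> required count
            vector<int> res;
            int n = s.length(), m = words.size();
            if (n == 0 || m == 0) return res;
            int len = words[0].length();
            if (n < m * len) return res;
            for (const auto& word : words)
                hash[word]++;
            int size = hash.size();           // number of distinct words
            for (int k = 0; k < len; k++)     // one pass per partition (offset k)
            {
                unordered_map<string, int> cur_hash;  // word counts in window [i, j)
                int satisfy = 0;                      // distinct words whose count is met
                for (int i = k, j = k; j <= n - len;)
                {
                    string cur = s.substr(j, len);
                    if (hash.find(cur) == hash.end())
                    {
                        // Not a dictionary word: no valid window can contain it,
                        // so restart the window just after it.
                        j = j + len;
                        i = j;
                        cur_hash.clear();
                        satisfy = 0;
                    }
                    else
                    {
                        cur_hash[cur]++;
                        if (cur_hash[cur] == hash[cur])
                            satisfy++;
                        else if (cur_hash[cur] > hash[cur])
                        {
                            // Too many copies of cur: shrink from the left until
                            // its count is back to the required value.
                            while (i < j && cur_hash[cur] > hash[cur])
                            {
                                string temp = s.substr(i, len);
                                i += len;
                                cur_hash[temp]--;
                                if (cur_hash[temp] == hash[temp] - 1)
                                    satisfy--;  // temp just dropped below its requirement
                            }
                        }
                        if (satisfy == size)
                        {
                            // Window [i, j + len) holds exactly the m words: record it,
                            // then drop the leftmost word so the window can keep sliding.
                            string temp = s.substr(i, len);
                            cur_hash[temp]--;
                            satisfy--;
                            res.push_back(i);
                            i = i + len;
                        }
                        j = j + len;
                    }
                }
            }
            return res;
        }
    };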
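For comparison, the same len-partition idea can also be written with a window of fixed size m. The sketch below is an alternative bookkeeping scheme, not the author's method, and `findSubstringFixedWindow` is a hypothetical name: instead of shrinking on overflow or resetting on an unknown word, it always drops the word that falls out of the last m, and counts "useful" words (those whose window count does not exceed the dictionary count). The window is an answer exactly when all m words are useful.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical alternative: fixed-size sliding window of m words per partition.
std::vector<int> findSubstringFixedWindow(const std::string& s,
                                          const std::vector<std::string>& words) {
    std::vector<int> res;
    if (s.empty() || words.empty()) return res;
    const int n = static_cast<int>(s.size());
    const int m = static_cast<int>(words.size());
    const int len = static_cast<int>(words[0].size());

    std::unordered_map<std::string, int> need;
    for (const auto& w : words) ++need[w];
    // Required count without inserting unknown words into `need`.
    auto target = [&need](const std::string& w) {
        auto it = need.find(w);
        return it == need.end() ? 0 : it->second;
    };

    for (int k = 0; k < len; ++k) {
        std::unordered_map<std::string, int> window;
        int matched = 0;  // sum over words of min(window count, required count)
        for (int j = k; j + len <= n; j += len) {  // j = start of the incoming word
            if (j >= k + m * len) {
                // Window already holds m words: drop the oldest one.
                std::string out = s.substr(j - m * len, len);
                if (window[out]-- <= target(out)) --matched;  // it was useful
            }
            std::string in = s.substr(j, len);
            if (++window[in] <= target(in)) ++matched;  // it is useful
            if (matched == m) res.push_back(j - (m - 1) * len);
        }
    }
    return res;
}
```

An unknown word is never useful, so it simply blocks matches until it slides out of the window; no explicit reset is needed. The cost is the same $O(n \cdot len)$, and as in Algorithm 2 the results come out grouped by offset rather than sorted.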

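5 Comments

大锤 2025-02-21 11:43 · Fujian

Great write-up, thanks! I was totally lost reading y总's solution, then I scrolled down and now I get it!

大锤 2025-02-21 12:02 · Fujian

I've understood this straightforward version; next I'll build on it and learn how y总 wrote it so concisely. OK, thanks.

大锤 2025-02-21 12:04 · Fujian, replying to 大锤's comment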

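    vector<int> findSubstring(string s, vector<string>& words) {
        unordered_map<string,int> hash;     // counts of every word in the word list
        vector<int> res;    // result
        int n = s.length(),m = words.size();  // n is the length of the string to search, m is the number of words
        if(n == 0 || m == 0) return res; // if either is 0 there is no answer, so just return
        int len = words[0].length();    // len is the word length (used to enumerate the starting offsets)
        if(n < m * len) return res;  // if all the words together are longer than s, there is no answer
        for(auto word:words)  // build the counts for the word list
            hash[word] ++;
        int size = hash.size();  // number of distinct words; compared at the end to see whether the window is valid
        for(int k = 0;k < len ; k ++)  // split the starting positions into len groups: offsets 0, 1, 2, ..., len-1
        {
            unordered_map<string,int> cur_hash;  // counts of the words in the current window
            int satisfy = 0;  // how many distinct words in cur_hash currently meet their required count

            // This loop is the sliding window. Treat each whole word as one unit, like a single block.
            // The window starts with left boundary k and right boundary k; keep moving the right boundary and maintain the left one.
            // Start at k; i is the window's left end, j its right end.
            // Each position holds one whole word, so there must be a full word starting at j (see the substr below).
            for(int i = k,j = k;j <= n - len;)
            {

                string cur = s.substr(j,len);  // j must have len characters after it; the indices are
                // j+0, j+1, j+2, j+3, ..., j+len-1, so the loop only needs j+len-1 <= n-1

                // If the word about to enter the window is not in the word list at all, the window is invalid; move on
                if(hash.find(cur) == hash.end())
                {
                    j = j + len;  // move the right boundary one unit (one unit is a whole word)
                    i = j;  // move the left boundary here too: everything before is invalid because a word that should not appear showed up
                    cur_hash.clear();  // clear the current counts
                    satisfy = 0;  // nothing in the window meets its count now
                }else  // the word is in the list
                {
                    cur_hash[cur] ++;  // this word occurs one more time in the window
                    if(cur_hash[cur] == hash[cur])  // count in window == count in word list
                        satisfy ++;  // one more word meets its count
                    else if(cur_hash[cur] > hash[cur])  // too many: move the left boundary right until count <= required
                    {

                        while(i < j && cur_hash[cur] > hash[cur])
                        {
                            string temp = s.substr(i,len);
                            i += len;
                            cur_hash[temp] --; // remove this word from the window
                            if(cur_hash[temp] == hash[temp] - 1)
                                satisfy --;// note: if it drops below the required count, one fewer word is satisfied
                        }
                    }
                    // The window now matches the word list exactly: record it, then move the left pointer right to drop the leftmost word
                    if(satisfy == size)
                    {
                        string temp = s.substr(i,len);
                        cur_hash[temp] --;
                        satisfy --;
                        res.push_back(i);  // found an answer
                        i = i + len;  // move the left boundary right
                    }
                    j = j + len;  // slide the window: move the right boundary right
                }
            }
        }
        return res;
    }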

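Author: T-SHLoRk
Link: https://www.acwing.com/solution/content/3669/
Source: AcWing
Copyright belongs to the author. For commercial reproduction, please contact the author for permission; for non-commercial reproduction, please credit the source.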