Back to solutions
Repeated DNA Sequences
MediumArraysFind all length-10 substrings that occur more than once.
Constraints
- 1 <= length <= 10^5
- letters in {A, C, G, T}
Sliding window with a hash map
Time O(n) windowsSpace O(n)
Slide a 10-character window across the string. Count each window's occurrences in a hash map; the first time a window's count reaches two, add it to the result set. Using a set avoids listing the same repeat more than once.
Key terms
- sliding window:
- A fixed-length substring that moves one character at a time across the input.
- occurrence count:
- How many times a particular window has appeared so far.
vector<string> findRepeatedDnaSequences(const string& s) {
unordered_map<string,int> seen; set<string> res;
for (int i = 0; i + 10 <= (int)s.size(); i++) {
string sub = s.substr(i, 10);
if (++seen[sub] == 2) res.insert(sub);
}
return vector<string>(res.begin(), res.end());
}Step by step
- For each start index, take the 10-character substring.
- Increment its count in a map.
- When a substring's count first hits 2, record it in the answer set.
- Return the collected substrings (sorted for determinism).