Back to solutions

Repeated DNA Sequences

MediumArrays

Find all length-10 substrings that occur more than once.

Constraints
  • 1 <= length <= 10^5
  • letters in {A, C, G, T}

Sliding window with a hash map

Time O(n) windowsSpace O(n)

Slide a 10-character window across the string. Count each window's occurrences in a hash map; the first time a window's count reaches two, add it to the result set. Using a set avoids listing the same repeat more than once.

Key terms
sliding window:
A fixed-length substring that moves one character at a time across the input.
occurrence count:
How many times a particular window has appeared so far.
vector<string> findRepeatedDnaSequences(const string& s) {
    unordered_map<string,int> seen; set<string> res;
    for (int i = 0; i + 10 <= (int)s.size(); i++) {
        string sub = s.substr(i, 10);
        if (++seen[sub] == 2) res.insert(sub);
    }
    return vector<string>(res.begin(), res.end());
}
Step by step
  1. For each start index, take the 10-character substring.
  2. Increment its count in a map.
  3. When a substring's count first hits 2, record it in the answer set.
  4. Return the collected substrings (sorted for determinism).