Hashing
Hashing is the technique of mapping a complex object — a string, a set, a game board, a tree — to a small integer so that equality checks become . In competitive programming it appears in at least four distinct flavours: polynomial string hashing for fast substring comparison, set/multiset hashing for order-independent equality, Zobrist hashing for incrementally-updated composite states, and tree hashing for structural isomorphism. Each works by assigning random values to primitive components and combining them with cheap operations (multiplication, XOR, addition); a collision — two distinct objects with the same hash — occurs with low probability and can be made negligible by using two independent hash functions simultaneously.
Polynomial string hashing
Fix a base and a large prime modulus. The polynomial hash of a string is
Store prefix hashes and prefix powers . Any substring (0-indexed, inclusive) can then be hashed in
Using two independent (base, modulus) pairs — double hashing — reduces the false-positive probability from to , which is safe even for comparisons.
struct HashStr {
static const long long M1=1000000007LL, M2=1000000009LL, B1=131, B2=137;
int n;
vector<long long> h1, h2, pw1, pw2;
HashStr(const string& s) : n(s.size()), h1(n+1), h2(n+1), pw1(n+1,1), pw2(n+1,1) {
for (int i = 0; i < n; i++) {
h1[i+1] = (h1[i]*B1 + s[i]) % M1;
h2[i+1] = (h2[i]*B2 + s[i]) % M2;
pw1[i+1] = pw1[i]*B1 % M1;
pw2[i+1] = pw2[i]*B2 % M2;
}
}
// Hash of s[l..r] (0-indexed, inclusive).
pair<long long,long long> get(int l, int r) const {
return {(h1[r+1] - h1[l]*pw1[r-l+1]%M1 + M1*2) % M1,
(h2[r+1] - h2[l]*pw2[r-l+1]%M2 + M2*2) % M2};
}
};
The + M*2 before the final % prevents negative results from the subtraction.
Anti-hack: randomize the base at runtime
On Codeforces an adversary can craft anti-hash tests for any fixed base. The defence is to pick the base randomly at program start:
mt19937_64 rng(chrono::steady_clock::now().time_since_epoch().count());
const long long B = uniform_int_distribution<long long>(200, 1e9)(rng);
Use this B in place of the compile-time constants above.
Applications of string hashing
- Pattern matching (Rabin-Karp algorithm). Compare the hash of each length- window of text against the hash of pattern in time.
- Palindrome detection. Build a
HashStrfor both and its reverse. Substring is a palindrome when its forward hash equals the reversed string's hash over the mirrored interval. Binary-searching the longest palindromic extent at each centre costs - Longest common substring. Binary search on length ; for each insert all length- hashes of one string into a set and probe with the other. total.
- Counting distinct substrings. Insert all substring hashes into an
unordered_set; its size is the answer. For large a Suffix array or Suffix automaton is faster. - Circular shift comparison. Hash of rotation of equals
H(s+s).get(k, k+n-1); sorting all rotations takes
2D string hashing
For grids, extend to two dimensions:
A 2D prefix-hash table lets any rectangular subgrid hash be retrieved in , enabling -time pattern matching in a grid.
Set/multiset hashing via random assignment
Assign each distinct element a random 64-bit value . The XOR hash of a set is
Because XOR is commutative and associative, is order-independent; two sets are equal iff their XOR hashes agree (with overwhelming probability). Insertions and deletions are : just XOR the element's random value in or out.
For multisets, XOR fails because — a double occurrence cancels. Use a sum instead:
Subtraction undoes an element. Keep both and together for stronger guarantees when needed.
mt19937_64 rng(chrono::steady_clock::now().time_since_epoch().count());
// Global random value table — all SetHash objects must share this.
unordered_map<int,uint64_t> rand_val;
uint64_t elem_hash(int x) {
auto [it, ins] = rand_val.emplace(x, 0);
if (ins) it->second = rng();
return it->second;
}
struct SetHash {
uint64_t h = 0; // XOR hash; swap for sum if multiset needed
void insert(int x) { h ^= elem_hash(x); }
void erase (int x) { h ^= elem_hash(x); }
};
Applications of set hashing
- Sliding-window anagram detection. Maintain a
SetHashfor the current window and one for the pattern. Each step is : XOR out the departing element, XOR in the arriving one, compare hashes. Total - Checking if two arrays contain the same elements. Compute the XOR hash of both in ; equal hashes imply equal sets.
- Subset-sum equality. Hash each candidate subset; collect in a hash map. Avoids sorting or canonical forms.
Zobrist hashing
Zobrist hashing (Zobrist 1970) is a specialisation of random-assignment hashing for composite states that change incrementally — most famously chess board positions, but also any state that can be described as a set of active features.
Precompute a table of independent random 64-bit values: one per (position, feature) pair. The hash of a state is the XOR of the values of all currently active features. When the state changes, XOR out the old feature value and XOR in the new one — per move regardless of board size.
// Example: 4x4 board, pieces 0=empty 1=X 2=O.
// In practice initialise once at program start.
uint64_t ZT[ROWS][COLS][NUM_PIECES];
void init_zobrist() {
mt19937_64 rng(chrono::steady_clock::now().time_since_epoch().count());
for (auto& row : ZT) for (auto& col : row) for (auto& v : col) v = rng();
}
struct Board {
int cell[ROWS][COLS] = {};
uint64_t h = 0;
// Place piece p at (r,c), replacing whatever was there.
void place(int r, int c, int p) {
h ^= ZT[r][c][cell[r][c]]; // remove old piece
cell[r][c] = p;
h ^= ZT[r][c][p]; // add new piece
}
};
Moves are trivially undone by calling place again with the previous piece
(since XOR is self-inverse).
Applications of Zobrist hashing
- Transposition tables in game-tree search. Store evaluated board positions in a hash map keyed by Zobrist hash. When the same position is reached by different move sequences, the evaluation is reused — the key speed-up in minimax / alpha-beta.
- Cycle detection in abstract games. Check whether the current board hash has appeared before (e.g. detecting superko in Go).
- Set membership tracking. Any problem that can be expressed as "which items are currently active" benefits from Zobrist: flip an item's random value to add or remove it.
Tree hashing and isomorphism
Two rooted trees are isomorphic if one can be relabelled to match the other. Hashing provides an randomised check.
Assign hash to the empty tree. For a node with children , compute each child's hash recursively, sort the list (children are unordered), then map the sorted tuple to a canonical integer:
// tree_canon maps sorted child-hash vectors to unique integer IDs.
map<vector<int>,int> tree_canon;
int next_id = 1;
int tree_hash(int v, int par, const vector<vector<int>>& adj) {
vector<int> child_hashes;
for (int u : adj[v])
if (u != par) child_hashes.push_back(tree_hash(u, v, adj));
sort(child_hashes.begin(), child_hashes.end());
auto [it, ins] = tree_canon.emplace(child_hashes, 0);
if (ins) it->second = next_id++;
return it->second;
}
Two rooted trees are isomorphic iff their root hashes agree. The canonical ID assignment makes this collision-free (it is essentially AHU canonicalization, not a probabilistic hash).
For unrooted trees, find the centroid of each tree (), root both there, then compare hashes. If a tree has two centroids (always adjacent), try both pairings.
Applications of tree hashing
- Tree isomorphism. Determine whether two given trees have the same structure.
- Counting distinct subtree shapes. Hash every subtree and count unique IDs.
- Canonical tree representations. Assign a globally unique label to each tree shape, enabling set/map operations on trees.
Safe hash maps: the splitmix64 trick
std::unordered_map uses a hash of the key to select a bucket. For integer keys
the default hash is often the identity, which an adversary can exploit by feeding
keys that all land in the same bucket — degrading performance to per
operation. The defence is a quality custom hash function.
splitmix64 (Vigna 2015) is a fast, high-quality bit mixer with excellent avalanche behaviour:
struct SplitMix {
static uint64_t mix(uint64_t x) {
x += 0x9e3779b97f4a7c15ULL;
x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
return x ^ (x >> 31);
}
// A different random seed each run foils precomputed collision inputs.
static const uint64_t SEED;
size_t operator()(uint64_t x) const { return mix(x ^ SEED); }
};
const uint64_t SplitMix::SEED =
chrono::steady_clock::now().time_since_epoch().count();
// Usage: unordered_map<long long, int, SplitMix> mp;
The XOR with a runtime-random SEED ensures that the hash function varies
between runs, so an adversary who hard-codes a collision set in a test case cannot
rely on it working. Use this whenever unordered_map or unordered_set appears
in a Codeforces submission.
Problems
Polynomial string hashing
- String Matching (CSES)
Solution sketch — String Matching
Build HashStr for both text and pattern. Slide a window of the pattern's length
across the text and compare hash pairs in per position. Total time
. The naive string::find is in the worst case and will TLE.
- Palindrome Queries (CSES)
Solution sketch — Palindrome Queries
Build HashStr for and for . A substring is a
palindrome when HS.get(l,r) == HR.get(n-1-r, n-1-l), checked in .
Point updates require rebuilding the prefix arrays in each; for
frequent updates use a segment tree that stores hash and power per node,
supporting updates and substring hash queries.
- Finding Borders (CSES)
- Longest Common Substring (CSES)
Solution sketch — Longest Common Substring
Binary search on length : for a candidate , hash all length-
substrings of string into an unordered_set, then check each length-
substring of against the set. total.
- String Hashing (Kattis)
- Chasing Subs (Kattis)
- NAJPF — Pattern Find (SPOJ)
Set/multiset hashing
- Counting Distinct Substrings (CSES)
Solution sketch — Counting Distinct Substrings
For each pair , compute the double hash of and insert into
an unordered_set<pair<long long,long long>>. The set size at the end is the
answer. time, fine for ; for larger use a
Suffix array with the LCP array.
- Animal Classification (Kattis) — sliding-window set equality.
Zobrist hashing
- Hashing (Kattis)
- Transposition-table problems in game AI (offline, not an online judge) — Zobrist is the standard tool for deduplicating positions in minimax search.
Tree hashing / isomorphism
- Tree Isomorphism I (CSES) — rooted trees.
- Tree Isomorphism II (CSES) — unrooted trees; find centroid, then apply rooted algorithm.
Solution sketch — Tree Isomorphism I & II
Rooted (problem I): run tree_hash from each given root. Compare the two
integers; equal ↔ isomorphic.
Unrooted (problem II): find the centroid of each
tree. A tree has one or two centroids. Root each tree at its centroid and
compare hashes. If there are two centroids in either tree, try both candidate
roots. from the sort inside tree_hash
- Counting Distinct Trees (CSES) — count non-isomorphic rooted trees on vertices (combine tree hashing with DP).
Safe hash maps
- Any Codeforces problem where
unordered_mapwith integer keys TLEs — wrap it withSplitMixas described above. A classic forced example: Codeforces: Maps and Segments
See also
- Rabin-Karp algorithm — pattern matching via rolling polynomial hashes
- Suffix array — deterministic alternative for string problems
- Suffix automaton — structure; counts distinct substrings exactly
- Knuth-Morris-Pratt algorithm — deterministic pattern matching
- Aho-Corasick automaton — multi-pattern matching
- Z-function — linear-time periodicity and prefix-match queries
- Centroid decomposition — needed to reduce unrooted tree isomorphism to rooted
- Perfect hashing — theoretical construction that guarantees zero collisions

