Hyper-Dimensional Computing for Symbolic AI¶
HDC (Hyper-Dimensional Computing), also called Vector Symbolic Architecture (VSA), represents symbols as random vectors in a d-dimensional space (d = 10 000 is typical). At this dimensionality, independently sampled binary vectors are nearly orthogonal with high probability: the Hamming distance between two random vectors has expectation d/2 and standard deviation sqrt(d)/2. Three algebraic operations -- binding, bundling, and permutation -- suffice for structured symbolic reasoning.
References: Kanerva, "Hyperdimensional Computing: An Introduction to Computing in Distributed Representation with High-Dimensional Random Vectors", Cognitive Computation 1(2), 2009. Neubert et al., "An Introduction to Hyperdimensional Computing for Robotics", KI - Künstliche Intelligenz 33(4), 2019.
Prerequisites: Python 3.10+, `pip install sc-neurocore` (pure-Python path; the optional Rust engine adds SIMD acceleration but is not required).
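The near-orthogonality claim is easy to verify with plain NumPy before installing anything; this is a quick sanity check, not part of the library:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10_000
a = rng.integers(0, 2, d, dtype=np.uint8)
b = rng.integers(0, 2, d, dtype=np.uint8)

# Hamming distance between two independent random vectors.
# Expected value d/2 = 5000, standard deviation sqrt(d)/2 = 50,
# so the result lands within a few dozen bits of 5000.
dist = np.count_nonzero(a ^ b)
print(dist)
```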
1. The algebra¶
| Operation | Binary impl. | Symbol | Property |
|---|---|---|---|
| Bind | XOR | `a ^ b` | Self-inverse: `(a ^ b) ^ b = a` |
| Bundle | Majority vote | `[a, b, c] -> threshold(sum)` | Preserves similarity to constituents |
| Permute | Cyclic shift | `roll(a, k)` | Produces a quasi-orthogonal vector for each k |
Binding associates two concepts (e.g. role-filler). Bundling superimposes multiple items into a set. Permutation encodes positional/sequential structure.
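The three operations reduce to a few lines of NumPy. The sketch below illustrates the algebra only and is not the library's implementation; in particular, how the majority vote breaks ties for an even number of inputs is a free design choice (random here):

```python
import numpy as np

rng = np.random.default_rng(0)

def bind(a, b):
    # XOR: self-inverse, and the output is quasi-orthogonal to both inputs
    return a ^ b

def bundle(vectors):
    counts = np.sum(vectors, axis=0)
    out = (2 * counts > len(vectors)).astype(np.uint8)  # strict majority
    ties = 2 * counts == len(vectors)
    out[ties] = rng.integers(0, 2, int(np.count_nonzero(ties)))  # random tie-break
    return out

def permute(a, k=1):
    return np.roll(a, k)  # cyclic shift by k positions

a = rng.integers(0, 2, 10_000, dtype=np.uint8)
b = rng.integers(0, 2, 10_000, dtype=np.uint8)
assert np.array_equal(bind(bind(a, b), b), a)  # self-inverse property holds
```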
2. Core API¶
HDCEncoder operates on binary {0, 1} NumPy vectors (dtype uint8):

```python
from sc_neurocore.hdc import HDCEncoder, AssociativeMemory

enc = HDCEncoder(dim=10_000)

a = enc.generate_random_vector()    # shape (10000,), dtype uint8
b = enc.generate_random_vector()

bound = enc.bind(a, b)              # XOR
bundled = enc.bundle([a, b])        # majority vote
shifted = enc.permute(a, shifts=1)  # cyclic left shift
```

AssociativeMemory stores labelled prototype vectors and retrieves the nearest match by Hamming distance:

```python
mem = AssociativeMemory()
mem.store("cat", cat_vec)
mem.store("dog", dog_vec)
label = mem.query(noisy_cat_vec)  # -> "cat"
```
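If the library is not installed, the retrieval behaviour is easy to reproduce in a few lines. `SimpleAssociativeMemory` below is a hypothetical stand-in for illustration, not the sc-neurocore class:

```python
import numpy as np

class SimpleAssociativeMemory:
    """Nearest-neighbour store by Hamming distance (stand-in, not the library class)."""
    def __init__(self):
        self.labels, self.vectors = [], []

    def store(self, label, vec):
        self.labels.append(label)
        self.vectors.append(vec)

    def query(self, probe):
        dists = [np.count_nonzero(probe ^ v) for v in self.vectors]
        return self.labels[int(np.argmin(dists))]

rng = np.random.default_rng(0)
cat = rng.integers(0, 2, 10_000, dtype=np.uint8)
dog = rng.integers(0, 2, 10_000, dtype=np.uint8)

mem = SimpleAssociativeMemory()
mem.store("cat", cat)
mem.store("dog", dog)

noisy = cat.copy()
flip = rng.choice(10_000, 2_000, replace=False)  # flip 20% of the bits
noisy[flip] ^= 1
print(mem.query(noisy))  # cat -- 20% noise stays well inside cat's basin
```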
3. Symbolic key-value memory¶
Encode country-capital pairs with bind and bundle (the multiply and add of the Multiply-Add-Permute (MAP) scheme). Each record XORs its two role-filler bindings into a single vector, and the records are then bundled into one memory vector:

record_i = bind(role_country, country_i) ^ bind(role_capital, capital_i)

```python
import numpy as np
from sc_neurocore.hdc import HDCEncoder, AssociativeMemory

D = 10_000
enc = HDCEncoder(dim=D)
rng = np.random.default_rng(42)

def make_vec():
    return rng.integers(0, 2, D, dtype=np.uint8)

# Atomic symbols
role_country, role_capital = make_vec(), make_vec()
atoms = {
    "France": make_vec(), "Germany": make_vec(), "Japan": make_vec(),
    "Paris": make_vec(), "Berlin": make_vec(), "Tokyo": make_vec(),
}

# Encode records: record_i = bind(role_country, country_i) ^ bind(role_capital, capital_i)
pairs = [("France", "Paris"), ("Germany", "Berlin"), ("Japan", "Tokyo")]
records = []
for country, capital in pairs:
    bound_country = enc.bind(role_country, atoms[country])
    bound_capital = enc.bind(role_capital, atoms[capital])
    records.append(enc.bind(bound_country, bound_capital))

# Superimpose all records into a single memory vector (majority vote)
memory = enc.bundle(records)
```

4. Querying: "Capital of France?"¶

Unbinding reverses a bind. Since XOR is self-inverse, XOR the memory with every known component of the target record -- role_country, the France atom, and role_capital -- and what remains is a noisy copy of Paris; the other records reduce to random noise:

```python
# Unbind role_country, France, and role_capital
probe = enc.bind(memory, enc.bind(role_country, atoms["France"]))
answer = enc.bind(probe, role_capital)

# Find the closest atom by Hamming distance
codebook = AssociativeMemory()
for name, vec in atoms.items():
    codebook.store(name, vec)

result = codebook.query(answer)
print(f"Capital of France -> {result}")  # Paris
```
The answer vector is noisy (bundling and multiple XOR operations degrade the signal), but at d = 10 000 the correct atom remains the nearest neighbour.
5. Similarity metric¶
Hamming similarity = 1 - (Hamming distance / d). Random vectors score ~0.50. Bound or permuted vectors also score ~0.50 (quasi-orthogonal). A bundled vector scores > 0.50 against each constituent, with the excess depending on the number of items bundled.
```python
def hamming_similarity(a, b):
    return 1.0 - np.count_nonzero(a ^ b) / len(a)

a, b = make_vec(), make_vec()
print(f"Random pair: {hamming_similarity(a, b):.3f}")               # ~0.500
print(f"Self:        {hamming_similarity(a, a):.3f}")               # 1.000
print(f"Bound (a^b): {hamming_similarity(a, enc.bind(a, b)):.3f}")  # ~0.500
print(f"Permuted:    {hamming_similarity(a, enc.permute(a)):.3f}")  # ~0.500

bundled = enc.bundle([a, b])
print(f"Bundle vs a: {hamming_similarity(a, bundled):.3f}")         # ~0.65-0.75
```
6. Capacity limits¶
Bundle capacity scales as O(d / log d) (Kanerva 2009, Section 4). For d = 10 000, retrieval degrades around 50-100 bundled items. Increasing d extends capacity proportionally.
```python
codebook = AssociativeMemory()
vectors = [make_vec() for _ in range(200)]
for i, v in enumerate(vectors):
    codebook.store(str(i), v)

for n in [5, 10, 25, 50, 100]:
    bundled = enc.bundle(vectors[:n])
    nearest = codebook.query(bundled)
    is_constituent = int(nearest) < n
    print(f"N={n:3d}: nearest is constituent = {is_constituent}")
```
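The decay is also visible directly in the similarity between a bundle and any one constituent. The standalone sketch below uses odd N so the majority vote needs no tie-breaking; the similarity falls from ~0.75 at N=3 toward the 0.5 chance level as N grows:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10_000

def bundle(vectors):
    # Odd number of inputs: strict majority, no ties to break
    return (2 * np.sum(vectors, axis=0) > len(vectors)).astype(np.uint8)

sims = []
for n in [3, 11, 51, 101]:
    vs = [rng.integers(0, 2, D, dtype=np.uint8) for _ in range(n)]
    sim = 1 - np.count_nonzero(bundle(vs) ^ vs[0]) / D
    sims.append(sim)
    print(f"N={n:3d}: similarity to one constituent = {sim:.3f}")
```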
7. Sequence encoding with permutation¶
Permutation encodes order. For a sequence [A, B, C]:
```python
# Rebuild the atom codebook (section 6 reused the name for numbered vectors)
codebook = AssociativeMemory()
for name, vec in atoms.items():
    codebook.store(name, vec)

seq = enc.bundle([
    enc.permute(atoms["France"], shifts=0),   # position 0
    enc.permute(atoms["Germany"], shifts=1),  # position 1
    enc.permute(atoms["Japan"], shifts=2),    # position 2
])

# Retrieve position 1: inverse-permute, then query the codebook
probe_pos1 = enc.permute(seq, shifts=-1)
print(codebook.query(probe_pos1))  # Germany (highest similarity)
```
Shift by -k inverts shift by k; the result is closest to the item at position k.
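Permute-then-bind also yields an order-sensitive n-gram encoding, a common HDC scheme for text. This is a standalone sketch in plain NumPy, not part of the sc-neurocore API:

```python
import numpy as np

rng = np.random.default_rng(7)
D = 10_000
letters = {c: rng.integers(0, 2, D, dtype=np.uint8) for c in "abc"}

def trigram(s):
    # roll(v0, 2) ^ roll(v1, 1) ^ v2: position enters through the shift amount
    return np.roll(letters[s[0]], 2) ^ np.roll(letters[s[1]], 1) ^ letters[s[2]]

def sim(a, b):
    return 1 - np.count_nonzero(a ^ b) / D

print(f"abc vs abc: {sim(trigram('abc'), trigram('abc')):.3f}")  # 1.000
print(f"abc vs cba: {sim(trigram('abc'), trigram('cba')):.3f}")  # ~0.500 (order-sensitive)
```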
8. Rust-accelerated path¶
When the compiled engine is available (`maturin develop --release` in `engine/`), `sc_neurocore_engine.HDCVector` wraps the Rust `BitStreamTensor` with SIMD popcount, XOR, bundle, and rotation. The Python API mirrors the pure-NumPy version:
```python
from sc_neurocore_engine import HDCVector

a = HDCVector(10_000, seed=42)
b = HDCVector(10_000, seed=99)

bound = a * b                       # XOR bind (operator overload)
bundled = HDCVector.bundle([a, b])  # majority vote
shifted = a.permute(1)              # cyclic rotation
sim = a.similarity(b)               # 1.0 - normalised Hamming distance
```
The Rust path is 10-50x faster for d >= 10 000, relevant for large codebooks or real-time inference on neuromorphic hardware pipelines.
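Whether the accelerated path is worth building depends on the workload. A quick way to find out is to time the pure-NumPy hot loop first (illustrative harness; absolute numbers vary by machine):

```python
import time
import numpy as np

rng = np.random.default_rng(0)
D = 10_000
a = rng.integers(0, 2, D, dtype=np.uint8)
b = rng.integers(0, 2, D, dtype=np.uint8)

reps = 10_000
t0 = time.perf_counter()
for _ in range(reps):
    _ = np.count_nonzero(a ^ b)  # bind + popcount: the inner loop of a codebook query
elapsed = time.perf_counter() - t0
print(f"{elapsed / reps * 1e6:.1f} us per bind+popcount at D={D}")
```

Multiply the per-operation time by your codebook size and query rate; if the product is comfortably within budget, the pure-Python path suffices.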