Question Bank
1. Time-versioned key-value store
Section titled “1. Time-versioned key-value store”The most-reported question, in some form. Implement a time-based KV store using real timestamps; follow-ups cover how to write tests, mock timestamps, ensure strictly increasing timestamps, handle locks under multithreading, and compare lock implementations.
set(key, value) # uses current timestampget(key) # latest valueget(key, ts) # value at-or-before tsget(key, version=N) # Nth set- P1: Single-threaded. Pick the per-key data structure that makes
get(key, ts)log-time. Justify. - P2: Make timestamps injectable (clock abstraction) so tests don’t need
time.sleep. Show how you’d test out-of-order writes, ties, and reads at gaps. - P3: Multi-threaded writes + reads. Compare global lock, per-key lock, optimistic (version check + retry). Pick one and defend.
- P4: 1B keys, 99% idle. How does memory change?
2. Spreadsheet with formulas
Section titled “2. Spreadsheet with formulas”set_cell(addr, expr) # number or "=A1+B2*3"get_cell(addr) -> float- P1: Numeric values +
=expressions over+ - * /and cell refs. Evaluate lazily onget_cell. - P2: Make get_cell O(1); set_cell must proactively update all dependent downstream cells when a value changes. Switch to eager propagation via dep graph.
- P3: Cycle detection (DFS + visiting set). Reject the offending
set_celland preserve prior state — no half-updates. - P4: Concurrent
set_cellon cells in the same connected component. Minimal locking that’s still correct.
3. Resumable iterator over a large dataset
Section titled “3. Resumable iterator over a large dataset”Implement a resumable iterator for a large dataset.
class ResumableIterator: def __init__(self, source): ... def next_batch(self, n) -> list def checkpoint(self) -> bytes @classmethod def restore(cls, source, blob): ...- P1: Wrap a generator/source; produce items in chunks of
n; track position. - P2:
checkpoint()returns enough state to resume after process restart.restore()reconstructs without re-reading consumed items. - P3: Source size = 10B items.
checkpoint()must be O(1) in size, not O(consumed). - P4: Two threads call
next_batchon the same iterator. Define semantics (serialized? interleaved?) and enforce.
4. In-memory SQL subset
Section titled “4. In-memory SQL subset”Implement an in-memory database supporting insert and query, with where filtering and order by sorting; provide select(table_name, where=None, order_by=None); multiple where conditions only support AND logic; original requirements simulate SQL using a map, support comparison operators, and order by.
create_table(name, columns)insert(name, row: dict)select(name, where=None, order_by=None) -> list[dict]where is a list of (col, op, val) joined by AND, with ops in {=, !=, <, <=, >, >=}. order_by is a list of (col, "asc"|"desc").
- P1: Build it. Tests: empty where, multi-column where, single-col sort, multi-col sort.
- P2: 10M rows, point queries on indexed column. Add a hash index. Trade-offs vs sorted index for range queries.
- P3: API stability: extend
whereto support OR without breaking existing callers. - P4: Concurrent inserts during a select — what consistency model do you offer? Snapshot? Read-committed? How do you implement it cheaply?
5. KV store with custom serialization to disk
Section titled “5. KV store with custom serialization to disk”Implement serialization and deserialization for a key-value store where both keys and values can contain any characters including delimiters; you can’t use simple delimiters because they might appear in the data; most candidates land on length-prefix encoding (3:key5:value), the same pattern used in Redis protocol. Custom serialization/deserialization must be implemented (no Python built-in libraries like json).
class KVStore: def set(self, key: str, value: str): ... def get(self, key: str) -> str | None: ... def save(self, path: str): ... @classmethod def load(cls, path: str) -> "KVStore": ...- P1: Length-prefix encoding. Show on paper why a delimiter approach (e.g.
key=value\n) breaks for arbitrary-content keys/values. - P2: Round-trip property test: random keys/values containing
\n,:,\0, unicode — must survive. - P3: Append-only log instead of full rewrite on every
save(). Compaction strategy. - P4: Crash mid-write. How do you detect a torn record on
load()? (Length prefix + checksum, or length prefix + final-marker.)
6. Path normalization with symlinks
Section titled “6. Path normalization with symlinks”Implement a function to normalize filesystem paths, resolving . and .. components and handling symbolic links; handle . by ignoring it, handle .. by popping from the stack, resolve symbolic links using a provided mapping dictionary. (Also reported as: implement cd command navigation with cycle detection.)
normalize(path: str, symlinks: dict[str, str]) -> str# normalize("/home/user/docs/../photos/./img.png", {}) -> "/home/user/photos/img.png"# normalize("/photos/../docs", {"/photos": "/media/external/photos"}) -> ?- P1:
.,.., repeated slashes, no symlinks. Stack-based. - P2: Add symlink resolution. Define when symlinks resolve (eager on each component? lazy at end?). Pick one and justify against POSIX behavior.
- P3: Detect symlink cycles (
/a -> /b,/b -> /a) and bound resolution. - P4: Relative symlinks:
/foo/link -> ../bar. Handle correctly.
7. Expiring credit ledger
Section titled “7. Expiring credit ledger”A complex data-structure problem involving time-based credit allocation and expiration; implement a system that supports adding credits, expiring credits, and consuming credits; consumption rule: always consume the credits that expire the soonest (FIFO/queue); core logic: when performing Cost(t, x) (consume x credits at time t), if the balance is sufficient, the system must correctly update the totals for future expiration events so that consumption is fully reflected.
add(t_grant, t_expire, amount) # grant credits available [t_grant, t_expire)cost(t, x) -> bool # consume x at time t, FIFO by earliest expiry; False if insufficientbalance(t) -> int # available credits at time t- P1: Build with a min-heap keyed by expiry.
costwalks the heap consuming earliest-expiring first, dropping already-expired entries lazily. - P2:
balance(t)for arbitraryt— including times in the future and past. What structure makes this cheap? - P3: 1M
costops/sec. Profile: where’s the cost? (Heap rebalance vs bookkeeping.) - P4: A query asks “total credits expiring in [t1, t2].” Add this without scanning the whole heap.
8. Concurrent web crawler
Section titled “8. Concurrent web crawler”Build a web crawler that uses multiple threads to fetch pages concurrently. You need to coordinate between threads, avoid crawling the same URL twice, handle failures, and probably implement some kind of rate limiting.
class Crawler: def __init__(self, fetch_fn, max_workers, rate_per_sec): ... def crawl(self, seed_url, max_pages) -> list[str]fetch_fn(url) -> (html, links). Mock it in tests with a fake graph + injected failures.
- P1: Single-threaded BFS with visited set. Get correctness first.
- P2: Thread pool of N workers, shared queue + visited set. Fix the obvious races.
- P3: Rate limit at
rate_per_secglobally (not per-thread). Token bucket or semaphore? What happens when workers idle waiting for tokens? - P4:
fetch_fnfails 5% of the time. Retry with exponential backoff, but cap total attempts. Make sure a permanent failure doesn’t deadlock other workers.
9. Counter with a race condition (debug + fix)
Section titled “9. Counter with a race condition (debug + fix)”Identify the race condition in this code and modify it to prevent the issue; explain race conditions conceptually, identify the specific vulnerability (read-modify-write is not atomic), fix using appropriate primitives (locks, atomics, thread-safe queues).
class Counter: def __init__(self): self.count = 0 def increment(self): current = self.count self.count = current + 1 # race: read-modify-write not atomic- P1: Identify the race. Write a test that deterministically exposes it (not just “run with N threads and pray”). Hint: monkey-patch a sleep between read and write.
- P2: Fix with
Lock. Measure throughput at 1, 4, 16 threads. - P3: Replace
Lockwith atomic CAS (itertools.countor per-shard counters). Compare contention. - P4: Now
incrementalso writes to a list of recent updates. Avoid the lock holding the list-append — separate hot path from cold path.
10. Mini ORM
Section titled “10. Mini ORM”Build a simple ORM (Object-Relational Mapping) layer.
class Model: # subclasses define class attributes typed as Field(...) # framework provides: save(), delete(), .objects.filter(...).order_by(...).all()Backing store: in-memory dict-of-dicts (don’t write to a real DB).
- P1: Field descriptors (str, int, datetime). Type validation on assignment.
save()inserts or updates by primary key. - P2:
Managerclass exposingfilter(**kwargs)returning a lazy QuerySet. Chainable:.filter(...).filter(...).order_by('field').all(). Evaluation deferred until iteration. - P3:
__repr__/__eq__/ hashing on PK. What breaks if a user mutates a field after__hash__is computed? - P4: Migrations: a model adds a field. Existing in-memory rows lack it. Define and implement the upgrade path.