Concurrency & threads
The 2025–26 frontier-lab differentiator. Anthropic, xAI, and OpenAI all probe it explicitly. This page teaches a GIL deep-dive, asyncio internals, lock-free patterns, and a first pass at memory models — densely enough that one read covers an interview round.
What you'll learn
- Threads vs processes vs coroutines — pick the right tool
- Memory models — what the CPU and compiler don't promise
- Locks — the menu and when each fits
- Deadlock — Coffman's four conditions and how to break them
- Producer-consumer — the canonical pattern (with code)
- Python GIL — what it does, what it doesn't, free-threaded Python
- asyncio internals — the event loop and its sharp edges
- Multiprocessing & shared memory — true parallelism in Python
- Compare-and-swap, ABA, lock-free patterns
- RCU and hazard pointers — advanced reclamation
- Concurrency LeetCodes — the labs ask these
- What companies actually probe
Three concurrency primitives, three cost profiles. Coroutines win for I/O at scale (10k+ connections). Threads win for mixed CPU+I/O work in non-GIL languages or with C extensions. Processes win for true CPU parallelism in Python. Picking wrong = an order-of-magnitude perf hit.
| | Thread | Process | Coroutine (asyncio) |
|---|---|---|---|
| Memory | Shared with peers | Isolated | Shared (single thread) |
| Switch cost | μs (kernel) | μs (kernel; spawning costs ms) | ~ns (user) |
| Use case | I/O concurrency in non-GIL langs; mixed CPU/IO | CPU parallelism in Python; isolation | Massive I/O concurrency (10k+ conns) |
| Hazards | Races, deadlock | IPC overhead | Single blocking call kills throughput |
When to reach for threads
- Mixed CPU + I/O
- C extensions release the GIL (NumPy, PyTorch)
- Non-Python languages (Java, Go, C++)
When to reach for asyncio
- Lots of network I/O, low CPU
- 10k+ concurrent connections
- Web servers, scrapers, RPC clients
- Coroutines are ~1000× cheaper to switch than threads.
- Threads share memory (races possible); processes don't (need IPC).
- One sync blocking call inside an asyncio loop kills the whole party.
Modern CPUs and compilers reorder reads and writes for performance. Without explicit synchronization, what one thread sees from another can differ wildly from program order. Sequential consistency is the mental model you wish was true; in practice you write to x86 (TSO) or ARM (weak) and use atomics + memory orders to force the guarantees you need.
The four consistency points to know
- Sequential consistency: every operation appears in a global total order; each thread's order matches its program order. The mental model you wish was true. Almost no real hardware provides it for free.
- x86 (TSO): total store order. Stores are not reordered with stores; loads are not reordered with loads; stores can be delayed past later loads (store buffer). Strong-ish, but the store-buffer rule trips people up.
- ARM / POWER (weak): most reorderings allowed. Need explicit barriers (DMB, ISB).
- Happens-before: partial order across threads enforced by synchronization (lock acquire/release, atomic with appropriate ordering, thread join).
C++ memory orders (memorize)
| Order | What it guarantees |
|---|---|
| `memory_order_seq_cst` (default) | Single total order across all SC operations. |
| `memory_order_acquire` (load) | Subsequent reads/writes don't reorder before this load. Pairs with release. |
| `memory_order_release` (store) | Prior reads/writes don't reorder after this store. Pairs with acquire. |
| `memory_order_acq_rel` (RMW) | Both — for compare-exchange, fetch-add, etc. |
| `memory_order_relaxed` | Atomicity only, no ordering with other ops. Cheapest. Use for stat counters. |
| `memory_order_consume` | Rarely used. Treated as acquire by most compilers. |
SPSC ring needs only acquire/release on the head/tail pointers — the data writes themselves can use relaxed because the index publication establishes happens-before. This is the canonical Anthropic concurrency probe.
- Without atomics + memory orders, the CPU and compiler reorder freely.
- x86 ≠ ARM. Don't ship code that "works on my Mac" without testing on ARM.
- Acquire/release pairs establish happens-before; `relaxed` is for unordered counters.
Mutex by default. Spinlock only if the critical section is microseconds. RWLock if reads dominate writes. Lock-free if you're willing to spend 10× the eng time for higher throughput. Recursive lock = a code smell signaling tangled control flow.
| Lock | What it does | When to use |
|---|---|---|
| Mutex | Blocks waiters; OS-managed | Default for any non-trivial critical section |
| Spinlock | Busy-wait until acquired | Critical section < ~1μs and contention rare |
| RWLock | Many readers OR one writer | Read-heavy workload (caches, config). Hurts under writes. |
| Recursive lock | Same thread can re-acquire | Almost never — usually a sign the code is tangled |
| Lock-free | At least one thread always progresses | Hot data structures (queues, hashmaps) where contention is the bottleneck |
| Wait-free | Every thread progresses in bounded steps | Real-time / safety-critical. Rare due to complexity. |
- Default to mutex. Promote only when you have data showing it's the bottleneck.
- Spinlock = bet that contention is rare AND critical sections are tiny.
- Lock-free ≠ free. Wait-free is a strictly stronger guarantee almost no one needs.
All four conditions must hold for deadlock to occur. Break any one to prevent it. The cheapest fix in production: global lock ordering — always acquire locks in the same canonical order across the codebase.
- Mutual exclusion — only one thread holds the resource.
- Hold and wait — a thread holds one resource while waiting for another.
- No preemption — resources can't be forcibly released.
- Circular wait — a cycle exists in the resource-wait graph.
Practical deadlock-prevention recipes:
- Lock ordering: enforce a global order (e.g., always acquire `account_a` before `account_b` when `id(a) < id(b)`; see the sketch after this list). Breaks circular wait.
- Try-lock with backoff: if you can't get all needed locks, release what you have and retry. Breaks hold-and-wait.
- Single coarse lock: when in doubt and not perf-critical, use one lock. Eliminates circular wait by construction.
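A minimal Python sketch of the lock-ordering recipe; `Account` and `transfer` are illustrative names, not from any particular codebase:

```python
import threading

class Account:
    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()

def transfer(a, b, amount):
    # Acquire both locks in a canonical global order (here: by id()).
    # Every thread that needs both locks uses the same order, so no cycle
    # can form in the wait graph and circular wait is impossible.
    first, second = (a, b) if id(a) < id(b) else (b, a)
    with first.lock:
        with second.lock:
            a.balance -= amount
            b.balance += amount
```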
- All four Coffman conditions required → break any one to prevent deadlock.
- Global lock ordering is the cheapest fix that scales to large codebases.
- Livelock (threads constantly retrying without progress) is harder to detect than deadlock.
Mutex + two condition variables (not_full, not_empty) + a queue + capacity. Always wait in a while loop (spurious wakeups). The canonical multi-thread interview drill — implement it cold in your sleep.
Python — using queue.Queue
import threading, queue
q = queue.Queue(maxsize=10) # thread-safe, blocking on full/empty
def producer():
for i in range(100):
q.put(i) # blocks if full
print(f"produced {i}")
def consumer():
while True:
item = q.get() # blocks if empty
print(f"consumed {item}")
q.task_done()
producer_t = threading.Thread(target=producer)
producer_t.start()
threading.Thread(target=consumer, daemon=True).start()
producer_t.join()  # wait for all puts first, so q.join() can't return early
q.join()           # then block until every item has been marked task_done
C++ — explicit condition variables
#include <mutex>
#include <condition_variable>
#include <queue>
template<typename T>
class BoundedQueue {
std::mutex m;
std::condition_variable not_full, not_empty;
std::queue<T> q;
size_t cap;
public:
BoundedQueue(size_t c) : cap(c) {}
void put(T x) {
std::unique_lock lk(m);
not_full.wait(lk, [&]{ return q.size() < cap; });
q.push(std::move(x));
not_empty.notify_one();
}
T take() {
std::unique_lock lk(m);
not_empty.wait(lk, [&]{ return !q.empty(); });
T x = std::move(q.front()); q.pop();
not_full.notify_one();
return x;
}
};
The above already works for N producers and M consumers — the mutex serializes access, and `notify_one` wakes a single waiter. Use `notify_all` if your predicate change might unblock multiple waiters (e.g., adding K items at once instead of one).
The if-vs-while bug: call `cv.wait` in a `while (!predicate())` loop, never under a plain `if`. Spurious wakeups are real (POSIX permits them). With `if`, your code can wake with the predicate still false and proceed in an invalid state.
- Mutex + 2 CVs + while-predicate-wait. That's the pattern.
- `notify_one` when one waiter can make progress; `notify_all` when many might.
- Python: just use `queue.Queue` unless you need custom predicates.
The GIL serializes Python bytecode execution: only one thread runs Python at a time. It's released during I/O syscalls and in C extensions that ask for release (NumPy, PyTorch CUDA). It does NOT help CPU-bound pure Python — for that, use multiprocessing or free-threaded Python (3.13t, PEP 703).
It's not "Python isn't multi-threaded" — it's a specific lock with specific release rules
- Released during I/O: `read()`, `send()`, `sleep()`, file I/O, network I/O — threading helps for I/O-bound work.
- Released in C extensions that ask: NumPy, PyTorch CUDA, lots of native libraries — threading helps for those too.
- Doesn't help CPU-bound pure Python: `for i in range(N): x += 1` is single-thread serialized. Use multiprocessing.
- Free-threaded Python (3.13t, 2024): experimental GIL-disabled build. ~10–30% single-thread overhead, but true parallel Python on multiple cores. Not yet the default; library compatibility is growing fast (NumPy, PyTorch, Pandas already work).
- Subinterpreters (3.12+): each interpreter has its own GIL; they can run truly in parallel within one process. Communication via shared memory.
What's atomic in Python?
Single bytecode ops are atomic (e.g., `list.append`, `dict[k] = v`). But `x += 1` is not — it's load + add + store, three separate bytecodes. Two threads can both load, both add, both store — losing one increment.
`counter += 1` from N threads still loses updates without a lock. A demo follows.
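A quick demo of the lost update. How often it bites depends on the CPython version and switch interval; shrinking the interval makes mid-increment switches likelier:

```python
import sys
import threading

sys.setswitchinterval(1e-6)   # encourage thread switches mid-increment

counter = 0

def work(n):
    global counter
    for _ in range(n):
        counter += 1          # LOAD + ADD + STORE; a switch in between loses an update

threads = [threading.Thread(target=work, args=(200_000,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)                # often less than 1_600_000 when run without a lock
```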
- GIL = serializes bytecode, not the world.
- Released during I/O syscalls and in cooperating C extensions.
- For CPU-bound Python: `multiprocessing` or free-threaded 3.13t.
- Read-modify-write at the Python level still races without a lock.
Single-threaded event loop. await yields control back to the loop. asyncio.gather runs coroutines concurrently. The fatal failure mode: a single sync blocking call freezes the entire loop and every other coroutine.
The mental model
- Tasks wrap coroutines; scheduled on the loop.
- Futures are awaitables; resolved by external code.
- The event loop uses `selectors` (epoll on Linux, kqueue on macOS) under the hood for I/O readiness.
- Cancellation: `task.cancel()` raises `CancelledError` at the next `await` point.
import asyncio
async def fetch(url):
print(f"start {url}")
await asyncio.sleep(1) # simulated I/O — yields control to loop
print(f"done {url}")
return url
urls = ["https://a.example", "https://b.example"]  # demo inputs
async def main():
    results = await asyncio.gather(*(fetch(u) for u in urls))  # run all fetches concurrently
    return results
asyncio.run(main())
Why a sync call freezes every other coroutine
The loop is single-threaded. While it executes a sync function, no other coroutine runs. One `time.sleep(1)`, sync DB query, or `requests.get()` freezes the entire loop for 1 second — every other concurrent task pauses too.
Fix: use async equivalents (`await asyncio.sleep`, async DB drivers like asyncpg, `aiohttp` instead of `requests`); or wrap blocking calls in `asyncio.to_thread()` / `loop.run_in_executor()`.
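A small sketch of the `asyncio.to_thread` escape hatch; the `blocking_io` and `heartbeat` helpers are illustrative:

```python
import asyncio
import time

def blocking_io():
    time.sleep(1)             # sync sleep: would freeze the loop if called inline
    return "io done"

async def heartbeat():
    for _ in range(4):
        print("loop still responsive")
        await asyncio.sleep(0.25)

async def main():
    # to_thread runs the blocking call in a worker thread, so the event
    # loop keeps scheduling heartbeat() while the sync sleep is in flight.
    result, _ = await asyncio.gather(asyncio.to_thread(blocking_io), heartbeat())
    print(result)

asyncio.run(main())
```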
- Event loop = single thread + selector (epoll/kqueue) + cooperative scheduling.
- Every `await` is a yield point. Anything between two awaits runs uninterrupted.
- Sync blocking calls inside async = catastrophic. Use `to_thread()` if you must.
Multiprocessing gives you actual parallelism in Python by spawning OS processes. The bottleneck is data transfer — pickle round-trips can dwarf the compute. Use multiprocessing.shared_memory for zero-copy NumPy sharing.
from multiprocessing import shared_memory
import numpy as np
# Producer
shm = shared_memory.SharedMemory(create=True, size=4 * 1024 * 1024)
arr = np.ndarray((1024, 1024), dtype=np.float32, buffer=shm.buf)
arr[:] = np.random.randn(1024, 1024)
print(shm.name) # pass to consumer
# Consumer (different process)
shm = shared_memory.SharedMemory(name="...")
arr = np.ndarray((1024, 1024), dtype=np.float32, buffer=shm.buf)
# zero-copy access; no pickle round-trip
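shm.close()   # each process closes its own mapping when finished
# the creating process also calls shm.unlink() to release the segment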
Multiprocessing wins for pure-Python computation on independent chunks of data: image preprocessing, simulations, hyperparameter sweeps. It loses when data transfer dominates — e.g., transferring a 1GB array to children that do 10ms of work each.
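A minimal `Pool` sketch of the winning case; the `score` function and chunk sizes are illustrative:

```python
from multiprocessing import Pool

def score(chunk):                    # pure-Python CPU work, no shared state
    return sum(x * x for x in chunk)

if __name__ == "__main__":           # guard required under the spawn start method
    chunks = [range(i, i + 250_000) for i in range(0, 1_000_000, 250_000)]
    with Pool(processes=4) as pool:
        # each small chunk is pickled once; compute dominates transfer
        print(sum(pool.map(score, chunks)))
```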
- Multiprocessing = real parallel Python by spawning OS processes.
- Pickle is the hidden cost. Use `shared_memory` for large arrays.
- Consider Ray / Dask / joblib for production work — they handle the boilerplate.
`CAS(addr, expected, new)` atomically sets a value if it matches `expected`. The foundation of lock-free programming. The classic trap is the ABA problem — a value can change A → B → A while you're preempted, and your CAS will succeed on stale assumptions.
CAS — the building block
`CAS(addr, expected, new)`: atomically set `*addr = new` if `*addr == expected`. Returns success/failure. Hardware provides this via CMPXCHG (x86) or LDREX/STREX (ARM). Foundation of lock-free queues, hashmaps, ref counts.
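CPython exposes no user-level CAS, but the semantics fit in a few lines. A model only: the lock below stands in for the hardware's single-instruction atomicity, and the retry loop is the canonical lock-free idiom:

```python
import threading

class AtomicCell:
    """Models hardware CAS; the lock plays the role of the CPU's atomicity."""
    def __init__(self, value):
        self.value = value
        self._lock = threading.Lock()

    def compare_and_swap(self, expected, new):
        with self._lock:             # hardware does this in one instruction (CMPXCHG)
            if self.value == expected:
                self.value = new
                return True
            return False

def increment(cell):
    # The canonical lock-free retry loop: read, compute, CAS, retry on conflict.
    while True:
        cur = cell.value
        if cell.compare_and_swap(cur, cur + 1):
            return
```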
The ABA problem
Thread T1 reads A, gets preempted. Thread T2 changes A → B → A. T1 resumes, runs CAS expecting A — it succeeds, but the world changed underneath. T1's logic may now be invalid (e.g., the node it pointed to was freed and the address reused).
Mitigations:
- Tagged pointers: pointer + version counter (e.g., 48-bit ptr + 16-bit version). CAS the whole 64-bit word; see the sketch after this list.
- Hazard pointers: each thread publishes "I'm using ptr X." Memory not freed until no thread declares hazard.
- RCU: defer freeing until all readers that were active before the write have finished.
- Garbage collection: in GC'd languages, the GC won't reuse addresses while references exist.
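Extending the `AtomicCell` model above with a version tag: an A → B → A run bumps the version, so a stale CAS fails. A sketch, not hardware atomics:

```python
import threading

class TaggedCell:
    """(value, version) pair; CAS compares both, so A->B->A fails a stale CAS."""
    def __init__(self, value):
        self.pair = (value, 0)
        self._lock = threading.Lock()    # stands in for a 64-bit hardware CAS

    def compare_and_swap(self, expected_pair, new_value):
        with self._lock:
            if self.pair == expected_pair:
                self.pair = (new_value, expected_pair[1] + 1)  # bump version
                return True
            return False

cell = TaggedCell("A")
stale = cell.pair                           # ("A", 0)
cell.compare_and_swap(cell.pair, "B")       # A -> B, version 1
cell.compare_and_swap(cell.pair, "A")       # B -> A, version 2
print(cell.compare_and_swap(stale, "C"))    # False: value matches, version doesn't
```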
- CAS is the atomic primitive lock-free programming is built on.
- ABA is the most common lock-free bug — assume it can happen unless you've defended against it.
- Tagged pointers, hazard pointers, RCU, or GC each defeat ABA differently.
Lock-free needs a memory-reclamation strategy: when can old objects be freed if some thread might still be using them? RCU defers freeing until a "grace period" passes; hazard pointers track per-thread "in use" markers. Both prevent use-after-free in lock-free structures.
RCU (Read-Copy-Update)
Readers access shared data without locking. Writers create new copies and atomically swap pointers. Old copies freed only after all readers that started before the swap have finished (a "grace period"). Used heavily in the Linux kernel for read-heavy data (routing tables, file system mounts).
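The reclamation half is what makes RCU hard in C; in a GC'd language it reduces to copy-plus-atomic-swap, with the GC playing the grace period. A Python sketch (names are illustrative):

```python
import threading

config = {"timeout": 30, "retries": 3}   # readers read this name with no lock

def reader():
    snap = config            # grab one reference; the snapshot stays valid
    return snap["timeout"]   # even if a writer rebinds `config` meanwhile

_write_lock = threading.Lock()           # writers serialize among themselves

def update(key, value):
    global config
    with _write_lock:
        new = dict(config)   # copy
        new[key] = value     # update
        config = new         # atomic rebind (the "pointer swap"); GC frees the
                             # old dict once the last reader drops its reference
```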
Hazard pointers
Per-thread pointers marking objects "in use." Memory reclamation defers freeing until no thread declares a hazard pointer to that object. Solves ABA and use-after-free in lock-free structures. More memory-efficient than RCU; more complex bookkeeping.
Who uses what: the Linux kernel uses RCU heavily (read-heavy data structures); Folly (Facebook's C++ library) uses hazard pointers (general-purpose lock-free); Java/Go lean on the garbage collector (no manual reclamation needed).
- RCU = grace period + atomic pointer swap. Cheap reads, expensive writes.
- Hazard pointers = per-thread "in use" markers. More flexible, more complex.
- GC'd languages get this for free — at the cost of pause times.
LeetCode has a small set of "concurrency" problems specifically designed to drill thread coordination patterns. Anthropic and OpenAI loops have specifically referenced these. Drill all 8.
- LC 1114 Print in Order
- LC 1115 Print FooBar Alternately
- LC 1116 Print Zero Even Odd
- LC 1117 Building H2O
- LC 1188 Design Bounded Blocking Queue
- LC 1195 Fizz Buzz Multithreaded
- LC 1226 The Dining Philosophers
- LC 1242 Web Crawler Multithreaded
- Drill 1188 (Bounded Blocking Queue) and 1242 (Web Crawler) until you can do them in your sleep.
- The patterns repeat: semaphores, condition variables, atomic flags (see the semaphore hand-off sketched below).
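For a taste of the semaphore pattern, a sketch of LC 1115 (Print FooBar Alternately); LeetCode's real interface uses camelCase method names:

```python
import threading

class FooBar:
    def __init__(self, n):
        self.n = n
        self.foo_turn = threading.Semaphore(1)   # foo may go first
        self.bar_turn = threading.Semaphore(0)   # bar must wait its turn

    def foo(self, print_foo):
        for _ in range(self.n):
            self.foo_turn.acquire()   # wait for the baton
            print_foo()
            self.bar_turn.release()   # hand the baton to bar

    def bar(self, print_bar):
        for _ in range(self.n):
            self.bar_turn.acquire()
            print_bar()
            self.foo_turn.release()

fb = FooBar(3)
t1 = threading.Thread(target=fb.foo, args=(lambda: print("foo", end=""),))
t2 = threading.Thread(target=fb.bar, args=(lambda: print("bar"),))
t1.start(); t2.start(); t1.join(); t2.join()   # prints "foobar" three times
```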
Concurrency questions cluster by company. Anthropic asks crawlers + thread-safe LRUs. OpenAI asks per-key locks. xAI says "simple and correct beats clever and broken." Stripe probes idempotency. Know the per-company patterns going in.
| Company | What they probe |
|---|---|
| Anthropic | Multithreaded web crawler; thread-safe LRU; race conditions in shared cache; rate limiter under contention |
| OpenAI | Time-based KV store with per-key locks (sketch below); multithreaded crawler; toy interpreter (lex/parse/exec) |
| xAI | "Simple and correct beats clever and broken" — implement thread-safe queue/cache; CUDA kernel correctness |
| Cohere | Vector-DB upsert race; concurrent batch inference |
| Stripe | Idempotency under concurrent retries; webhook signature validator |
| Nvidia | Producer-consumer ring buffer; parallel matrix multiplication |
- Crawlers + thread-safe data structures dominate the canon.
- If you're asked "lock-free" — clarify whether they want lock-free or "without obvious races" first.
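A sketch of the per-key-locks idea from the table above; the shape is an assumption, not any company's rubric:

```python
import threading

class PerKeyLocks:
    """One lock per key so writes to different keys don't contend."""
    def __init__(self):
        self._meta = threading.Lock()    # guards the lock table itself
        self._locks = {}

    def lock_for(self, key):
        with self._meta:
            if key not in self._locks:
                self._locks[key] = threading.Lock()
            return self._locks[key]

store = {}                               # key -> list of (timestamp, value)
locks = PerKeyLocks()

def put(key, ts, value):
    # CPython's GIL keeps the dict itself consistent; the per-key lock
    # orders same-key writers without blocking writers to other keys.
    with locks.lock_for(key):
        store.setdefault(key, []).append((ts, value))
```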
0 → hero concurrency path
- foundation Operating Systems: Three Easy Pieces — Part II (Concurrency). Free.
- foundation Java Concurrency in Practice (Goetz) — even if you don't use Java; concepts transfer.
- build Drill the LeetCode concurrency problems (1114-1242 above)
- build Implement a thread-safe LRU in Python with a single lock; then with striped locks. Compare contention.
- build Build a multithreaded web crawler in Python (the Anthropic question); then in C++ with proper memory ordering.
- depth C++ Concurrency in Action (Williams) — Ch 5 on memory model is essential
- depth Preshing on Programming — best memory-model essays on the web
- depth Python GIL — RealPython
- depth PEP 703 — free-threaded Python
- depth Python asyncio docs + RealPython asyncio guide
- depth CMU lock-free lecture
- depth LWN — What every programmer should know about memory (Ulrich Drepper)
Concurrency quiz — readiness check
- Implement a thread-safe LRU.
Show answer
Hashmap + doubly-linked list, single mutex around both. For higher concurrency: striped locks (per-bucket mutex). Java's old ConcurrentHashMap used this. Newer designs use lock-free with hazard pointers but they're production-grade complex.
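A minimal single-lock sketch in Python (the baseline answer; striped and lock-free variants build on it):

```python
import threading
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()      # hashmap + doubly-linked list in one
        self.lock = threading.Lock()   # one mutex guards both structures

    def get(self, key):
        with self.lock:
            if key not in self.data:
                return None
            self.data.move_to_end(key)          # mark most-recently-used
            return self.data[key]

    def put(self, key, value):
        with self.lock:
            self.data[key] = value
            self.data.move_to_end(key)
            if len(self.data) > self.capacity:
                self.data.popitem(last=False)   # evict least-recently-used
```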
- What does Python's GIL block exactly?
Show answer
Bytecode execution. Released during I/O syscalls and in C extensions that explicitly release it (NumPy, PyTorch CUDA). Doesn't help CPU-bound pure Python — use multiprocessing or PEP 703 free-threaded Python (3.13t).
- Diagnose a race in this code: `counter += 1` from N threads.
Show answer
Not atomic — load + add + store. Two threads can both load N, both store N+1, lose one increment. Fix: lock around the increment (`threading.Lock` + `with`), or an atomic counter (Python: `itertools.count()`; C++: `std::atomic<int>`).
- Explain memory_order_acquire / release.
Show answer
Acquire load: prevents subsequent reads/writes from being reordered before this load. Release store: prevents prior reads/writes from being reordered after this store. Pair them to establish happens-before across threads (the SC-DRF model).
- Producer-consumer with bounded buffer.
Show answer
Mutex + two condition variables (not_full, not_empty) + a queue + capacity. put: acquire mutex, wait on not_full while size==cap, push, signal not_empty. take: acquire mutex, wait on not_empty while empty, pop, signal not_full. Both predicate-tested via while loop (spurious wakeups).
- What's the ABA problem?
Show answer
Thread reads value A; preempted; another thread changes A→B→A; first thread's CAS succeeds (sees A) but the world changed. Fix: tagged pointers (pointer + version counter), hazard pointers, RCU, garbage collection.
- Why striped lock instead of single mutex on a hashmap?
Show answer
Lock per bucket → reduces contention. Threads operating on different keys touch different mutexes, parallelize. Tradeoff: more memory, more complex resize. Java's old ConcurrentHashMap used 16 stripes by default.
- Implement a multithreaded web crawler.
Show answer
Thread pool of N workers. Shared thread-safe URL queue (deque + lock OR `queue.Queue`). Shared visited set (with lock). Per-host rate limiter. Termination: "no more URLs and no in-flight tasks." Use `concurrent.futures.ThreadPoolExecutor` in Python.
- asyncio vs threading vs multiprocessing — pick one and explain.
Show answer
asyncio: many concurrent I/O operations, single thread, cooperative scheduling. Threading: I/O-bound or C-extension-heavy work where GIL is released. Multiprocessing: pure-CPU parallelism in Python. Free-threaded Python (3.13t) blurs the line.
- Why does a single sync call kill an asyncio event loop?
Show answer
The loop is single-threaded; while it executes a sync function, no other coroutine runs. `time.sleep(1)`, a sync DB query, `requests.get()` — all freeze the loop. Use `await asyncio.sleep`, async DB drivers, `aiohttp`; or wrap blocking calls in `asyncio.to_thread()`.
- Coffman's four conditions — name them.
Show answer
(1) Mutual exclusion. (2) Hold and wait. (3) No preemption. (4) Circular wait. Break any one to prevent deadlock. Most common practical fix: lock ordering (always acquire in same global order).
- RWLock — when does it help vs hurt?
Show answer
Helps: read-heavy workloads with rare writes (caches). Many readers proceed in parallel. Hurts: write-heavy or balanced — RWLock has higher overhead than plain mutex due to read-counter management; writer can starve. Test under workload before assuming RWLock.
- Lock-free vs wait-free?
Show answer
Lock-free: at least one thread makes progress (system as a whole progresses). Wait-free: every thread makes progress in bounded steps (stronger). Most practical lock-free structures (Michael-Scott queue) are not wait-free; wait-free is rare due to complexity.
- What is RCU?
Show answer
Read-Copy-Update. Readers access shared data without locking. Writers create new copies and atomically swap pointers. Old copies freed only after all readers that started before the swap have finished. Used in Linux kernel for read-heavy data (routing tables, etc.).
- Why doesn't Python need `volatile` like Java/C++?
Show answer
Python's GIL serializes bytecode execution → all reads/writes are effectively atomic and visible at bytecode boundaries. Without the GIL (free-threaded 3.13t), this guarantee weakens; PEP 703 introduces atomic primitives. Java's `volatile` ensures visibility across threads; C++ uses `std::atomic`.
- What's the actor model?
Show answer
Concurrency primitive: actors are isolated entities with private state, communicate via async messages. No shared memory → no races. Used in Erlang, Akka. Tradeoffs: easier reasoning, but message-passing overhead and harder to handle some patterns (transactional updates).
- Implement double-checked locking correctly.
Show answer
Classic singleton init: `if (instance == null) { lock; if (instance == null) instance = new T(); unlock; }`. Subtle bug in Java pre-5: the `instance` write can be reordered before construction completes. Fix: declare `instance` volatile (Java) or use `std::atomic` with acquire/release (C++). Or just use static init (Java's class loader is thread-safe).
- What does `memory_order_relaxed` guarantee?
Show answer
Atomicity only — no ordering with other operations. Use for counters where order doesn't matter. Reads/writes can be reordered by compiler / CPU. Faster than seq_cst on weak-memory architectures (ARM).
- Why is `std::shared_ptr` thread-safe in some ways but not others?
Show answer
The reference count is atomic (thread-safe to copy/destroy shared_ptr from multiple threads). But the pointed-to object's contents are not protected — accessing the same object from multiple threads still needs a mutex. Common bug source.
- What's the difference between a condition variable's `wait` and a busy loop?
Show answer
Busy loop: the thread spins, holding the mutex or polling — wastes CPU. Condition variable: the thread releases the mutex, sleeps until `notify`, and atomically reacquires the mutex on wake. Always use a `while` loop around `wait` to handle spurious wakeups.