Coding questions by company

Cross-referenced from 1Point3Acres, Glassdoor, Blind, LeetCode discuss, candidate Medium write-ups, Hello Interview, interviewing.io, LinkJob, Tech Interview, Sundeep Teki guide. Updated May 2026.

Use these as study targets, not memorization material
Question banks at top labs rotate and explicitly punish memorization. The value here is knowing the style (multi-part, production-flavored, concurrency-heavy) so you prepare the right way.

Anthropic

Format: Take-home / async OA on CodeSignal (90 min, sometimes 60 live), 5-7 day window. Then onsite. 4-level progressive spec — must pass all tests at level N before moving to N+1. Very small known bank (~6 problems); no benefit from memorizing standard DSA. Onsite ML round is open-ended ("ML configuration system") plus a separate concurrency-flavored coding round. Reference checks during the loop. ~20-day timeline.

The "six known" Anthropic problems

  1. Multithreaded web crawler — BFS from seed URL, same-domain filter, dedupe, #fragment handling. First sync, then ThreadPoolExecutor. Follow-ups: threads vs processes, politeness/robots.txt, distributed crawling.
  2. In-memory key-value store / LRU cache — 4 levels: SET/GET/DELETE → filtered scans → TTL with timestamps → file compression + restart durability.
  3. Stack-trace / sampling-profiler conversion — convert sampler stack snapshots to ordered start/end events. Handle recursion + identical consecutive stacks. Follow-ups: de-noising, N+ consecutive detection.
  4. Distributed mode/median across N nodes — find statistical mode then median across 10 nodes with 10 B/s read, 1 B/s send/recv constraints.
  5. Tokenize/detokenize round-trip — code review existing tokenizer, fix UNK handling, ensure invertibility.
  6. Bank transaction system — multiple transaction types, progressive complexity.
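Problem 2 above is representative of the progressive-spec style: each level layers on the last. A minimal sketch of levels 1 and 3 (basic ops plus TTL) — the method names and injectable clock are assumptions, not the actual CodeSignal API:

```python
import time

class KVStore:
    """Sketch of a progressive KV-store spec: SET/GET/DELETE plus TTL.
    Interface is assumed; the real problem adds scans and durability."""

    def __init__(self, clock=time.time):
        self._data = {}       # key -> (value, expires_at or None)
        self._clock = clock   # injectable clock makes TTL testable

    def set(self, key, value, ttl=None):
        expires = self._clock() + ttl if ttl is not None else None
        self._data[key] = (value, expires)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires = entry
        if expires is not None and self._clock() >= expires:
            del self._data[key]   # lazily expire on read
            return default
        return value

    def delete(self, key):
        return self._data.pop(key, None) is not None
```

The injectable clock is the kind of design choice interviewers reward: it makes the TTL level testable without sleeping.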

ML round

Open-ended; the reported prompt is an "ML configuration system" build-out rather than a fixed question (see format notes above).

Concurrency round

Always asked. See concurrency page. Common: thread-safe queue, fix race in shared cache, design rate limiter under contention.
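For the "rate limiter under contention" variant, a token bucket guarded by a lock is the usual shape. The interface below is an assumption (the actual prompt is not public); the key point is that refill and spend happen atomically:

```python
import threading
import time

class TokenBucket:
    """Token-bucket rate limiter sketch: a single lock serializes
    refill + spend so concurrent callers cannot double-spend."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate              # tokens refilled per second
        self.capacity = capacity      # burst size
        self.tokens = capacity
        self.clock = clock
        self.last = clock()
        self.lock = threading.Lock()

    def allow(self):
        with self.lock:
            now = self.clock()
            # Refill proportionally to elapsed time, capped at capacity.
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False
```

"Simple and correct beats clever and broken" applies here: one coarse lock is fine at interview scale, and sharding or atomics are natural follow-up discussion.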

Sources

OpenAI

Format: CoderPad. 60–75 min phone screen; 4–6 hr final loop over 1–2 days. The coding bar is binary: candidates are rejected at 2/4 or a weak 3/4 even when the ML, research, and behavioral rounds are strong. Each problem is multi-part, with substantially more code than FAANG and a less algorithmic, more production-oriented flavor. Hiring is decentralized; the behavioral round is non-standard.

Reported problems (2024–2026)

  1. KV-store serialize/deserialize with arbitrary characters, incl. delimiters (length-prefix encoding, Redis-style).
  2. Time-based KV store (LC #981 flavor). Extensions: per-key locks vs global, disk persistence.
  3. Spreadsheet API with getCell/setCell, dependency graph, cycle detection via DFS, optimize getCell to O(1).
  4. GPU credit management — time-based credits with FIFO expiry, add/expire/consume.
  5. Resumable iterator across nested structures with skip/reset.
  6. In-memory DB with SQL-like ops (CREATE/INSERT/SELECT WHERE, joins).
  7. Unix cd with symbolic-link resolution including cycle detection.
  8. Multithreaded web crawler with rate limiting and dedup.
  9. Toy interpreter (75-min RE round) — lex + parse + execute simple language with arithmetic, vars, control flow.
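For problem 1, the standard trick is length-prefix encoding, so values may contain any character including the delimiter itself. A sketch under an assumed `len:payload` wire format (the reported problem's exact format is unknown):

```python
def serialize(kv):
    """Encode a dict of strings as len:payload segments, key then
    value, Redis-RESP style. No escaping needed: the length tells
    the reader exactly where each payload ends."""
    parts = []
    for k, v in kv.items():
        for s in (k, v):
            parts.append(f"{len(s)}:{s}")
    return "".join(parts)

def deserialize(blob):
    kv, i, pending_key = {}, 0, None
    while i < len(blob):
        j = blob.index(":", i)          # end of the length field
        n = int(blob[i:j])
        s = blob[j + 1 : j + 1 + n]     # payload is exactly n chars
        i = j + 1 + n
        if pending_key is None:
            pending_key = s
        else:
            kv[pending_key] = s
            pending_key = None
    return kv
```

The follow-up discussion usually contrasts this with escaping schemes (O(1) skip-ahead vs. scanning every byte).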

ML / research themes

Deep transformer architecture knowledge. KV-cache. Attention complexity. Batch norm. Regularization. Debugging transformer training (NaNs, masks, padding, optimizer state).

Sources

Google DeepMind

Format: ~41-day average. PhD defense + rigorous engineering exam. AI tools prohibited in technical rounds (2026 policy). Undergraduate ML rapid-fire quiz that experienced PhDs frequently bomb (formal definitions of gradient flow, Bayesian inference, convergence proofs). Acceptance <1%.

Tracks

Reported ML implementation tasks: custom loss from scratch, attention block, sampling/decoding code, small training loop in NumPy/PyTorch.
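The "small training loop in NumPy" task tends to look like the sketch below: manual gradient, plain gradient descent, no framework. The exact spec varies by interviewer; this is an illustrative linear-regression version only.

```python
import numpy as np

def train_linear(X, y, lr=0.1, steps=200):
    """Minimal NumPy training loop: MSE loss, hand-derived gradient,
    vanilla gradient descent. Returns the learned weight vector."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        pred = X @ w
        grad = 2 * X.T @ (pred - y) / len(y)   # d(MSE)/dw
        w -= lr * grad
    return w
```

The rapid-fire quiz layer on top asks you to justify each line: why this gradient, what learning rate guarantees convergence, what changes with momentum or Adam.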

xAI (Grok)

Format: Engineer-led, very fast. Weight on written "exceptional work" statement. 20-min "hardest technical problem you've solved" presentation + Q&A defense. 45-60 min coding rounds, mix of algos and thread-safety/concurrency ("simple and correct beats clever and broken"). CodeSignal OA: 3 problems, 60 min.

Reported questions

Cursor (Anysphere)

Format: TypeScript-first (Python OK for ML roles). AI tools permitted, but the interviewer judges your prompts. Sr/Staff: 4–8 hr take-home in their actual codebase. Heavy weight on the craft round; the common failure mode is passing the coding bar without authentically using Cursor.

Cohere

Format: Recruiter → HM (mostly projects) → virtual onsite: coding + ML concepts + system design + behavioral. Forward-deployed-platform-engineer flavor; expects you to debug a vector-DB upsert race condition rather than recite transformer math.

Mistral AI

Stages: LLM theory → coding → past project → tech manager → ML system design → take-home → values. Python coding round centers on refactoring rather than greenfield algorithms.

Perplexity AI

Format: Fast ~23-day loop, small bank, Python only.

  1. Probability of each number appearing in a stream
  2. Test if data stream is uniformly distributed (sample 3 random nums, verify uniform)
  3. Substring before first stop word, then streaming version with memory constraint
  4. Remove duplicate strings from stream → near-duplicates (case/punctuation/single-word diff)
  5. LLM provider pool: Provider.query(prompt) + ProviderPool with fallback on failure across providers
  6. CreditTracker class with add/subtract/check
  7. Implement beam search given function signature + unit tests
  8. Embedding model batch processing under max-batch + max-token constraints (concurrency)
  9. System design: personal-finance multi-account sync
  10. Kubernetes debugging — overloaded system, identify metrics
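Problem 1 is the simplest of the stream questions and sets up the rest: maintain running counts, report count/total. The class interface below is an assumption about the reported prompt:

```python
from collections import Counter

class StreamStats:
    """Streaming frequency tracker: probability of each number seen
    so far is its count divided by the total stream length."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def add(self, x):
        self.counts[x] += 1
        self.total += 1

    def probability(self, x):
        return self.counts[x] / self.total if self.total else 0.0
```

The natural follow-up (problem 2's territory) is what to do when the key space is too large to count exactly — sampling or sketches.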

Cognition AI (Devin)

Heavy on customer-facing roleplay + pair programming + architecture deep-dive. Sample: "customer is angry that Devin deleted a critical file — walk through your response." "60-min onboarding workshop for senior engineers new to AI coding tools." Behavioral: "say no to a customer," "learn a new stack in days."

Sierra (Bret Taylor)

Replaced LeetCode with an AI-native onsite: planning session (drive product ideation) + 2-hr building phase using AI tools/frameworks of choice. CoderPad multi-part with layered follow-ups; Python or TypeScript only. ~23-day loop.

Decagon

Recruiter → 60-min coding pair → 60-min system design → 60-min past project → behavioral.

Apple (MLE / GenAI)

5–8 onsite rounds × 45–60 min. 1–2 coding rounds (LC medium-hard): rotate matrix 90°, merge K sorted lists, longest mountain in array, LRU cache. Some rounds blend DSA + ML (e.g. implement cosine similarity, then a DP extension). ML fundamentals: transformers, self-attention, RAG vs fine-tuning, embeddings, chunking.
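The cosine-similarity warm-up is short enough to write from memory; a sketch (the DP extension varies by interviewer and is not shown):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors:
    dot product over the product of norms."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```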

Nvidia

ML system design + project depth. Coding includes stars pyramid, Maximum Product of Three Numbers (LC #628), producer-consumer ring buffer, parallel programming, matrix multiplication. ML: gradient descent, MLP/CNN/RNN/Transformer, Adam vs SGD, loss functions.

Waymo

CoderPad + Google Meet. Tech phone screen + ML design. Graph traversal and DP coding. ML: SGD variants, batch-size vs latency trade-offs, ML system design. Two rounds in one day (1 coding + 1 ML design) reported Feb 2025.

Tesla Autopilot

HackerRank/Codility OA: medium algos in Python + simple ML tasks (feature engineering, eval). Onsite 3–5 rounds incl. ML system design for object detection pipeline (scalability, data pipeline, deployment).

Pinterest (MLE)

Tech phone screen: 3 ML fundamentals + 2 LC hard. Onsite: ML theory + LeetCode mediums. Topics: transformer arch, contrastive loss, learning-to-rank, vanishing gradients (which activations), recommender systems. Difficulty 3.4/5.

Snap (MLE)

1hr tech screen + 4hr onsite (2 LC rounds + ML system design + ML fundamentals). LC medium-hard. Sample: LC #4 Median of Two Sorted Arrays with O(log(m+n)) requirement. Task scheduling. Word dictionary. ML: explain attention, self-attention, cross-attention, multi-head attention; boosting; generative vs discriminative. ML infra design is differentiator.

Reddit (Senior MLE)

Recruiter → tech phone (build a model from provided data) → onsite (model design, feature engineering, applied ML in advertising). ML infra design as required round (feature stores, distributed training, online serving). Personalization/recsys focus.

Databricks (Staff MLE)

1hr CoderPad/Meet phone screen. LC medium-hard. ~20% pass rate. Onsite topics weighted toward graph algos, optimization, concurrency. LC tag composition: ~10 Easy / 19 Medium / 5 Hard; Array/HashTable heavy. Reference checks: 1 manager + 2 senior teammates, weighted heavily.

Snowflake

OA + 2hr tech phone screen (DSA + system design back-to-back). CoderPad. LC medium with hard follow-up. DP, BFS/DFS, binary search, linked lists; database internals twist. 30-min presentation interview on past project.

Notion (Backend SWE)

No LeetCode-style — practical problems only. Recently added AI-enabled interview requiring fluent use of Claude Code/Cursor.

Figma

3-5 rounds. LC medium DSA on coding round. System design × 2. Behavioral. Project. All coding questions "Figma-flavored."

Plaid

2 live coding questions on DS&A round. Backend/API/distributed-systems flavor. Real-world rather than pure LC.

Stripe

No LeetCode — practical, production-style problems on CoderPad. Sample: find pairs of transactions in 30s window summing to target; rate limiter for Stripe API; webhook signature validator with edge cases. ML coding round: dataset provided, build + evaluate model in 1 hour. System design: idempotency, exactly-once, distributed transactions, ledger flows, fraud, multi-currency. Integration round uses real Stripe API.
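The transaction-pairs question combines a sliding time window with the classic two-sum idea. A sketch under assumed input shape — `(timestamp_seconds, amount)` tuples, returning index pairs:

```python
def find_pairs(transactions, target, window=30):
    """Return (i, j) index pairs of transactions whose amounts sum
    to `target` and whose timestamps differ by at most `window`
    seconds. Input shape and return type are assumptions."""
    txns = sorted(enumerate(transactions), key=lambda p: p[1][0])
    pairs, start = [], 0
    for j in range(len(txns)):
        orig_j, (tj, aj) = txns[j]
        # Advance the window start past transactions older than 30s.
        while txns[start][1][0] < tj - window:
            start += 1
        for i in range(start, j):
            orig_i, (ti, ai) = txns[i]
            if ai + aj == target:
                pairs.append((orig_i, orig_j))
    return pairs
```

The production-flavored follow-ups are the real test: duplicate amounts, currency/precision, and whether a hash map per window beats the inner scan.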

Airbnb (Staff MLE)

LC medium-hard, Airbnb-tagged. Classic DP problems (esp. on trees) reported. 4-5 stages. ML focus on dynamic pricing, trust signals, personalization.

Uber (Staff/MLE)

5-7 rounds over 4-8 weeks. 60/40 ML/coding split for MLE. Coding 1 (DSA) + Coding 2 (depth in specialization) for staff SWE. Recent 2025 SDE2 reports: DP on trees screening. ML system design: scalable pipelines, deployment, monitoring; offline vs online eval; A/B testing.

DoorDash (MLE)

60-min DSA round, 1 LC medium with 45 min for code+optimize. 22 reported problems, 6 Easy / 14 Medium / 2 Hard. Design HashMap, Longest Common Prefix, Jump Game. Graph traversal, scheduling, stateful services.

Anduril

4-6 onsite rounds. 2 LC easy + ML/CV concepts focus + ML design (harder). Heavy past-experience deep-dive. Perception roles need C++, PyTorch, TensorRT/ONNX deployment.

Cerebras

~26 day loop. 4 rounds. LC medium + parallel programming + matmul questions. Compiler engineer roles: MLIR, LLVM IR, parallelization/partitioning, novel program analysis.

Coinbase

OA: 2 math-heavy questions (DP + intervals/prefix sums). Onsite: 4 LC medium DSA in 90 min (target 3/4). Build something real (transaction management system, progressive requirements). System design: notifications with retries + queues.

Robinhood

OA: 4 questions (2 super easy + 1 hard + 1 medium). VO: 1 coding + 1 system design + 1 foundation.

AppLovin

5 rounds: 3 tech + 1 system design + 1 behavioral. LC medium, data-structure-design heavy. Classic: LRU cache O(1) then real-system extension (campaign config cache). Onsite: 2 questions starting easy with optimize follow-ups. ~3 weeks total.

Roblox

HackerRank OA + Roblox Assessment. 4 sections, ~2 hours. LC medium difficulty, often disguised in gaming/simulation context. 2 onsite tech rounds (Medium + Hard) + behavioral.

Crusoe / Lambda / Modal / Together / Baseten / MatX / Black Forest / World Labs / Physical Intelligence

Limited public data.

Behavioral / culture themes per company

Anthropic: Safety, alignment, ethics. "Why AI safety?" Strong reference checks.
OpenAI: Mission alignment, agency, ambition, urgency. Non-standard behavioral.
DeepMind: Research depth, paper defense, scientific reasoning.
xAI: 20-min "hardest tech problem" talk. First principles.
Cursor: Authentic personal use of Cursor. Craft. Product taste.
Sierra: Practical AI agent reasoning. Product instinct.
Decagon: Engineering bar + applied LLM agent thinking.
Cognition: Customer empathy roleplay. Angry-customer simulations.
Stripe: "Code I'd approve in a PR." Correctness before perf. Failure modes.
Snap: "Kind, Smart, Creative" values.
Coinbase: Mission/crypto belief. "Build something real." Fundamentals over grinding.
Anduril: Mission alignment (defense). Past-project deep dive (long).
Apple: Privacy. On-device thinking. Product polish.
Reddit: Community/personalization stewardship.

Key meta-takeaways

  1. Frontier labs have moved away from pure LeetCode. 60–90 min progressive specs (Anthropic 4-level CodeSignal, OpenAI multi-part CoderPad) are the prototype. Drill these, not Hard sweeps.
  2. "Implement multi-head attention from scratch in PyTorch" is the universal ML coding question (DeepMind, OpenAI, Anthropic, Mistral, Cohere, Perplexity, Snap, Apple). Drill until you can do it in 6 min cold.
  3. Concurrency is the rising differentiator. Anthropic, xAI, OpenAI all probe explicitly. Don't skip.
  4. AI tool policy varies wildly: banned at DeepMind, Anthropic, and OpenAI; encouraged or required at Cursor, Sierra, and Notion. Confirm per company before the loop.
  5. Take-homes are back at Sr/Staff: Anthropic OA, Cursor 4–8hr, Mistral, most agent startups.
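For takeaway 2, the interview expects PyTorch, but the mechanics are framework-independent; a dependency-light NumPy scaffold (weight shapes and the single-input self-attention framing are assumptions about the typical prompt):

```python
import numpy as np

def multi_head_attention(x, Wq, Wk, Wv, Wo, n_heads):
    """Multi-head self-attention sketch. x: (seq, d_model);
    each W: (d_model, d_model). Splits d_model across heads,
    applies scaled dot-product attention per head, re-concatenates."""
    seq, d_model = x.shape
    d_head = d_model // n_heads

    def split(t):  # (seq, d_model) -> (heads, seq, d_head)
        return t.reshape(seq, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ Wq), split(x @ Wk), split(x @ Wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)   # (heads, seq, seq)
    scores -= scores.max(axis=-1, keepdims=True)          # stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    out = attn @ v                                        # (heads, seq, d_head)
    out = out.transpose(1, 0, 2).reshape(seq, d_model)    # concat heads
    return out @ Wo
```

The six-minute drill is being able to reproduce exactly this shape bookkeeping (split, scale, softmax, concat) without pausing, then discuss causal masks and KV-cache on top.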

Sources (by company)