Coding questions by company
Cross-referenced from 1Point3Acres, Glassdoor, Blind, LeetCode discuss, candidate Medium write-ups, Hello Interview, interviewing.io, LinkJob, Tech Interview, Sundeep Teki guide. Updated May 2026.
Anthropic
Format: Take-home / async OA on CodeSignal (90 min, sometimes 60 live), 5-7 day window. Then onsite. 4-level progressive spec — must pass all tests at level N before moving to N+1. Very small known bank (~6 problems); no benefit from memorizing standard DSA. Onsite ML round is open-ended ("ML configuration system") plus a separate concurrency-flavored coding round. Reference checks during the loop. ~20-day timeline.
The "six known" Anthropic problems
- Multithreaded web crawler — BFS from seed URL, same-domain filter, dedupe, `#fragment` handling. First sync, then ThreadPoolExecutor. Follow-ups: threads vs processes, politeness/robots.txt, distributed crawling.
- In-memory key-value store / LRU cache — 4 levels: SET/GET/DELETE → filtered scans → TTL with timestamps → file compression + restart durability.
- Stack-trace / sampling-profiler conversion — convert sampler stack snapshots to ordered start/end events. Handle recursion + identical consecutive stacks. Follow-ups: de-noising, N+ consecutive detection. (Sketch after this list.)
- Distributed mode/median across N nodes — find statistical mode then median across 10 nodes with 10 B/s read, 1 B/s send/recv constraints.
- Tokenize/detokenize round-trip — code review existing tokenizer, fix UNK handling, ensure invertibility.
- Bank transaction system — multiple transaction types, progressive complexity.
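Since the profiler-conversion problem is the least standard of the six, here is a minimal sketch of the core transform, assuming each snapshot is a root-first list of frame names (reported I/O formats vary):

```python
def snapshots_to_events(snapshots):
    """Convert sampler stack snapshots into ordered start/end events.
    Identical consecutive stacks emit nothing; recursion works because
    frames are compared positionally, not deduplicated by name."""
    events, prev = [], []
    for stack in snapshots + [[]]:          # trailing [] closes all open frames
        i = 0                               # length of shared prefix with prev
        while i < len(prev) and i < len(stack) and prev[i] == stack[i]:
            i += 1
        for frame in reversed(prev[i:]):    # departed frames end deepest-first
            events.append(("end", frame))
        for frame in stack[i:]:             # new frames start shallowest-first
            events.append(("start", frame))
        prev = stack
    return events

# snapshots_to_events([["main"], ["main", "f"], ["main", "f"], ["main", "g"]])
# -> start main, start f, end f, start g, end g, end main
```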
ML round
- 55-min Google Meet in Colab on prompting + LLM engineering.
- Open-ended ML design discussion, e.g. the "ML configuration system" prompt.
Concurrency round
Always asked. See concurrency page. Common: thread-safe queue, fix race in shared cache, design rate limiter under contention.
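For the rate-limiter-under-contention ask, a token bucket behind a single lock usually suffices; a sketch, not a canonical answer:

```python
import threading, time

class TokenBucket:
    """Thread-safe token-bucket rate limiter: refills at `rate` tokens/sec
    up to a burst of `capacity`; allow() may be called from many threads."""
    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity
        self.last = time.monotonic()
        self.lock = threading.Lock()

    def allow(self) -> bool:
        with self.lock:                     # one critical section: refill, then take
            now = time.monotonic()
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False
```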
Sources
- LinkJob Anthropic Q-Bank 2026
- Anqi Silvia — Anthropic concurrency Qs (2025)
- 1p3acres Anthropic
- Sundeep Teki AI Lab Guide
OpenAI
Format: CoderPad. 60–75 min phone screen. 4–6 hr final loop in 1–2 days. Coding bar is binary — they reject at 2/4 or a low 3/4 even when the ML, research, and behavioral rounds are strong. Each problem is multi-part; substantially more code than FAANG; less algorithmic, more "production." Decentralized hiring; behavioral round is non-standard.
Reported problems (2024–2026)
- KV-store serialize/deserialize with arbitrary characters incl. delimiters (length-prefix encoding, Redis-style); see the sketch after this list.
- Time-based KV store (LC #981 flavor). Extensions: per-key locks vs global, disk persistence.
- Spreadsheet API with `getCell`/`setCell`, dependency graph, cycle detection via DFS, optimize `getCell` to O(1).
- GPU credit management — time-based credits with FIFO expiry, add/expire/consume.
- Resumable iterator across nested structures with skip/reset.
- In-memory DB with SQL-like ops (CREATE/INSERT/SELECT WHERE, joins).
- Unix `cd` with symbolic-link resolution including cycle detection.
- Multithreaded web crawler with rate limiting and dedup.
- Toy interpreter (75-min RE round) — lex + parse + execute simple language with arithmetic, vars, control flow.
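For the serialize/deserialize problem the crux is that length prefixes make embedded delimiters harmless. A minimal sketch; the `<len>:<bytes>` framing is an assumption, similar in spirit to Redis's protocol:

```python
def serialize(kv: dict) -> bytes:
    """Write each key and value as <byte-length>:<raw bytes>, so keys and
    values may safely contain ':' or any other delimiter."""
    parts = []
    for k, v in kv.items():
        for s in (k, v):
            b = s.encode("utf-8")
            parts.append(f"{len(b)}:".encode() + b)
    return b"".join(parts)

def deserialize(data: bytes) -> dict:
    fields, i = [], 0
    while i < len(data):
        j = data.index(b":", i)             # end of the length prefix
        n = int(data[i:j])
        fields.append(data[j + 1:j + 1 + n].decode("utf-8"))
        i = j + 1 + n
    return dict(zip(fields[0::2], fields[1::2]))

assert deserialize(serialize({"a:b": "1:2"})) == {"a:b": "1:2"}
```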
ML / research themes
Deep transformer architecture knowledge. KV-cache. Attention complexity. Batch norm. Regularization. Debugging transformer training (NaNs, masks, padding, optimizer state).
Google DeepMind
Format: ~41-day average loop; part PhD defense, part rigorous engineering exam. AI tools prohibited in technical rounds (2026 policy). Undergraduate-level ML rapid-fire quiz that experienced PhDs frequently bomb (formal definitions of gradient flow, Bayesian inference, convergence proofs). Acceptance <1%.
Tracks
- Research Scientist: paper deep-dive (60 min) → research framing (60 min) → ML coding (60 min, unaided) implementing a piece of an ML pipeline (custom loss, attention, sampling routine, small training loop).
- Research Engineer: distributed training systems (Megatron/DeepSpeed/FSDP knowledge), eval harness design, general DSA.
- SWE: 2 medium-to-hard coding rounds (unaided), 1 system design, 1 domain depth.
Reported ML implementation tasks: custom loss from scratch, attention block, sampling/decoding code, small training loop in NumPy/PyTorch.
xAI (Grok)
Format: Engineer-led, very fast. Weight on written "exceptional work" statement. 20-min "hardest technical problem you've solved" presentation + Q&A defense. 45-60 min coding rounds, mix of algos and thread-safety/concurrency ("simple and correct beats clever and broken"). CodeSignal OA: 3 problems, 60 min.
Reported questions
- Implement a trie-based tokenizer
- Efficient attention for very long sequences (100K tokens) — design discussion
- Beam search with memory optimization
- CUDA kernel optimization for transformer inference
- Memory-efficient training algorithm for large models on limited GPU memory
- Distributed training system design for 100B+ param model
- Real-time inference system for Grok at 100K req/s
- Custom loss functions for LM pretraining
- Mixture-of-Experts architecture questions
- Concurrency-safe components (thread-safe queue/cache)
Cursor (Anysphere)
Format: TypeScript-first (Python OK for ML roles). AI tools permitted but interviewer judges your prompts. Sr/Staff: 4–8 hr take-home in their actual codebase. Heavy weight on craft round; fail mode is "passes coding but doesn't authentically use Cursor."
- Streaming Markdown parser in TypeScript (online test)
- Streaming edit application — apply incoming LLM tokens as real-time edits
- Multi-file diff/coordination tracking changes across many files
- Syntax-aware edit operations
- File-tree diff modeling
- Build a hash tree in their codebase
Cohere
Format: Recruiter → HM (mostly projects) → virtual onsite: coding + ML concepts + system design + behavioral. Forward-deployed-platform-engineer flavor; expects you to debug a vector-DB upsert race condition rather than recite transformer math.
- Binary string reduction — count ops to reduce a binary number to 0 (subtract 1 if odd / divide by 2 if even); sketch after this list
- Streaming dedup — remove duplicate strings from stream without storing whole stream
- Longest unique substring (LC #3)
- System design: bit.ly URL shortener; real-time fraud detection with feature pipeline
- AI/ML: design knowledge-cutoff RAG mechanism; batch inference optimization (sub-batching with max-token + max-batch constraints, concurrent processing)
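The binary-reduction question reduces to a short loop over the integer value; a minimal sketch:

```python
def ops_to_zero(binary: str) -> int:
    """Count operations to reduce a binary number to 0:
    subtract 1 if odd, divide by 2 if even."""
    n, ops = int(binary, 2), 0
    while n:
        n = n - 1 if n & 1 else n >> 1
        ops += 1
    return ops

assert ops_to_zero("1101") == 6   # 13 -> 12 -> 6 -> 3 -> 2 -> 1 -> 0
```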
Mistral AI
Stages: LLM theory → coding → past project → tech manager → ML system design → take-home → values. Python coding round centers on refactoring rather than greenfield algorithms.
Perplexity AI
Format: Fast ~23-day loop, small bank, Python only.
- Probability of each number appearing in a stream
- Test if data stream is uniformly distributed (sample 3 random nums, verify uniform)
- Substring before first stop word, then streaming version with memory constraint
- Remove duplicate strings from stream → near-duplicates (case/punctuation/single-word diff)
- LLM Provider Pool — `Provider.query(prompt)` + `ProviderPool` with fallback on failure across providers; sketch after this list
- CreditTracker class with add/subtract/check
- Implement beam search given function signature + unit tests
- Embedding model batch processing under max-batch + max-token constraints (concurrency)
- System design: personal-finance multi-account sync
- Kubernetes debugging — overloaded system, identify metrics
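For the provider-pool problem, `Provider.query(prompt)` is the signature given in the prompt; the rest of this sketch is assumed minimal design:

```python
class ProviderPool:
    """Try providers in order, falling back to the next on any failure."""
    def __init__(self, providers):
        self.providers = providers

    def query(self, prompt: str) -> str:
        errors = []
        for p in self.providers:
            try:
                return p.query(prompt)      # signature given in the problem
            except Exception as e:          # any failure triggers fallback
                errors.append((type(e).__name__, str(e)))
        raise RuntimeError(f"all providers failed: {errors}")
```

Natural extensions to volunteer: per-provider retry caps, health tracking, and rotating the starting provider.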
Cognition AI (Devin)
Heavy on customer-facing roleplay + pair programming + architecture deep-dive. Sample: "customer is angry that Devin deleted a critical file — walk through your response." "60-min onboarding workshop for senior engineers new to AI coding tools." Behavioral: "say no to a customer," "learn a new stack in days."
Sierra (Bret Taylor)
Replaced LeetCode with AI-native onsite: planning session (drive product ideation) + 2-hr building phase using AI tools/frameworks of choice. CoderPad multi-part, layered follow-ups; Python or TypeScript only. ~23-day loop.
Decagon
Recruiter → 60-min coding pair → 60-min system design → 60-min past project → behavioral.
- Implement n×n Tic-Tac-Toe game engine for two players
- Class that tracks conversation scores over a rolling time window (sketch after this list)
- LeetCode #84 (Largest Rectangle in Histogram)
- System design: AI agent system that resolves customer support tickets; knowledge ingestion + retrieval; eval framework for autonomous resolution quality
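A sketch of the rolling-window score tracker, assuming timestamps arrive monotonically (class and method names are illustrative):

```python
from collections import deque
import time

class RollingScores:
    """Track conversation scores over a trailing window of `window` seconds."""
    def __init__(self, window: float):
        self.window = window
        self.q = deque()                    # (timestamp, score), oldest first
        self.total = 0.0

    def _evict(self, now: float):
        while self.q and self.q[0][0] <= now - self.window:
            _, score = self.q.popleft()
            self.total -= score

    def add(self, score: float, now: float = None):
        now = time.monotonic() if now is None else now
        self._evict(now)
        self.q.append((now, score))
        self.total += score

    def average(self, now: float = None) -> float:
        now = time.monotonic() if now is None else now
        self._evict(now)
        return self.total / len(self.q) if self.q else 0.0
```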
Apple (MLE / GenAI)
5–8 onsite rounds × 45–60 min. 1–2 coding rounds (LC medium-hard): rotate matrix 90°, merge K sorted lists, Mountain Array, LRU cache. Some rounds blend DSA + ML (e.g. implement cosine similarity, then a DP extension). ML fundamentals: transformers, self-attention, RAG vs fine-tuning, embeddings, chunking.
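The cosine-similarity warm-up is worth having cold; a NumPy sketch:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """cos(a, b) = a.b / (|a| |b|); guard against zero vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0
```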
Nvidia
ML system design + project depth. Coding includes stars pyramid, Maximum Product of Three Numbers (LC #628), producer-consumer ring buffer, parallel programming, matrix multiplication. ML: gradient descent, MLP/CNN/RNN/Transformer, Adam vs SGD, loss functions.
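A sketch of the producer-consumer ring buffer, the classic condition-variable version (names and the blocking API are assumptions):

```python
import threading

class RingBuffer:
    """Bounded blocking queue over a fixed ring: one lock, two conditions."""
    def __init__(self, capacity: int):
        self.buf = [None] * capacity
        self.cap, self.head, self.size = capacity, 0, 0
        lock = threading.Lock()
        self.not_full = threading.Condition(lock)
        self.not_empty = threading.Condition(lock)

    def put(self, item):
        with self.not_full:
            while self.size == self.cap:        # wait for a free slot
                self.not_full.wait()
            self.buf[(self.head + self.size) % self.cap] = item
            self.size += 1
            self.not_empty.notify()

    def get(self):
        with self.not_empty:
            while self.size == 0:               # wait for an item
                self.not_empty.wait()
            item = self.buf[self.head]
            self.head = (self.head + 1) % self.cap
            self.size -= 1
            self.not_full.notify()
            return item
```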
Waymo
CoderPad + Google Meet. Tech phone screen + ML design. Graph traversal, DP coding. ML: SGD variants, batch-size vs latency trade-offs, ML system design. Two rounds in one day (1 coding + 1 ML design) reported Feb 2025.
Tesla Autopilot
HackerRank/Codility OA: medium algos in Python + simple ML tasks (feature engineering, eval). Onsite 3–5 rounds incl. ML system design for object detection pipeline (scalability, data pipeline, deployment).
Pinterest (MLE)
Tech phone screen: 3 ML fundamentals + 2 LC hard. Onsite: ML theory + LeetCode mediums. Topics: transformer arch, contrastive loss, learning-to-rank, vanishing gradients (which activations), recommender systems. Difficulty 3.4/5.
Snap (MLE)
1hr tech screen + 4hr onsite (2 LC rounds + ML system design + ML fundamentals). LC medium-hard. Sample: LC #4 Median of Two Sorted Arrays with O(log(m+n)) requirement. Task scheduling. Word dictionary. ML: explain attention, self-attention, cross-attention, multi-head attention; boosting; generative vs discriminative. ML infra design is differentiator.
Reddit (Senior MLE)
Recruiter → tech phone (build a model from provided data) → onsite (model design, feature engineering, applied ML in advertising). ML infra design as required round (feature stores, distributed training, online serving). Personalization/recsys focus.
Databricks (Staff MLE)
1hr CoderPad/Meet phone screen. LC medium-hard. ~20% pass rate. Onsite topics weighted toward graph algos, optimization, concurrency. LC tag composition: ~10 Easy / 19 Medium / 5 Hard; Array/HashTable heavy. Reference checks: 1 manager + 2 senior teammates, weighted heavily.
Snowflake
OA + 2hr tech phone screen (DSA + system design back-to-back). CoderPad. LC medium with hard follow-up. DP, BFS/DFS, binary search, linked lists; database internals twist. 30-min presentation interview on past project.
Notion (Backend SWE)
No LeetCode-style — practical problems only. Recently added AI-enabled interview requiring fluent use of Claude Code/Cursor.
Figma
3-5 rounds. LC medium DSA on coding round. System design × 2. Behavioral. Project. All coding questions "Figma-flavored."
Plaid
2 live coding questions on DS&A round. Backend/API/distributed-systems flavor. Real-world rather than pure LC.
Stripe
No LeetCode — practical, production-style problems on CoderPad. Sample: find pairs of transactions in 30s window summing to target; rate limiter for Stripe API; webhook signature validator with edge cases. ML coding round: dataset provided, build + evaluate model in 1 hour. System design: idempotency, exactly-once, distributed transactions, ledger flows, fraud, multi-currency. Integration round uses real Stripe API.
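A sketch of the 30-second-window pair finder, assuming transactions arrive as (timestamp, amount) sorted by time:

```python
from collections import defaultdict

def pairs_in_window(txns, target, window=30):
    """Return index pairs (i, j), i < j, with amount_i + amount_j == target
    and timestamps within `window` seconds; sliding window + hash map."""
    pairs, start = [], 0
    in_window = defaultdict(list)            # amount -> indices still in window
    for j, (ts, amt) in enumerate(txns):
        while txns[start][0] < ts - window:  # evict expired transactions
            in_window[txns[start][1]].remove(start)
            start += 1
        pairs.extend((i, j) for i in in_window[target - amt])
        in_window[amt].append(j)
    return pairs

assert pairs_in_window([(0, 5), (10, 7), (45, 5)], 12) == [(0, 1)]
```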
Airbnb (Staff MLE)
LC medium-hard, Airbnb-tagged. Classic DP problems (esp. on trees) reported. 4-5 stages. ML focus on dynamic pricing, trust signals, personalization.
Uber (Staff/MLE)
5-7 rounds over 4-8 weeks. 60/40 ML/coding split for MLE. Coding 1 (DSA) + Coding 2 (depth in specialization) for staff SWE. Recent 2025 SDE2 reports: DP on trees screening. ML system design: scalable pipelines, deployment, monitoring; offline vs online eval; A/B testing.
DoorDash (MLE)
60-min DSA round, 1 LC medium with 45 min for code+optimize. 22 reported problems, 6 Easy / 14 Medium / 2 Hard. Design HashMap, Longest Common Prefix, Jump Game. Graph traversal, scheduling, stateful services.
Anduril
4-6 onsite rounds. 2 LC easy + ML/CV concepts focus + ML design (harder). Heavy past-experience deep-dive. Perception roles need C++, PyTorch, TensorRT/ONNX deployment.
Cerebras
~26 day loop. 4 rounds. LC medium + parallel programming + matmul questions. Compiler engineer roles: MLIR, LLVM IR, parallelization/partitioning, novel program analysis.
Coinbase
OA: 2 math-heavy questions (DP + intervals/prefix sums). Onsite: 4 LC medium DSA in 90 min (target 3/4). Build something real (transaction management system, progressive requirements). System design: notifications with retries + queues.
Robinhood
OA: 4 questions (2 super easy + 1 hard + 1 medium). VO: 1 coding + 1 system design + 1 foundation.
AppLovin
5 rounds: 3 tech + 1 system design + 1 behavioral. LC medium, data-structure-design heavy. Classic: LRU cache O(1) then real-system extension (campaign config cache). Onsite: 2 questions starting easy with optimize follow-ups. ~3 weeks total.
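In Python the O(1) LRU warm-up is effectively an OrderedDict exercise; a sketch:

```python
from collections import OrderedDict

class LRUCache:
    """O(1) get/put: OrderedDict is a hash map over a doubly linked list."""
    def __init__(self, capacity: int):
        self.cap = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)          # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.cap:
            self.data.popitem(last=False)   # evict least recently used
```

The real-system follow-up (campaign config cache) then points toward TTLs, size limits, and thread safety.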
Roblox
HackerRank OA + Roblox Assessment. 4 sections, ~2 hours. LC medium difficulty, often disguised in gaming/simulation context. 2 onsite tech rounds (Medium + Hard) + behavioral.
Crusoe / Lambda / Modal / Together / Baseten / MatX / Black Forest / World Labs / Physical Intelligence
Limited public data.
- Crusoe: 3 back-to-back (1 coding + 1 project review + 1 HM). Practical-oriented over LC.
- Lambda: SSH into EC2 + Linux/OOP probe + behavioral OOP (Python/Go). Take-homes for some roles.
- Modal: abstractions for distributed GPU workloads — system design heavy.
- World Labs / PI / BFL: expect VLA / world-model / 3D / SLAM / multimodal data-pipeline questions; PyTorch deployment; deep CV implementation.
Behavioral / culture themes per company
| Company | Themes |
|---|---|
| Anthropic | Safety, alignment, ethics. "Why AI safety?" Strong reference checks. |
| OpenAI | Mission alignment, agency, ambition, urgency. Non-standard behavioral. |
| DeepMind | Research depth, paper defense, scientific reasoning. |
| xAI | 20-min "hardest tech problem" talk. First principles. |
| Cursor | Authentic personal use of Cursor. Craft. Product taste. |
| Sierra | Practical AI agent reasoning. Product instinct. |
| Decagon | Engineering bar + applied LLM agent thinking. |
| Cognition | Customer empathy roleplay. Angry-customer simulations. |
| Stripe | "Code I'd approve in a PR." Correctness before perf. Failure modes. |
| Snap | "Kind, Smart, Creative" values. |
| Coinbase | Mission/crypto belief. "Build something real." Fundamentals over grinding. |
| Anduril | Mission alignment (defense). Past project deep dive (long). |
| Apple | Privacy. On-device thinking. Product polish. |
| Reddit | Community/personalization stewardship. |
Key meta-takeaways
- Frontier labs have moved away from pure LeetCode. 60–90 min progressive specs (Anthropic 4-level CodeSignal, OpenAI multi-part CoderPad) are the prototype. Drill these, not Hard sweeps.
- "Implement multi-head attention from scratch in PyTorch" is the universal ML coding question (DeepMind, OpenAI, Anthropic, Mistral, Cohere, Perplexity, Snap, Apple). Drill until you can do it in 6 min cold.
- Concurrency is the rising differentiator. Anthropic, xAI, OpenAI all probe explicitly. Don't skip.
- AI tool policy varies wildly: banned at DeepMind, Anthropic, and OpenAI; encouraged or required at Cursor, Sierra, and Notion. Confirm per company before the loop.
- Take-homes are back at Sr/Staff: Anthropic OA, Cursor 4–8hr, Mistral, most agent startups.
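Since the attention question recurs across nearly every lab, here is a reference sketch in PyTorch (batch-first input; dropout, init tricks, and KV-caching omitted); the shape bookkeeping is what interviewers watch:

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Minimal multi-head self-attention over (B, T, d_model) inputs."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.h, self.d = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)   # fused Q/K/V projection
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x, mask=None):
        B, T, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # split heads: (B, T, d_model) -> (B, h, T, d)
        q, k, v = (t.view(B, T, self.h, self.d).transpose(1, 2) for t in (q, k, v))
        att = (q @ k.transpose(-2, -1)) / self.d ** 0.5   # scaled dot-product
        if mask is not None:                              # e.g. causal mask
            att = att.masked_fill(mask == 0, float("-inf"))
        att = att.softmax(dim=-1)
        y = (att @ v).transpose(1, 2).reshape(B, T, -1)   # merge heads
        return self.out(y)
```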
Sources (by company)
- Anthropic: LinkJob, Anqi Silvia
- OpenAI: Hello Interview, Anqi Silvia
- DeepMind: Tech Interview 2026
- xAI: LinkJob, Exponent Dec 2025
- Cursor: Tech Interview
- Cohere: LinkJob
- Perplexity: LinkJob
- Decagon: Tech Interview
- Sierra: Sierra blog
- Cognition: Dataford
- Apple: Pranalibose 2025
- Stripe: interviewing.io
- Pinterest: Glassdoor
- Snap: Interview Query
- 1Point3Acres aggregator: 1p3acres
- Yuan Meng's MLE Interview 2.0: Yuan Meng
- Sundeep Teki frontier-lab guide: Sundeep Teki