Coding questions by company
Cross-referenced from 1Point3Acres, Glassdoor, Blind, LeetCode discuss, candidate Medium write-ups, Hello Interview, interviewing.io, LinkJob, Tech Interview, Sundeep Teki guide. Updated May 2026.
Anthropic
Format: Take-home / async OA on CodeSignal (90 min, sometimes 60 live), 5-7 day window. Then onsite. 4-level progressive spec — must pass all tests at level N before moving to N+1. Very small known bank (~6 problems); no benefit from memorizing standard DSA. Onsite ML round is open-ended ("ML configuration system") plus a separate concurrency-flavored coding round. Reference checks during the loop. ~20-day timeline.
The "six known" Anthropic problems
- Multithreaded web crawler — BFS from seed URL, same-domain filter, dedupe, `#fragment` handling. First sync, then ThreadPoolExecutor. Follow-ups: threads vs processes, politeness/robots.txt, distributed crawling.
- In-memory key-value store / LRU cache — 4 levels: SET/GET/DELETE → filtered scans → TTL with timestamps → file compression + restart durability.
- Stack-trace / sampling-profiler conversion — convert sampler stack snapshots to ordered start/end events. Handle recursion + identical consecutive stacks. Follow-ups: de-noising, N+ consecutive detection. (Sketch after this list.)
- Distributed mode/median across N nodes — find statistical mode then median across 10 nodes with 10 B/s read, 1 B/s send/recv constraints.
- Tokenize/detokenize round-trip — code review existing tokenizer, fix UNK handling, ensure invertibility.
- Bank transaction system — multiple transaction types, progressive complexity.
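Since the profiler-conversion problem is the least standard of the six, here is a minimal sketch of the core transform, assuming each snapshot is a root-first list of frame names (reported I/O formats vary):

```python
def snapshots_to_events(snapshots):
    """Convert sampler stack snapshots into ordered start/end events.
    Identical consecutive stacks emit nothing; recursion works because
    frames are compared positionally, not deduplicated by name."""
    events, prev = [], []
    for stack in snapshots + [[]]:          # trailing [] closes all open frames
        i = 0                               # length of shared prefix with prev
        while i < len(prev) and i < len(stack) and prev[i] == stack[i]:
            i += 1
        for frame in reversed(prev[i:]):    # departed frames end deepest-first
            events.append(("end", frame))
        for frame in stack[i:]:             # new frames start shallowest-first
            events.append(("start", frame))
        prev = stack
    return events

# snapshots_to_events([["main"], ["main", "f"], ["main", "f"], ["main", "g"]])
# -> start main, start f, end f, start g, end g, end main
```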
ML round
- 55-min Google Meet in Colab on prompting + LLM engineering.
- Open-ended ML design discussion, e.g. the "ML configuration system" prompt.
Concurrency round
Always asked. See concurrency page. Common: thread-safe queue, fix race in shared cache, design rate limiter under contention.
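For the rate-limiter-under-contention ask, a token bucket behind a single lock usually suffices; a sketch, not a canonical answer:

```python
import threading, time

class TokenBucket:
    """Thread-safe token-bucket rate limiter: refills at `rate` tokens/sec
    up to a burst of `capacity`; allow() may be called from many threads."""
    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity
        self.last = time.monotonic()
        self.lock = threading.Lock()

    def allow(self) -> bool:
        with self.lock:                     # one critical section: refill, then take
            now = time.monotonic()
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False
```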
Sources
- LinkJob Anthropic Q-Bank 2026
- Anqi Silvia — Anthropic concurrency Qs (2025)
- 1p3acres Anthropic
- Sundeep Teki AI Lab Guide
OpenAI
Format: CoderPad. 60–75 min phone screen. 4–6 hr final loop in 1–2 days. Coding bar is binary — they reject at 2/4 or a low 3/4 even when the ML, research, and behavioral rounds are strong. Each problem is multi-part; substantially more code than FAANG; less algorithmic, more "production." Decentralized hiring; behavioral round is non-standard.
Reported problems (2024–2026)
- KV-store serialize/deserialize with arbitrary characters incl. delimiters (length-prefix encoding, Redis-style); see the sketch after this list.
- Time-based KV store (LC #981 flavor). Extensions: per-key locks vs global, disk persistence.
- Spreadsheet API with `getCell`/`setCell`, dependency graph, cycle detection via DFS, optimize `getCell` to O(1).
- GPU credit management — time-based credits with FIFO expiry, add/expire/consume.
- Resumable iterator across nested structures with skip/reset.
- In-memory DB with SQL-like ops (CREATE/INSERT/SELECT WHERE, joins).
- Unix `cd` with symbolic-link resolution including cycle detection.
- Multithreaded web crawler with rate limiting and dedup.
- Toy interpreter (75-min RE round) — lex + parse + execute simple language with arithmetic, vars, control flow.
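For the serialize/deserialize problem the crux is that length prefixes make embedded delimiters harmless. A minimal sketch; the `<len>:<bytes>` framing is an assumption, similar in spirit to Redis's protocol:

```python
def serialize(kv: dict) -> bytes:
    """Write each key and value as <byte-length>:<raw bytes>, so keys and
    values may safely contain ':' or any other delimiter."""
    parts = []
    for k, v in kv.items():
        for s in (k, v):
            b = s.encode("utf-8")
            parts.append(f"{len(b)}:".encode() + b)
    return b"".join(parts)

def deserialize(data: bytes) -> dict:
    fields, i = [], 0
    while i < len(data):
        j = data.index(b":", i)             # end of the length prefix
        n = int(data[i:j])
        fields.append(data[j + 1:j + 1 + n].decode("utf-8"))
        i = j + 1 + n
    return dict(zip(fields[0::2], fields[1::2]))

assert deserialize(serialize({"a:b": "1:2"})) == {"a:b": "1:2"}
```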
ML / research themes
Deep transformer architecture knowledge. KV-cache. Attention complexity. Batch norm. Regularization. Debugging transformer training (NaNs, masks, padding, optimizer state).
Google DeepMind
Format: ~41-day average loop; part PhD defense, part rigorous engineering exam. AI tools prohibited in technical rounds (2026 policy). Undergraduate-level ML rapid-fire quiz that experienced PhDs frequently bomb (formal definitions of gradient flow, Bayesian inference, convergence proofs). Acceptance <1%.
Tracks
- Research Scientist: paper deep-dive (60 min) → research framing (60 min) → ML coding (60 min, unaided) implementing a piece of an ML pipeline (custom loss, attention, sampling routine, small training loop).
- Research Engineer: distributed training systems (Megatron/DeepSpeed/FSDP knowledge), eval harness design, general DSA.
- SWE: 2 medium-to-hard coding rounds (unaided), 1 system design, 1 domain depth.
Reported ML implementation tasks: custom loss from scratch, attention block, sampling/decoding code, small training loop in NumPy/PyTorch.
xAI (Grok)
Format: Engineer-led, very fast. Weight on written "exceptional work" statement. 20-min "hardest technical problem you've solved" presentation + Q&A defense. 45-60 min coding rounds, mix of algos and thread-safety/concurrency ("simple and correct beats clever and broken"). CodeSignal OA: 3 problems, 60 min.
Reported questions
- Implement a trie-based tokenizer
- Efficient attention for very long sequences (100K tokens) — design discussion
- Beam search with memory optimization
- CUDA kernel optimization for transformer inference
- Memory-efficient training algorithm for large models on limited GPU memory
- Distributed training system design for 100B+ param model
- Real-time inference system for Grok at 100K req/s
- Custom loss functions for LM pretraining
- Mixture-of-Experts architecture questions
- Concurrency-safe components (thread-safe queue/cache)
Cursor (Anysphere)
Format: TypeScript-first (Python OK for ML roles). AI tools permitted but interviewer judges your prompts. Sr/Staff: 4–8 hr take-home in their actual codebase. Heavy weight on craft round; fail mode is "passes coding but doesn't authentically use Cursor."
- Streaming Markdown parser in TypeScript (online test)
- Streaming edit application — apply incoming LLM tokens as real-time edits
- Multi-file diff/coordination tracking changes across many files
- Syntax-aware edit operations
- File-tree diff modeling
- Build a hash tree in their codebase
Cohere
Format: Recruiter → HM (mostly projects) → virtual onsite: coding + ML concepts + system design + behavioral. Forward-deployed-platform-engineer flavor; expects you to debug a vector-DB upsert race condition rather than recite transformer math.
- Binary string reduction — count ops to reduce a binary number to 0 (subtract 1 if odd / divide by 2 if even); sketch after this list
- Streaming dedup — remove duplicate strings from stream without storing whole stream
- Longest unique substring (LC #3)
- System design: bit.ly URL shortener; real-time fraud detection with feature pipeline
- AI/ML: design knowledge-cutoff RAG mechanism; batch inference optimization (sub-batching with max-token + max-batch constraints, concurrent processing)
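The binary-reduction question reduces to a short loop over the integer value; a minimal sketch:

```python
def ops_to_zero(binary: str) -> int:
    """Count operations to reduce a binary number to 0:
    subtract 1 if odd, divide by 2 if even."""
    n, ops = int(binary, 2), 0
    while n:
        n = n - 1 if n & 1 else n >> 1
        ops += 1
    return ops

assert ops_to_zero("1101") == 6   # 13 -> 12 -> 6 -> 3 -> 2 -> 1 -> 0
```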
Mistral AI
Stages: LLM theory → coding → past project → tech manager → ML system design → take-home → values. Python coding round centers on refactoring rather than greenfield algorithms.
Perplexity AI
Format: Fast ~23-day loop, small bank, Python only.
- Probability of each number appearing in a stream
- Test if data stream is uniformly distributed (sample 3 random nums, verify uniform)
- Substring before first stop word, then streaming version with memory constraint
- Remove duplicate strings from stream → near-duplicates (case/punctuation/single-word diff)
- LLM Provider Pool — `Provider.query(prompt)` + `ProviderPool` with fallback on failure across providers; sketch after this list
- CreditTracker class with add/subtract/check
- Implement beam search given function signature + unit tests
- Embedding model batch processing under max-batch + max-token constraints (concurrency)
- System design: personal-finance multi-account sync
- Kubernetes debugging — overloaded system, identify metrics
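For the provider-pool problem, `Provider.query(prompt)` is the signature given in the prompt; the rest of this sketch is assumed minimal design:

```python
class ProviderPool:
    """Try providers in order, falling back to the next on any failure."""
    def __init__(self, providers):
        self.providers = providers

    def query(self, prompt: str) -> str:
        errors = []
        for p in self.providers:
            try:
                return p.query(prompt)      # signature given in the problem
            except Exception as e:          # any failure triggers fallback
                errors.append((type(e).__name__, str(e)))
        raise RuntimeError(f"all providers failed: {errors}")
```

Natural extensions to volunteer: per-provider retry caps, health tracking, and rotating the starting provider.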
Cognition AI (Devin)
Heavy on customer-facing roleplay + pair programming + architecture deep-dive. Sample: "customer is angry that Devin deleted a critical file — walk through your response." "60-min onboarding workshop for senior engineers new to AI coding tools." Behavioral: "say no to a customer," "learn a new stack in days."
Sierra (Bret Taylor)
Replaced LeetCode with AI-native onsite: planning session (drive product ideation) + 2-hr building phase using AI tools/frameworks of choice. CoderPad multi-part, layered follow-ups; Python or TypeScript only. ~23-day loop.
Decagon
Recruiter → 60-min coding pair → 60-min system design → 60-min past project → behavioral.
- Implement n×n Tic-Tac-Toe game engine for two players
- Class that tracks conversation scores over a rolling time window (sketch after this list)
- LeetCode #84 (Largest Rectangle in Histogram)
- System design: AI agent system that resolves customer support tickets; knowledge ingestion + retrieval; eval framework for autonomous resolution quality
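A sketch of the rolling-window score tracker, assuming timestamps arrive monotonically (class and method names are illustrative):

```python
from collections import deque
import time

class RollingScores:
    """Track conversation scores over a trailing window of `window` seconds."""
    def __init__(self, window: float):
        self.window = window
        self.q = deque()                    # (timestamp, score), oldest first
        self.total = 0.0

    def _evict(self, now: float):
        while self.q and self.q[0][0] <= now - self.window:
            _, score = self.q.popleft()
            self.total -= score

    def add(self, score: float, now: float = None):
        now = time.monotonic() if now is None else now
        self._evict(now)
        self.q.append((now, score))
        self.total += score

    def average(self, now: float = None) -> float:
        now = time.monotonic() if now is None else now
        self._evict(now)
        return self.total / len(self.q) if self.q else 0.0
```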
Apple (MLE / GenAI)
5–8 onsite rounds × 45–60 min. 1–2 coding rounds (LC medium-hard): rotate matrix 90°, merge K sorted lists, Mountain Array, LRU cache. Some rounds blend DSA + ML (e.g. implement cosine similarity, then a DP extension). ML fundamentals: transformers, self-attention, RAG vs fine-tuning, embeddings, chunking.
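The cosine-similarity warm-up is worth having cold; a NumPy sketch:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """cos(a, b) = a.b / (|a| |b|); guard against zero vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0
```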
Nvidia
ML system design + project depth. Coding includes stars pyramid, Maximum Product of Three Numbers (LC #628), producer-consumer ring buffer, parallel programming, matrix multiplication. ML: gradient descent, MLP/CNN/RNN/Transformer, Adam vs SGD, loss functions.
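A sketch of the producer-consumer ring buffer, the classic condition-variable version (names and the blocking API are assumptions):

```python
import threading

class RingBuffer:
    """Bounded blocking queue over a fixed ring: one lock, two conditions."""
    def __init__(self, capacity: int):
        self.buf = [None] * capacity
        self.cap, self.head, self.size = capacity, 0, 0
        lock = threading.Lock()
        self.not_full = threading.Condition(lock)
        self.not_empty = threading.Condition(lock)

    def put(self, item):
        with self.not_full:
            while self.size == self.cap:        # wait for a free slot
                self.not_full.wait()
            self.buf[(self.head + self.size) % self.cap] = item
            self.size += 1
            self.not_empty.notify()

    def get(self):
        with self.not_empty:
            while self.size == 0:               # wait for an item
                self.not_empty.wait()
            item = self.buf[self.head]
            self.head = (self.head + 1) % self.cap
            self.size -= 1
            self.not_full.notify()
            return item
```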
Waymo
CoderPad + Google Meet. Tech phone screen + ML design. Graph traversal, DP coding. ML: SGD variants, batch-size vs latency trade-offs, ML system design. Two rounds in one day (1 coding + 1 ML design) reported Feb 2025.
Tesla Autopilot
HackerRank/Codility OA: medium algos in Python + simple ML tasks (feature engineering, eval). Onsite 3–5 rounds incl. ML system design for object detection pipeline (scalability, data pipeline, deployment).
Pinterest (MLE)
Tech phone screen: 3 ML fundamentals + 2 LC hard. Onsite: ML theory + LeetCode mediums. Topics: transformer arch, contrastive loss, learning-to-rank, vanishing gradients (which activations), recommender systems. Difficulty 3.4/5.
Snap (MLE)
1hr tech screen + 4hr onsite (2 LC rounds + ML system design + ML fundamentals). LC medium-hard. Sample: LC #4 Median of Two Sorted Arrays with O(log(m+n)) requirement. Task scheduling. Word dictionary. ML: explain attention, self-attention, cross-attention, multi-head attention; boosting; generative vs discriminative. ML infra design is differentiator.
Reddit (Senior MLE)
Recruiter → tech phone (build a model from provided data) → onsite (model design, feature engineering, applied ML in advertising). ML infra design as required round (feature stores, distributed training, online serving). Personalization/recsys focus.
Databricks (Staff MLE)
1hr CoderPad/Meet phone screen. LC medium-hard. ~20% pass rate. Onsite topics weighted toward graph algos, optimization, concurrency. LC tag composition: ~10 Easy / 19 Medium / 5 Hard; Array/HashTable heavy. Reference checks: 1 manager + 2 senior teammates, weighted heavily.
Snowflake
OA + 2hr tech phone screen (DSA + system design back-to-back). CoderPad. LC medium with hard follow-up. DP, BFS/DFS, binary search, linked lists; database internals twist. 30-min presentation interview on past project.
Notion (Backend SWE)
No LeetCode-style — practical problems only. Recently added AI-enabled interview requiring fluent use of Claude Code/Cursor.
Figma
3-5 rounds. LC medium DSA on coding round. System design × 2. Behavioral. Project. All coding questions "Figma-flavored."
Plaid
2 live coding questions on DS&A round. Backend/API/distributed-systems flavor. Real-world rather than pure LC.
Stripe
No LeetCode — practical, production-style problems on CoderPad. Sample: find pairs of transactions in 30s window summing to target; rate limiter for Stripe API; webhook signature validator with edge cases. ML coding round: dataset provided, build + evaluate model in 1 hour. System design: idempotency, exactly-once, distributed transactions, ledger flows, fraud, multi-currency. Integration round uses real Stripe API.
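A sketch of the 30-second-window pair finder, assuming transactions arrive as (timestamp, amount) sorted by time:

```python
from collections import defaultdict

def pairs_in_window(txns, target, window=30):
    """Return index pairs (i, j), i < j, with amount_i + amount_j == target
    and timestamps within `window` seconds; sliding window + hash map."""
    pairs, start = [], 0
    in_window = defaultdict(list)            # amount -> indices still in window
    for j, (ts, amt) in enumerate(txns):
        while txns[start][0] < ts - window:  # evict expired transactions
            in_window[txns[start][1]].remove(start)
            start += 1
        pairs.extend((i, j) for i in in_window[target - amt])
        in_window[amt].append(j)
    return pairs

assert pairs_in_window([(0, 5), (10, 7), (45, 5)], 12) == [(0, 1)]
```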
Airbnb (Staff MLE)
LC medium-hard, Airbnb-tagged. Classic DP problems (esp. on trees) reported. 4-5 stages. ML focus on dynamic pricing, trust signals, personalization.
Uber (Staff/MLE)
5-7 rounds over 4-8 weeks. 60/40 ML/coding split for MLE. Coding 1 (DSA) + Coding 2 (depth in specialization) for staff SWE. Recent 2025 SDE2 reports: DP on trees screening. ML system design: scalable pipelines, deployment, monitoring; offline vs online eval; A/B testing.
DoorDash (MLE)
60-min DSA round, 1 LC medium with 45 min for code+optimize. 22 reported problems, 6 Easy / 14 Medium / 2 Hard. Design HashMap, Longest Common Prefix, Jump Game. Graph traversal, scheduling, stateful services.
Anduril
4-6 onsite rounds. 2 LC easy + ML/CV concepts focus + ML design (harder). Heavy past-experience deep-dive. Perception roles need C++, PyTorch, TensorRT/ONNX deployment.
Cerebras
~26 day loop. 4 rounds. LC medium + parallel programming + matmul questions. Compiler engineer roles: MLIR, LLVM IR, parallelization/partitioning, novel program analysis.
Coinbase
OA: 2 math-heavy questions (DP + intervals/prefix sums). Onsite: 4 LC medium DSA in 90 min (target 3/4). Build something real (transaction management system, progressive requirements). System design: notifications with retries + queues.
Robinhood
OA: 4 questions (2 super easy + 1 hard + 1 medium). VO: 1 coding + 1 system design + 1 foundation.
AppLovin
5 rounds: 3 tech + 1 system design + 1 behavioral. LC medium, data-structure-design heavy. Classic: LRU cache O(1) then real-system extension (campaign config cache). Onsite: 2 questions starting easy with optimize follow-ups. ~3 weeks total.
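In Python the O(1) LRU warm-up is effectively an OrderedDict exercise; a sketch:

```python
from collections import OrderedDict

class LRUCache:
    """O(1) get/put: OrderedDict is a hash map over a doubly linked list."""
    def __init__(self, capacity: int):
        self.cap = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)          # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.cap:
            self.data.popitem(last=False)   # evict least recently used
```

The real-system follow-up (campaign config cache) then points toward TTLs, size limits, and thread safety.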
Roblox
HackerRank OA + Roblox Assessment. 4 sections, ~2 hours. LC medium difficulty, often disguised in gaming/simulation context. 2 onsite tech rounds (Medium + Hard) + behavioral.
Crusoe / Lambda / Modal / Together / Baseten / MatX / Black Forest / World Labs / Physical Intelligence
Limited public data.
- Crusoe: 3 back-to-back (1 coding + 1 project review + 1 HM). Practical-oriented over LC.
- Lambda: SSH into EC2 + Linux/OOP probe + behavioral OOP (Python/Go). Take-homes for some roles.
- Modal: abstractions for distributed GPU workloads — system design heavy.
- World Labs / PI / BFL: expect VLA / world-model / 3D / SLAM / multimodal data-pipeline questions; PyTorch deployment; deep CV implementation.
Behavioral / culture themes per company
| Company | Themes |
|---|---|
| Anthropic | Safety, alignment, ethics. "Why AI safety?" Strong reference checks. |
| OpenAI | Mission alignment, agency, ambition, urgency. Non-standard behavioral. |
| DeepMind | Research depth, paper defense, scientific reasoning. |
| xAI | 20-min "hardest tech problem" talk. First principles. |
| Cursor | Authentic personal use of Cursor. Craft. Product taste. |
| Sierra | Practical AI agent reasoning. Product instinct. |
| Decagon | Engineering bar + applied LLM agent thinking. |
| Cognition | Customer empathy roleplay. Angry-customer simulations. |
| Stripe | "Code I'd approve in a PR." Correctness before perf. Failure modes. |
| Snap | "Kind, Smart, Creative" values. |
| Coinbase | Mission/crypto belief. "Build something real." Fundamentals over grinding. |
| Anduril | Mission alignment (defense). Past project deep dive (long). |
| Apple | Privacy. On-device thinking. Product polish. |
| Reddit | Community/personalization stewardship. |
Key meta-takeaways
- Frontier labs have moved away from pure LeetCode. 60–90 min progressive specs (Anthropic 4-level CodeSignal, OpenAI multi-part CoderPad) are the prototype. Drill these, not Hard sweeps.
- "Implement multi-head attention from scratch in PyTorch" is the universal ML coding question (DeepMind, OpenAI, Anthropic, Mistral, Cohere, Perplexity, Snap, Apple). Drill until you can do it in 6 min cold.
- Concurrency is the rising differentiator. Anthropic, xAI, OpenAI all probe explicitly. Don't skip.
- AI tool policy varies wildly: banned at DeepMind, Anthropic, and OpenAI; encouraged or required at Cursor, Sierra, and Notion. Confirm per company before the loop.
- Take-homes are back at Sr/Staff: Anthropic OA, Cursor 4–8hr, Mistral, most agent startups.
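Since the attention question recurs across nearly every lab, here is a reference sketch in PyTorch (batch-first input; dropout, init tricks, and KV-caching omitted); the shape bookkeeping is what interviewers watch:

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Minimal multi-head self-attention over (B, T, d_model) inputs."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.h, self.d = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)   # fused Q/K/V projection
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x, mask=None):
        B, T, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # split heads: (B, T, d_model) -> (B, h, T, d)
        q, k, v = (t.view(B, T, self.h, self.d).transpose(1, 2) for t in (q, k, v))
        att = (q @ k.transpose(-2, -1)) / self.d ** 0.5   # scaled dot-product
        if mask is not None:                              # e.g. causal mask
            att = att.masked_fill(mask == 0, float("-inf"))
        att = att.softmax(dim=-1)
        y = (att @ v).transpose(1, 2).reshape(B, T, -1)   # merge heads
        return self.out(y)
```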
Sources (by company)
- Anthropic: LinkJob, Anqi Silvia
- OpenAI: Hello Interview, Anqi Silvia
- DeepMind: Tech Interview 2026
- xAI: LinkJob, Exponent Dec 2025
- Cursor: Tech Interview
- Cohere: LinkJob
- Perplexity: LinkJob
- Decagon: Tech Interview
- Sierra: Sierra blog
- Cognition: Dataford
- Apple: Pranalibose 2025
- Stripe: interviewing.io
- Pinterest: Glassdoor
- Snap: Interview Query
- 1Point3Acres aggregator: 1p3acres
- Yuan Meng's MLE Interview 2.0: Yuan Meng
- Sundeep Teki frontier-lab guide: Sundeep Teki