Your offline cockpit for quant interviews: cheatsheets + simulation lab + flashcards for learning + experimenting + drilling. Customize your subject tiles, pin your weak points, and add your own flashcards and cheatsheets.
Your collection. Hit PIN on any cheatsheet card or flashcard below to pin a copy here; unpin to remove. Pins persist in this browser.
Jump straight to any cheatsheet topic (read) or flashcard domain (drill). Counts update as the library grows.
Didactic, example-filled references spanning every quant subject. Pick a topic to switch.
P(A|B)=P(A∩B)/P(B), defined for P(B)>0. Read it as: renormalize the sample space to B.P(A∩B∩C)=P(A)·P(B|A)·P(C|A∩B) — the workhorse for sequential draws.P(A|B)=P(A) ⇔ P(A∩B)=P(A)P(B). Pairwise independence does NOT imply mutual independence.P(A|B)≠P(B|A) in general (prosecutor's fallacy).(4/52)·(3/51)=1/221 via the chain rule.{Bᵢ} of the sample space: P(A)=Σᵢ P(A|Bᵢ)P(Bᵢ). Average the conditionals, weighted by prior.P(Bⱼ|A)=P(A|Bⱼ)P(Bⱼ)/Σᵢ P(A|Bᵢ)P(Bᵢ).posterior odds = prior odds × likelihood ratio.(1·½)/(½·½+1·½)=2/3 by odds: prior 1:1 × LR (1/½)=2:1.E[aX+bY]=aE[X]+bE[Y] ALWAYS — no independence needed. This breaks most hard problems.E[g(X)]=Σ g(x)p(x) or ∫ g(x)f(x)dx — don't derive the dist of g(X).E[X]=Σₖ₌₁ P(X≥k); continuous: E[X]=∫₀^∞ P(X>x)dx.E[X]=E[E[X|Y]].2/(2n−1), so E=2n/(2n−1).X=Σ Iₐ where Iₐ∈{0,1}; then E[X]=Σ P(Aₐ) by linearity.E[Iₐ]=P(Aₐ) and Iₐ²=Iₐ.Var(X)=Σ Var(Iₐ)+ΣΣ_{a≠b} Cov(Iₐ,I_b); Cov(Iₐ,I_b)=P(Aₐ∩A_b)−P(Aₐ)P(A_b).1/i. Expected records =Σᵢ₌₁ⁿ 1/i=Hₙ≈ln n+γ.Var(X)=E[X²]−(E[X])²; Cov(X,Y)=E[XY]−E[X]E[Y].Var(aX+bY)=a²Var(X)+b²Var(Y)+2ab·Cov(X,Y). Cross term vanishes only if uncorrelated.ρ=Cov(X,Y)/(σ_Xσ_Y)∈[−1,1]; measures LINEAR dependence only.nσ²; of the mean: σ²/n.E[X]=E[E[X|Y]] — condition on whatever simplifies the problem.Var(X)=E[Var(X|Y)]+Var(E[X|Y]) = within-group + between-group variance.E[X|Y] is a random variable (a function of Y); plug in a value to get a number.E=1+½·0+½·E ⇒ E=2. (Mean of Geometric(p) is 1/p.)P(X≥a)≤E[X]/a. Only the mean; loose but assumption-free.P(|X−μ|≥kσ)≤1/k² — needs finite variance, distribution-free.σ²/n→0.E[φ(X)]≥φ(E[X]). Concave: inequality flips.E[X²]≥(E[X])² (so Var≥0), E[1/X]≥1/E[X] for X>0, E[ln X]≤ln E[X].P(A before B)=½; the k-th of n shuffled items is uniform over positions.=½ — no counting.Cov(X,Y)=E[XY]−E[X]E[Y]; ρ = Cov/(σ_X σ_Y) ∈ [−1,1].
Var(aX+bY) = a²Var(X)+b²Var(Y)+2ab·Cov(X,Y).Var(X)=E[Var(X|Y)]+Var(E[X|Y]).E[X]=E[E[X|Y]].E[Y|X=x]=μ_Y+ρ(σ_Y/σ_X)(x−μ_X).θ̂ is a function of the sample; it has its own sampling distribution.MSE(θ̂)=E[(θ̂−θ)²]=(E[θ̂]−θ)²+Var(θ̂). The whole tradeoff lives here.E[θ̂]=θ for all n. Consistent: θ̂→θ in probability as n→∞. Different properties — neither implies the other.Var(θ̂)≥1/I(θ) for unbiased estimators (I=Fisher info).θ̂≡5 (ignore data) is super low-variance but biased and inconsistent. Sample mean X̄ is unbiased and consistent: Var(X̄)=σ²/n→0.s²=Σ(xᵢ−x̄)²/(n−1) is the unbiased estimator of σ²; dividing by n underestimates.n−1 deviations are free — they must sum to 0 (one degree of freedom is spent estimating the mean).s (the std, not s²) is still slightly biased by Jensen — E[s]<σ — because √ is concave. Unbiasedness doesn't survive nonlinear transforms.~1/n, so it's negligible for large samples and matters mainly for small n.x̄=3, Σ(xᵢ−x̄)²=2. Biased: 2/2=1. Unbiased: 2/1=2 — the true spread is better captured by 2.θ̂=argmax L(θ)=argmax Πf(xᵢ;θ). Maximize the log-likelihood — sums beat products.g(θ) is g(θ̂) — e.g. MLE of σ is √(MLE of σ²).θ̂ is consistent and √n(θ̂−θ)→N(0, 1/I(θ)) — attains the Cramér–Rao bound asymptotically (efficient).n, not n−1.logL=k·ln p+(n−k)ln(1−p), set derivative 0 → p̂=k/n. The sample proportion is the MLE.argmax p(θ|data) ∝ argmax L(θ)·p(θ) — MLE weighted by a prior; maximizes the log-posterior logL+log prior.x̄=2 from n=4, prior N(0,σ²). MAP pulls the estimate from 2 toward 0; as n grows the data dominates and MAP→MLE.Xᵢ with finite mean μ and variance σ²: √n(X̄−μ)/σ → N(0,1), regardless of the underlying shape.~1/√n); fat tails need much larger n before Normal is safe.SE of mean ≈ 1%/√252 ≈ 0.063% — tiny, but only if returns are roughly iid with finite variance.H₀ (null) vs H₁; pick a statistic; reject if it's extreme under H₀.P(θ∈[a,b]|data)=0.95. Here θ is treated as random — the intuitive statement, but it needs a prior.x̄ ± z·SE (z≈1.96 for 95%). Width ~1/√n.P(≥1)=1−(1−α)^m.α/m. Simple, but conservative → low power.p₍₁₎≤…≤p₍ₘ₎, find largest k with p₍ₖ₎ ≤ (k/m)·q, reject all up to k. Controls FDR at q.P(≥1 false positive)=1−0.95¹⁰⁰≈99.4%. Bonferroni cutoff drops to 0.0005 per test.E[Xₜ]=μ, constant variance, and autocovariance γ(s,t) depends only on lag |t−s|, not on t.Xₜ=t+εₜ has trending mean ⇒ not stationary. Detrend: Xₜ−t=εₜ is stationary. White noise εₜ is stationary; a random walk Sₜ=Σεᵢ is not (variance grows like tσ²).ρ(k)=γ(k)/γ(0): correlation between Xₜ and Xₜ₋ₖ including intermediate lags.k after regressing out lags 1..k−1 — the 'direct' effect.q; PACF tails off (decays).p; ACF tails off.±1.96/√n under the white-noise null.Xₜ=c+Σφᵢ Xₜ₋ᵢ+εₜ — regress on own past. Stationary iff roots of 1−Σφᵢzⁱ=0 lie outside the unit circle.Xₜ=μ+εₜ+Σθⱼ εₜ₋ⱼ — finite memory, always stationary; invertible iff MA roots outside unit circle.Xₜ=φXₜ₋₁+εₜ with |φ|<1 equals Σφʲεₜ₋ⱼ.Xₜ=φXₜ₋₁+εₜ: Var=σ²/(1−φ²), ρ(k)=φ^k. So φ=0.9 ⇒ slow geometric ACF decay; φ=0.1 ⇒ fast decay, near-white.Xₜ=Xₜ₋₁+εₜ (φ=1) — shocks permanent, variance ∝t, best forecast is last value, not stationary, not predictable.c/(1−φ); half-life of a shock = ln(0.5)/ln(φ).φ meaningfully below 1? That's a unit-root test.dXₜ=θ(μ−Xₜ)dt+σdWₜ; θ>0 ⇒ reversion.φ=0.97 daily ⇒ half-life =ln0.5/ln0.97≈22.8 days — a slow reverter; size and horizon must match that half-life or you bleed carry.H₀: φ=1 (unit root, non-stationary) vs H₁: φ<1 (stationary). Run ΔXₜ=γXₜ₋₁+εₜ, test γ=0 where γ=φ−1.ΔX terms to absorb serial correlation; variants include drift/trend.d times to reach stationarity, then fit ARMA(p,q) to ΔᵈX.d=1 handles a stochastic trend (unit root); a deterministic linear trend is better removed by regression, not differencing.d from unit-root tests + ACF (slowly decaying ACF ⇒ difference); over-differencing injects spurious negative MA structure.Δlog price = log return is I(0). So modeling returns ≈ ARIMA with d=1 on log prices.σ²ₜ=ω+α ε²ₜ₋₁. GARCH(1,1): σ²ₜ=ω+α ε²ₜ₋₁+β σ²ₜ₋₁ — today's variance from yesterday's shock and yesterday's variance.=α+β; <1 for stationarity. Unconditional variance =ω/(1−α−β). Equity α+β is often near 1 (long memory in vol).εₜ=σₜ zₜ with zₜ iid. EGARCH/GJR add the leverage (asymmetric) effect.Yₜ−βXₜ is I(0) (stationary) — they share a stochastic trend and a stable long-run spread.Y on X, ADF-test the residual (spread) for stationarity. Johansen: multivariate, tests rank / multiple cointegrating vectors.spread=log P_A−β log P_B; β the hedge ratio from regression. Trade z=(spread−μ)/σ: enter at |z|≈2, exit at z≈0. Risk: β drifts / the relationship breaks (cointegration is not permanent).t trades at t+1 open; restatements and index recompositions are forward-only.t−1 only — and shift the signal one bar before execution.quoted spread = ask − bid; half-spread = (ask−bid)/2. Often quoted relative: (ask−bid)/mid.effective spread = 2·D·(P_trade − mid), D=+1 buy, −1 sell. Measures what you ACTUALLY paid vs mid at trade time — captures price improvement and crossing.realized spread = 2·D·(P_trade − mid_{t+Δ}) uses the mid a few minutes later — what the maker KEEPS after the price moves against them.V) and noise traders (random). MM is competitive & risk-neutral → zero expected profit → quotes are conditional expectations.ask = E[V | buy order], bid = E[V | sell order]. The spread exists EVEN with zero inventory cost and zero fees — purely from information asymmetry.V as trades arrive.V∈{0,100} equally likely, prior mid 50. Fraction α informed. A buy could be informed (V=100) or noise. ask=E[V|buy]>50, bid=E[V|sell]<50; the gap is the adverse-selection spread.u, one market maker sets price from total flow x+u. Linear equilibrium: price = p₀ + λ·(order flow).1/λ is market DEPTH (Kyle calls it the inverse of liquidity).λ = (1/2)·√(Σ₀/σ_u²) where Σ₀ = variance of value, σ_u² = noise-trade variance. More noise traders ⇒ smaller λ ⇒ deeper market.Σ₁ = Σ₀/2).ΔP = λ·(signed volume) + ε.I = (Q_bid − Q_ask)/(Q_bid + Q_ask) ∈ [−1,1].P_micro = (Q_ask·bid + Q_bid·ask)/(Q_bid+Q_ask) — weights the side with LESS size, i.e. tilts toward where the book is thin (the likely move).I=(5000−1000)/6000≈0.67. Microprice = (1000·99.99+5000·100.01)/6000 ≈ 100.007 — pulled toward the ask, predicting an up-move.γ) utility of terminal wealth while holding inventory q over horizon T−t.r = s − q·γ·σ²·(T−t) — the mid SHIFTED away from current inventory. Long inventory (q>0) pulls quotes DOWN to encourage selling.δ_total = γσ²(T−t) + (2/γ)·ln(1 + γ/k), where k governs order-arrival intensity λ=A·e^{−kδ}.r, NOT the mid: bid = r − δ/2, ask = r + δ/2. This skews fills to mean-revert inventory toward zero.t→T the inventory term vanishes; quotes recenter on the mid.Y·σ·√(Q/V) — impact grows with the SQUARE ROOT of participation (Q = order size, V = daily volume, σ = vol).∝ (trade rate), permanent ∝ (size); optimal schedule trades off impact vs timing (volatility) risk.1·0.02·√0.01 = 0.02·0.1 = 0.002 = 20 bps. Buying 4% ⇒ √4=2× the cost, not 4×.Σ pᵢvᵢ / Σ vᵢ; beating VWAP means trading better than the average participant.IS = (paid − decision-price) on executed + opportunity cost on unexecuted. The honest all-in benchmark: it charges you for delay, impact, fees, AND missed fills.| Op | Example | Result | Meaning |
|---|---|---|---|
| & AND | 5 & 3 = 0101 & 0011 | 0001 = 1 | 1 only where both bits are 1 |
| | OR | 5 | 3 = 0101 | 0011 | 0111 = 7 | 1 where either bit is 1 |
| ^ XOR | 5 ^ 3 = 0101 ^ 0011 | 0110 = 6 | 1 where the bits differ |
| ~ NOT | ~5 | −6 | flip every bit (~x = −x−1) |
| << shift | 1 << 4 | 16 | x << n = x·2ⁿ |
| >> shift | 20 >> 2 | 5 | x >> n = ⌊x / 2ⁿ⌋ |
x & 1 tests odd/even; x & (x−1) clears the lowest set bit, so x & (x−1) == 0 tests a power of two.x & (−x) isolates the lowest set bit; popcount via Brian Kernighan loops x &= x−1 and counts iterations.x | (1<<k) · x & ~(1<<k) · x ^ (1<<k) · (x >> k) & 1.x^x=0, x^0=x: find the lone unpaired element by XOR-ing the whole array; swap two ints with no temp via three XORs.0 … (1<<n)−1 — the backbone of bitmask DP (e.g. held-Karp TSP over subsets).[4,1,2,1,2], every value paired but one. 4^1^2^1^2 = 4 — the pairs cancel, leaving the unique element, in O(n) time and O(1) space.O(n)+O(n²)=O(n²). Product rule: nested loops multiply, O(n)·O(m)=O(nm).T(n)=2T(n/2)+O(n)→O(n log n) (mergesort); T(n)=2T(n/2)+O(1)→O(n) (heapify-style).O(log n) binary search) or from balanced-tree depth; base of the log is a constant, so omit it.in on a list are each O(n) inside a loop → quietly O(n²).s += c over n chars is O(n²) (each concat copies); join a list once for O(n).target−x in a hash set is O(n); the brute nested loop is O(n²).| avg | worst | space | stable | |
|---|---|---|---|---|
| quicksort | n log n | n² | O(log n) | no |
| mergesort | n log n | n log n | O(n) | yes |
| heapsort | n log n | n log n | O(1) | no |
(i−1)/2, children 2i+1 and 2i+2; no pointers needed.arr[abs(x)−1]) or cycle detection on the value-as-pointer graph.k/i, evicting a uniformly random current holder.k/n after n items — the early-vs-late asymmetry exactly cancels.1/i; trivial to state and a common warm-up.1·½·⅔···(n−1)/n = 1/n — uniform, as required.dp[a]=min(dp[a−c]+1) over coins c; O(amount·#coins), 1-D array.16% of 25. Swap: 25% of 16 = 4.10% (move decimal), 5% (half of 10%), 1% (move twice).1bp = 0.01% = 0.0001; 100bp = 1%.15% = 10% + 5%; 17.5% = 10%+5%+2.5%.8% of 50 = 50% of 8 = 4. And 35bp of $20M = 0.0035×20M = $70k.1/2=.5, 1/4=.25, 1/8=.125, 1/16=.0625.3/16=.1875, 5/16=.3125, 7/16=.4375 (odd k: k/16).142857: 1/7=.142857, 2/7=.285714, … (rotate the block).1/3=.3̄, 1/6=.16̄, 1/9=.1̄, 1/11=.0909, 1/12=.0833.3/7 = .428571 — same six digits as 1/7, started at the right spot (3×.142857).35² = 3·4|25 = 1225.n² = (n+d)(n−d) + d² where d = n−b.53² = (50+3)² → 2500 + 100·3 + 9 = 2809.97² = (100−3)² = 10000 − 600 + 9 = 9409.(a±b)² = a² ± 2ab + b² — pick a as a round anchor.48² = (50−2)² = 2500 − 200 + 4 = 2304.17×19 = 18² − 1² = 324 − 1 = 323; 38×42 = 40² − 2² = 1600 − 4 = 1596.11×34 = 3|3+4|4 = 374.23×14 = 23×10 + 23×4 = 230 + 92 = 322.54×46 = 50² − 4² = 2500 − 16 = 2484 (pair averages 50, gap 4).√2≈1.414, √3≈1.732, √5≈2.236, √7≈2.646.√10≈3.162 (= √2·√5); √12 = 2√3 ≈ 3.464.√S ≈ (g + S/g)/2 for guess g.√(100x) = 10√x; move the decimal in pairs, not singly.√50: guess 7, (7 + 50/7)/2 = (7 + 7.14)/2 ≈ 7.07 (true 7.071).72/8 = 9.0 vs true 9.01 yr.72/7 = 10.29 vs true 10.24 yr.ln2 = .693); 72 has more clean divisors.ln3 ≈ 1.10, time ≈ 110/rate (e.g. 6% → ~18 yr).72/6 = 12 yr; true value 11.9 yr.+x then −x = (1+x)(1−x) = 1 − x² — always a net loss of x².0.5%/mo → 1.005¹² = 1.0617 (6.17%, not 6%).= ∏(1+rᵢ) − 1, never add unless tiny.ln(1+r) ≈ r.+10% then −10% = 1 − .01 = 0.99 → down 1%, not flat.ln2 ≈ .693, ln10 ≈ 2.303; log₁₀2 = .301, log₁₀3 = .477, log₁₀7 = .845.log₁₀5 = 1 − .301 = .699; log₁₀6 = .301+.477 = .778.e ≈ 2.718; e² ≈ 7.39; doubling-decibel sense: +3dB ≈ ×2 since 10^.3 ≈ 2.ln x = 2.303 · log₁₀ x.ln1.1 ≈ .1 − .005 = .095 (true .0953); so 10%/yr doubles in .693/.095 ≈ 7.3 yr.×√12 ≈ 3.46; weekly: ×√52 ≈ 7.2.×√252, monthly ×√12.√time, return scales with time — that gap is why Sharpe grows.notional × move; in bps, $X × bp/10000.20M × .0035 = $70k. A 0.8% daily-vol book ≈ 0.8×16 = 12.8% annual vol.f is continuous at a if lim(x→a) f(x) = f(a) — no jumps, holes, or blow-ups.
lim(x→a⁻) = lim(x→a⁺).g ≤ f ≤ h and both ends → L, then f → L. Gives lim(x→0) sin(x)/x = 1.lim(x→2) (x²−4)/(x−2) is 0/0 raw. Factor: (x−2)(x+2)/(x−2) = x+2 → 4.| Rule | Formula |
|---|---|
| Power | d/dx xⁿ = n·xⁿ⁻¹ |
| Product | (fg)′ = f′g + fg′ |
| Quotient | (f/g)′ = (f′g − fg′)/g² |
| Chain | (f(g(x)))′ = f′(g(x))·g′(x) |
d/dx [x²·sin x] = 2x·sin x + x²·cos x (product). And d/dx (3x+1)⁵ = 5(3x+1)⁴·3 = 15(3x+1)⁴ (chain).| f(x) | f′(x) |
|---|---|
eˣ | eˣ |
aˣ | aˣ·ln a |
ln x | 1/x |
sin x | cos x |
cos x | −sin x |
tan x | sec²x = 1/cos²x |
eˣ is its own derivative — the only function (up to scale) with that property.
d/dx ln(x²+1) = 1/(x²+1)·2x = 2x/(x²+1) (chain with ln).u = g(x), du = g′(x)dx.∫u dv = uv − ∫v du. Pick u by LIATE (Log, Inverse-trig, Algebraic, Trig, Exp).∫xⁿdx = xⁿ⁺¹/(n+1), ∫1/x dx = ln|x|, ∫eˣdx = eˣ, ∫sin x dx = −cos x, ∫cos x dx = sin x.
∫x·eˣ dx: let u=x, dv=eˣdx → uv−∫v du = x·eˣ − ∫eˣdx = eˣ(x−1)+C.∫₋∞^∞ e^(−x²) dx = √π
∫₋∞^∞ e^(−a·x²) dx = √(π/a).∫₋∞^∞ (1/√(2πσ²))·e^(−x²/2σ²) dx = 1.(∫e^(−x²)dx)² = ∫∫e^(−r²)·r dr dθ = π, so the integral is √π ≈ 1.7725.
∫₋∞^∞ e^(−2x²) dx = √(π/2) ≈ 1.2533.x=0 any smooth function ≈ a polynomial: f(x) ≈ f(0) + f′(0)x + f″(0)x²/2 + …. Memorize these:
| Function | Series |
|---|---|
eˣ | 1 + x + x²/2 + x³/6 + … |
ln(1+x) | x − x²/2 + x³/3 − … |
(1+x)ⁿ | 1 + nx + n(n−1)x²/2 + … |
sin x | x − x³/6 + x⁵/120 − … |
cos x | 1 − x²/2 + x⁴/24 − … |
eˣ ≈ 1+x, ln(1+x) ≈ x, (1+x)ⁿ ≈ 1+nx, sin x ≈ x, cos x ≈ 1−x²/2.
e^0.1: ≈ 1+0.1+0.005 = 1.105 (actual 1.10517). And ln(1.1) ≈ 0.1−0.005 = 0.095 (actual 0.09531).f(x,y,…), the partial ∂f/∂x differentiates in x treating other vars as constants. The gradient stacks them:
∇f = (∂f/∂x, ∂f/∂y, …).
u: ∇f · u.f = x²y + y³. ∂f/∂x = 2xy, ∂f/∂y = x² + 3y². At (1,2): ∇f = (4, 13).f′(x)=0 (or ∇f=0) → critical points.f″>0 → min, f″<0 → max, f″=0 → inconclusive.f = x³ − 3x. f′ = 3x²−3 = 0 → x = ±1. f″ = 6x: at x=1, f″=6>0 (min); at x=−1, f″=−6<0 (max).f subject to constraint g = c. At the optimum the gradients align:
∇f = λ·∇g, solved together with g = c.
Intuition: you can't improve f without violating g once the level set of f is tangent to the constraint.
f = xy on x + y = 10. ∇f=(y,x), ∇g=(1,1) → y=λ, x=λ so x=y. With x+y=10: x=y=5, max product 25. (Confirms AM–GM.)0/0 or ∞/∞, differentiate top and bottom separately:
lim f/g = lim f′/g′ (repeat if still indeterminate).
0·∞, ∞−∞, 1^∞) must first be rewritten as a quotient.1^∞ etc., take logs first.lim(x→0) (1−cos x)/x² = 0/0 → sin x / 2x = 0/0 → cos x / 2 → 1/2. (Matches cos x ≈ 1−x²/2.)f is convex on an interval iff f″(x) ≥ 0 there; concave iff f″ ≤ 0. Convex = holds water, lies below its chords, above its tangents.f, E[f(X)] ≥ f(E[X]); flip the sign for concave. Equality iff X is constant or f linear.E[f(X)] − f(E[X]) grows with Var(X) and curvature — this is exactly option vega/gamma in disguise.f(x) ≥ f(a) + f′(a)(x−a) for all a — the basis of the supporting-hyperplane argument.x², eˣ, −ln x, 1/x (x>0) are convex; ln x, √x are concave.f(x)=x² convex, so E[X²] ≥ (E[X])² — which restates Var(X) = E[X²]−(E[X])² ≥ 0. Jensen is the structural reason variance is nonnegative.t into an integral I(t)=∫ f(x,t) dx, then swap derivative and integral: I′(t)=∫ ∂f/∂t dx (Leibniz rule), integrate the easier result, and fix the constant from a known value.∫₋∞^∞ e^(−tx²) dx = √(π/t) in t to pull down powers of x².E[X²ⁿ] for standard normal: (2n−1)!! = 1·3·5···(2n−1) (e.g. E[X²]=1, E[X⁴]=3, E[X⁶]=15).∂f/∂t is dominated by an integrable bound (Leibniz integral rule / dominated convergence) — fine for the smooth, fast-decaying integrands quants see.∫₋∞^∞ e^(−tx²) dx = √π·t^(−1/2), differentiate both sides in t: ∫ −x²e^(−tx²) dx = −½√π·t^(−3/2). At t=1: ∫₋∞^∞ x²e^(−x²) dx = √π/2 ≈ 0.8862.x: f(x+h) ≈ f(x) + ∇f(x)ᵀh + ½ hᵀH(x)h, where H is the Hessian of second partials Hᵢⱼ=∂²f/∂xᵢ∂xⱼ.H is symmetric when second partials are continuous (Clairaut/Schwarz): ∂²f/∂x∂y = ∂²f/∂y∂x.h = −H⁻¹∇f — the multivariate analogue of x − f′/f″.v is vᵀHv/(vᵀv); bounded by the smallest/largest eigenvalues of H.≈ ΔᵀΔx + ½Δxᵀ Γ Δx (delta–gamma), with Γ the Hessian of value.f=x²+xy+y² at origin: ∇f=(0,0), H=[[2,1],[1,2]]. So f(h)≈½hᵀHh = h₁²+h₁h₂+h₂² — exact here since f is already quadratic.‖v‖ = √(v·v).
u·v = Σ uᵢvᵢ = ‖u‖‖v‖cos θ.u·v = 0 ⟺ vectors are orthogonal (⊥).|u·v| ≤ ‖u‖‖v‖.u=(1,2), v=(3,4): u·v = 3+8 = 11, ‖u‖=√5, ‖v‖=5, so cos θ = 11/(5√5) ≈ 0.984 → θ ≈ 10.3°.(AB)ᵢⱼ = Σ Aᵢₖ Bₖⱼ — row of A dotted with column of B. Inner dimensions must match: (m×n)(n×p) = (m×p).
AB ≠ BA in general.(AB)C = A(BC), A(B+C)=AB+AC.(AB)ᵀ = BᵀAᵀ and (AB)⁻¹ = B⁻¹A⁻¹ (order flips).[[1,2],[3,4]]·[[0,1],[1,0]] = [[2,1],[4,3]] (swaps columns), but the reverse product is [[3,4],[1,2]] (swaps rows) — different.(Aᵀ)ᵢⱼ = Aⱼᵢ. Inverse A⁻¹ satisfies A·A⁻¹ = I and exists only when det A ≠ 0.
For 2×2 [[a,b],[c,d]]: A⁻¹ = 1/(ad−bc)·[[d,−b],[−c,a]].
QᵀQ = I, so Q⁻¹ = Qᵀ (cheap, stable).A=[[1,2],[3,4]], det = 1·4−2·3 = −2. A⁻¹ = (−1/2)·[[4,−2],[−3,1]] = [[−2,1],[1.5,−0.5]].det A is the signed volume-scaling factor of the linear map; sign flags orientation flip.
det A = 0 ⟺ singular (columns dependent, map collapses a dimension, no inverse).det(AB) = det A · det B; det(Aᵀ)=det A; det A = ∏ eigenvalues.ad−bc. 3×3: cofactor expansion.[[2,0],[0,3]] scales area by det = 6: the unit square maps to a 2×3 rectangle.n (square) ⟺ invertible ⟺ det ≠ 0.rank(A) + nullity(A) = #columns.[[1,2],[2,4]]: row 2 = 2×row 1, so rank 1 (not 2). Determinant is 0 — singular.Av = λv. Find λ from the characteristic polynomial det(A − λI) = 0.
Σλ = trace(A); ∏λ = det(A) — fast sanity checks.(A − λI)v = 0 for each eigenvector.A=[[2,1],[1,2]]: det([[2−λ,1],[1,2−λ]]) = (2−λ)²−1 = 0 → λ = 3, 1. Check: trace 4 = 3+1, det 3 = 3·1. Eigenvectors: (1,1) for 3, (1,−1) for 1.A = Aᵀ → real eigenvalues and orthogonal eigenvectors (spectral theorem). A quadratic form is xᵀAx.
| Class | Condition |
|---|---|
| Positive definite (PD) | xᵀAx > 0 ∀x≠0; all λ > 0 |
| Positive semidefinite (PSD) | xᵀAx ≥ 0; all λ ≥ 0 |
XᵀX are always PSD.
A=[[2,0],[0,3]]: xᵀAx = 2x₁²+3x₂² > 0 → PD (eigenvalues 2, 3).A = UΣVᵀ: U, V orthogonal, Σ diagonal of singular values σ₁ ≥ σ₂ ≥ … ≥ 0.
√ eigenvalues of AᵀA; rank = #nonzero σ. Basis for PCA and compression.A_k = Σᵢ₌₁ᵏ σᵢuᵢvᵢᵀ — optimal in both spectral and Frobenius norm.‖A−A_k‖₂ = σ_{k+1}, ‖A−A_k‖_F = √(Σ_{i>k} σᵢ²). Energy captured = Σᵢ₌₁ᵏσᵢ² / Σσᵢ² (scree / variance-explained).mn to k(m+n+1) — the win when σ decays fast.A⁻¹ (slower, less stable, destroys sparsity).
| Method | Use when |
|---|---|
| LU | general square A |
Cholesky (A=LLᵀ) | A symmetric PD (≈2× faster) |
| QR | least squares / tall A, stable |
A⁻¹ then A⁻¹b wastes work and amplifies rounding error.
Xβ ≈ y when no exact solution exists, minimize ‖Xβ − y‖². Setting the gradient to zero gives the normal equations:
XᵀX β = Xᵀy → β = (XᵀX)⁻¹Xᵀy (in practice solved by QR).
Geometrically, Xβ̂ is the orthogonal projection of y onto the column space of X; the residual is ⊥ that space.
X=[[1,1],[1,2],[1,3]]. Solving normal equations gives intercept 2/3, slope 1/2 → ŷ = 0.667 + 0.5x.tr(ABC) = tr(BCA) = tr(CAB).
Gram–Schmidt turns independent vectors into an orthonormal basis by subtracting projections:
uₖ = vₖ − Σ proj(vₖ onto earlier uᵢ), then normalize. This is the Q in QR.
v₁=(1,0), v₂=(1,1): keep u₁=(1,0). Subtract projection: u₂ = (1,1) − (v₂·u₁)u₁ = (1,1)−(1,0) = (0,1). Now orthonormal.A=VΛV⁻¹ needs A square; V orthogonal only if A is symmetric (spectral theorem).A=UΣVᵀ exists for ANY m×n matrix; U,V always orthonormal, σᵢ≥0.σᵢ=λᵢ, left/right singular vectors equal eigenvectors.σᵢ=√(λᵢ(AᵀA)); eigenvalues of A can be complex/negative, σ never; σᵢ=|λᵢ| for any normal A (e.g. rotations), not only symmetric.κ(A)=σ_max/σ_min (2-norm); for symmetric, =|λ|_max/|λ|_min.x solving Ax=b can be up to κ(A) × relative error in b or A.log₁₀κ digits of accuracy; κ≈10¹⁶ in double precision means the answer is garbage.κ(AᵀA)=κ(A)² — why QR/SVD beats forming AᵀA for least squares.λI: solving (AᵀA+λI)x shifts every eigenvalue up by λ, capping κ at (σ_max²+λ)/(σ_min²+λ) — strictly better-conditioned.d = √(Σ(aᵢ−bᵢ)²).
cos θ = (u·v)/(‖u‖‖v‖).u×v is ⊥ to both, with ‖u×v‖ = ‖u‖‖v‖sin θ = area of the parallelogram they span.u=(1,0,0), v=(0,1,0): u·v=0 (⊥), u×v=(0,0,1), area = 1.y = mx + b, or ax + by = c.ax + by + cz = d, with normal vector n = (a,b,c).|ax₀+by₀−c| / √(a²+b²). Plane version uses √(a²+b²+c²).−1.3x + 4y = 10: |−10|/√(9+16) = 10/5 = 2.(x−h)² + (y−k)² = r²; circumference 2πr, area πr².(x−h)²+(y−k)²+(z−l)² = r²; surface 4πr², volume (4/3)πr³.x²+y²−6x+8 = 0 → complete the square: (x−3)²+y² = 1, a circle centered (3,0) radius 1.| Shape | Formula |
|---|---|
| Triangle | ½·base·height |
| Triangle (Heron) | √(s(s−a)(s−b)(s−c)), s=(a+b+c)/2 |
| Circle | area πr², circ. 2πr |
| Sphere | vol (4/3)πr³, surf. 4πr² |
| Cone | (1/3)πr²h |
| Regular tetrahedron (edge a) | a³/(6√2) |
½|x₁(y₂−y₃)+x₂(y₃−y₁)+x₃(y₁−y₂)|.
½·4·3 = 6. Regular tetrahedron edge 1: vol 1/(6√2) ≈ 0.1179.a/a′ = b/b′ = c/c′.h/20 = 6/4 → tree height = 30 ft.P = 2⁻ⁿ⁺¹ · Σ_{k=0}^{d−1} C(n−1, k).
P = n/2ⁿ⁻¹.P = (n² − n + 2)/2ⁿ.P = 4/2³ = 1/2. 4 points on a sphere: P = (16−4+2)/16 = 14/16 = 7/8.((x₁+x₂)/2, (y₁+y₂)/2).(y₂−y₁)/(x₂−x₁).((mx₂+nx₁)/(m+n), (my₂+ny₁)/(m+n)).((0+6+0)/3, (0+0+9)/3) = (2, 3).a² + b² = c². Memorize triples (3,4,5), (5,12,13), (8,15,17) and scalings.
= √(l²+w²+h²).= (√3/2)a, area = (√3/4)a².√(1+4+4) = √9 = 3. Area of equilateral triangle side 2: (√3/4)·4 = √3 ≈ 1.732.Area = ½|x_A(y_B−y_C)+x_B(y_C−y_A)+x_C(y_A−y_B)|. Equivalent to ½|(B−A)×(C−A)| in 2D.s=(a+b+c)/2, Area=√(s(s−a)(s−b)(s−c)).Area=½·a·b·sinθ.Area=½‖(B−A)×(C−A)‖ — the cross-product magnitude is twice the area.+ counterclockwise, − clockwise. Collinear ⟺ area 0.½|0(0−3)+4(3−0)+1(0−0)| = ½·12 = 6.P = measure(event)/measure(space).1/4 (each of x, y−x, 1−y must be < ½); expected distance between two uniform points on [0,1] = 1/3; Bertrand chord — pin down the sampling rule first, "random" is ambiguous.(a·b/‖b‖²)·b; its length is |a·b|/‖b‖.dist = ‖(P−A) − proj_d(P−A)‖. In 2D/3D: ‖(P−A)×d‖/‖d‖.n·x=c: dist = |n·P − c|/‖n‖ (signed if you drop the abs — sign tells which side).|(A₂−A₁)·(d₁×d₂)|/‖d₁×d₂‖.2x−y+2z=3: |2−2+6−3|/√(4+1+4) = 3/3 = 1.| + | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| 5 | 6 | 7 | 8 | 9 | 10 | 11 |
| 6 | 7 | 8 | 9 | 10 | 11 | 12 |
| × | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 | 1 | 2 | 3 | 4 | 5 | 6 |
| 2 | 2 | 4 | 6 | 8 | 10 | 12 |
| 3 | 3 | 6 | 9 | 12 | 15 | 18 |
| 4 | 4 | 8 | 12 | 16 | 20 | 24 |
| 5 | 5 | 10 | 15 | 20 | 25 | 30 |
| 6 | 6 | 12 | 18 | 24 | 30 | 36 |
| Dist | pmf | Mean | Variance | Use |
|---|---|---|---|---|
| DiscreteUniform{1..N} | 1/N, k=1..N | (N+1)/2 | (N²−1)/12 | fair die (N=6), random index |
| Bernoulli(p) | P(1)=p, P(0)=1−p | p | p(1−p) | single yes/no trial |
| Binomial(n,p) | C(n,k) pᵏ(1−p)ⁿ⁻ᵏ | np | np(1−p) | # successes in n trials |
| Geometric(p) — # trials | (1−p)ᵏ⁻¹ p, k≥1 | 1/p | (1−p)/p² | trial of 1st success |
| Geometric(p) — # failures | (1−p)ᵏ p, k≥0 | (1−p)/p | (1−p)/p² | failures before 1st success |
| NegBinom(r,p) — # failures | C(k+r−1,k) pʳ(1−p)ᵏ | r(1−p)/p | r(1−p)/p² | failures before r-th success |
| Poisson(λ) | e⁻λ λᵏ/k! | λ | λ | rare events, fixed interval |
| Dist | Mean | Variance | |
|---|---|---|---|
| Uniform(a,b) | 1/(b−a), a≤x≤b | (a+b)/2 | (b−a)²/12 |
| Exponential(λ) | λe⁻λˣ, x≥0 | 1/λ | 1/λ² |
| Normal(μ,σ²) | (1/√(2πσ²)) e^(−(x−μ)²/2σ²) | μ | σ² |
| Lognormal(μ,σ²) | (1/(xσ√(2π))) e^(−(ln x−μ)²/2σ²) | e^(μ+σ²/2) | (e^(σ²)−1)e^(2μ+σ²) |
| Gamma(k,θ) shape/scale | xᵏ⁻¹e⁻ˣ/θ/(Γ(k)θᵏ) | kθ | kθ² |
| Beta(α,β) | xᵅ⁻¹(1−x)ᵝ⁻¹/B(α,β) | α/(α+β) | αβ/[(α+β)²(α+β+1)] |
| Chi-square(k) | Gamma(k/2, 2) | k | 2k |
| Student-t(ν) | ∝(1+t²/ν)^(−(ν+1)/2) | 0 (ν>1) | ν/(ν−2) (ν>2) |
Φ(z) = P(Z ≤ z), symmetric: Φ(−z) = 1−Φ(z).
P(Z > z) ≤ (1/(z√(2π))) e^(−z²/2) for z > 0.M_X(t) = E[e^(tX)]. Moments: E[Xⁿ] = M^(n)(0). Independent sums multiply MGFs.
| Dist | MGF |
|---|---|
| Bernoulli(p) | 1−p+peᵗ |
| Binomial(n,p) | (1−p+peᵗ)ⁿ |
| Poisson(λ) | e^(λ(eᵗ−1)) |
| Exponential(λ) | λ/(λ−t), t < λ |
| Normal(μ,σ²) | e^(μt+½σ²t²) |
| Gamma(k,θ) | (1−θt)⁻ᵏ, t < 1/θ |
P(X > s+t | X > s) = P(X > t).
P(X > t) = e^(−λt); hazard rate is constant = λ.min(Exp(λ₁),…,Exp(λₙ)) ~ Exp(λ₁+…+λₙ).λᵢ / Σλⱼ.f₍ₖ₎(x) = [n! / ((k−1)!(n−k)!)] · F(x)ᵏ⁻¹ · (1−F(x))ⁿ⁻ᵏ · f(x)
— pick which sample is the k-th (the count factor), k−1 fall below it, n−k above, one sits at x.
k/(n+1); max mean n/(n+1), min mean 1/(n+1).E[X₍ₖ₎] = k(N+1)/(m+1) and Var(X₍ₖ₎) = k(m+1−k)(N+1)(N−m) / [(m+1)²(m+2)]
(the discrete analog of the Beta result; note m is the sample size, N the population top).
E = m(N+1)/(m+1) — the German tank problem; invert for N̂ = max·(m+1)/m − 1.E[ΣXᵢ]=ΣE[Xᵢ], Var(ΣXᵢ)=ΣVar(Xᵢ) (independence needed only for variance).| Sum of independent | Result |
|---|---|
| Normal(μᵢ,σᵢ²) | Normal(Σμᵢ, Σσᵢ²) |
| Poisson(λᵢ) | Poisson(Σλᵢ) |
| n × Bernoulli(p) | Binomial(n,p) |
| Binomial(nᵢ,p) (same p) | Binomial(Σnᵢ, p) |
| n × Exp(λ) | Gamma(n, 1/λ) = Erlang |
| Gamma(kᵢ,θ) (same θ) | Gamma(Σkᵢ, θ) |
| χ²(kᵢ) | χ²(Σkᵢ) |
N(μ₁,σ₁²), N(μ₂,σ₂²): write as X+(−Y); means subtract but VARIANCES STILL ADD → N(μ₁−μ₂, σ₁²+σ₂²). Classic sign-trap.U~Uniform(0,1) then X=F⁻¹(U) has CDF F. Conversely F(X)~Uniform(0,1).U=1−e^(−λx) ⇒ X=−ln(U)/λ (use ln U; U and 1−U are both uniform).Z₁=√(−2 ln U₁)·cos(2πU₂), Z₂=√(−2 ln U₁)·sin(2πU₂) are iid N(0,1).X=⌈ln(U)/ln(1−p)⌉ — the discrete inverse-CDF.| Hand | Count | Combos — C(n,k) form | How to count it | Prob |
|---|---|---|---|---|
| Royal flush | 4 | C(4,1) | the one A-K-Q-J-10 run, one per suit | 0.000154% |
| Straight flush (excl. royal) | 36 | 10·C(4,1) − 4 | 10 runs × 4 suits = 40, minus 4 royals | 0.00139% |
| Four of a kind | 624 | C(13,1) C(4,4) C(48,1) | rank × all 4 suits × any 5th card | 0.0240% |
| Full house | 3,744 | C(13,1) C(4,3) C(12,1) C(4,2) | trip rank+suits × different pair rank+suits | 0.1441% |
| Flush (excl. str. flush) | 5,108 | C(4,1) C(13,5) − 40 | suit × any 5 ranks, minus straight flushes | 0.1965% |
| Straight (excl. str. flush) | 10,200 | 10·C(4,1)⁵ − 40 | 10 runs × any suit per card, minus straight flushes | 0.3925% |
| Three of a kind | 54,912 | C(13,1) C(4,3) C(12,2) C(4,1)² | trip × two distinct off-ranks, one suit each | 2.1128% |
| Two pair | 123,552 | C(13,2) C(4,2)² C(11,1) C(4,1) | two pair ranks+suits × kicker rank+suit | 4.7539% |
| One pair | 1,098,240 | C(13,1) C(4,2) C(12,3) C(4,1)³ | pair rank+suits × three distinct kickers | 42.257% |
| High card | 1,302,540 | [C(13,5)−10][C(4,1)⁵−4] | any 5 ranks (not a run) × any suits (not all one) | 50.118% |
P(n,k) = n!/(n−k)!.C(n,k) = n!/(k!(n−k)!), and C(n,k)=C(n,n−k).n!/(n₁! n₂! … nₘ!).(n−1)!.(x+y)ⁿ = Σ_{k=0..n} C(n,k) xᵏ yⁿ⁻ᵏ.
C(n,k)=C(n−1,k−1)+C(n−1,k).Σ_k C(n,k) = 2ⁿ (subsets of an n-set).Σ_k (−1)ᵏ C(n,k) = 0 for n ≥ 1.Σ_j C(m,j)C(n,k−j) = C(m+n,k).Σ_{i=r..n} C(i,r) = C(n+1,r+1).x₁+…+x_k = n: C(n+k−1, k−1).C(n−1, k−1).|A∪B∪C| = Σ|Aᵢ| − Σ|Aᵢ∩Aⱼ| + |A₁∩A₂∩A₃|; alternate signs in general.
Dₙ = n! Σ_{k=0..n} (−1)ᵏ/k!, recursion Dₙ = (n−1)(Dₙ₋₁ + Dₙ₋₂).
Dₙ/n! → 1/e ≈ 0.3679 fast — the "hat-check" limit.Cₙ = C(2n,n)/(n+1) = C(2n,n) − C(2n,n+1); sequence 1,1,2,5,14,42,132,…; recurrence Cₙ₊₁ = Σ Cᵢ Cₙ₋ᵢ.
E[ΣXᵢ] = ΣE[Xᵢ] always — even when Xᵢ are dependent. Write counts as sums of 0/1 indicators.
⌈n/m⌉ items. Powerful for existence arguments.
G(x) = Σ aₙ xⁿ; products = convolutions (great for sums of independent counts).
x+x²+…+x⁶. Two dice: square it; coefficient of xˢ = ways to roll sum s.1/(1−x) = Σ xⁿ; 1/(1−x)ᵏ = Σ C(n+k−1, k−1) xⁿ (recovers stars-and-bars).E[zˣ]; for Poisson it is e^(λ(z−1)).P(n,k)=n!/(n−k)!. Order matters, repeat: nᵏ.C(n,k)=n!/(k!(n−k)!). Order doesn't matter, repeat (multiset): C(n+k−1,k) — this is stars & bars.n!/(k₁!k₂!…kₘ!).11!/(4!·4!·2!·1!)=34650.(n−1)!.(n−1)!/2 for n≥3.(1/|G|)·Σ|Fix(g)|.(5−1)!=24. As a flippable bracelet of 5 distinct beads: 4!/2=12.P(Xₙ₊₁=j | Xₙ=i, past) = Pᵢⱼ. Rows of P sum to 1. n-step probabilities = Pⁿ.
πP = π, Σπᵢ = 1.Pⁿ → 1π regardless of start.πᵢPᵢⱼ = πⱼPⱼᵢ (detailed balance) ⇒ such π is stationary.E[Sₙ]=0, Var(Sₙ)=n.
i/N; expected duration = i(N−i).P(reach N)=(1−rⁱ)/(1−rᴺ).E[Xₙ₊₁ | past] = Xₙ (fair game). Optional Stopping Theorem: E[X_τ] = E[X₀] for stopping time τ when (e.g.) τ is bounded, or bounded increments + finite E[τ].
Sₙ² − n are martingales.Wₜ − Wₛ ~ N(0, t−s).Cov(Wₛ, Wₜ) = min(s,t).W_{ct} =ᵈ √c · Wₜ. Scales as √t (the "square-root-of-time" rule).N(t) ~ Poisson(λt); independent increments over disjoint intervals.Exponential(λ); n-th arrival time ~ Gamma(n, 1/λ).(dW)² = dt, dt·dW = 0, (dt)² = 0. For dX = a dt + b dW and f(t,X):
df = (∂f/∂t + a ∂f/∂x + ½b² ∂²f/∂x²) dt + b ∂f/∂x dW.
The ½b²f_xx term is the Itô correction — absent in ordinary calculus.
dS = μS dt + σS dW. Solution (via Itô on ln S):
Sₜ = S₀ exp((μ − ½σ²)t + σWₜ).
E[Sₜ] = S₀ e^(μt); Sₜ is lognormal.Var(Sₜ) = S₀² e^(2μt)(e^(σ²t) − 1).dX = θ(μ − X) dt + σ dW, θ > 0 = reversion speed.
E[Xₜ] = μ + (X₀ − μ)e^(−θt) → μ.σ²/(2θ); stationary law N(μ, σ²/(2θ)).ln 2 / θ.∫₀ᵗ f dW is defined for adapted, square-integrable f; it is a martingale with E[∫f dW]=0.E[(∫₀ᵗ f dW)²]=E[∫₀ᵗ f² ds] — converts a stochastic integral's variance into an ordinary time integral.dW·dW=dt, dt·dW=0, dt·dt=0. This is why second-order terms survive in Itô's lemma.dX=μ(X,t)dt+σ(X,t)dW with zero drift μ≡0 is a local martingale — a true martingale when σ is square-integrable (E[∫σ²ds]<∞).d(XY)=X dY+Y dX+dX·dY.Xₜ=∫₀ᵗ W_s dW_s, apply Itô to Wₜ²: d(Wₜ²)=2Wₜ dWₜ+dt, so ∫₀ᵗ W dW=½(Wₜ²−t). The −t/2 is the Itô correction a Riemann integral would miss.ℚ, discounted assets are martingales: S₀=E^ℚ[e^{−rT}S_T]; every tradable's drift becomes r.W̃ₜ=Wₜ+∫₀ᵗ θ_s ds is a ℚ-Brownian motion.dℚ/dℙ=exp(−∫θ dW−½∫θ² ds).θ=(μ−r)/σ is the drift adjustment that turns the real-world drift μ into r.∂_t V+μ ∂_x V+½σ² ∂_xx V−rV=0 with terminal data V(T,x)=payoff has solution V=E^ℚ[e^{−r(T−t)}payoff].dS=rS dt+σS dW̃. Feynman–Kac says the BS PDE's solution equals e^{−rT}E^ℚ[(S_T−K)⁺] — same answer two ways.P(max_{s≤t} Wₛ ≥ a) = 2·P(Wₜ ≥ a) = 2Φ(−a/√t) for a>0.τ_a = inf{t : Wₜ = a} has the Lévy (stable-½) density (a/√(2πt³)) e^(−a²/2t); heavy-tailed, so E[τ_a] = ∞ for driftless BM.τ_a is a.s. finite (1D BM is recurrent) but has infinite mean — it hits, but you cannot bound when.Xₜ = μt + σWₜ: hitting a>0 is certain only if μ ≥ 0; if μ < 0, P(ever hit a) = exp(2μa/σ²) < 1.−b, a), no drift: P(hit a first) = b/(a+b).ŷ = β₀ + β₁x through a cloud of points. The slope and intercept that minimize squared error have closed forms:
β₁ = Cov(x,y) / Var(x) (slope = how much y moves per unit x)β₀ = ȳ − β₁x̄ (line passes through the mean point (x̄, ȳ))β₁ = r · (sy/sx), so the slope is the correlation rescaled by the ratio of standard deviations.
β₁ = 2.0/2.5 = 0.8 and β₀ = 4.2 − 0.8·3 = 1.8 → ŷ = 1.8 + 0.8x. At x=3, ŷ = 4.2.SSR = Σ(yᵢ − ŷᵢ)². Setting the gradient to zero gives the normal equations:
XᵀX β = Xᵀy → β = (XᵀX)⁻¹Xᵀy (when XᵀX is invertible)Xᵀε = 0H = X(XᵀX)⁻¹Xᵀ does the projection (ŷ = Hy); the residual e = (I−H)y is the part of y orthogonal to that space. Minimizing distance = dropping a perpendicular.
(XᵀX)⁻¹Xᵀy = (1/n)Σyᵢ = ȳ. OLS reduces to the sample mean — the projection of y onto the constant vector is its average. Residuals then sum to zero, the defining orthogonality condition.| Assumption | What it buys |
|---|---|
| Linearity in parameters | Model is correctly specified |
E[ε|X] = 0 (exogeneity) | Unbiasedness: E[β̂] = β |
Homoskedasticity Var(εᵢ) = σ² | Efficiency / correct standard errors |
No autocorrelation Cov(εᵢ,εⱼ)=0 | Efficiency / correct SEs |
| No perfect multicollinearity | XᵀX invertible → β identified |
R² = 1 − SSR/SST = ESS/SST, the fraction of variance in y explained by the model. Here SST = Σ(yᵢ−ȳ)², SSR = Σ(yᵢ−ŷᵢ)².
Adjusted R² = 1 − (1−R²)(n−1)/(n−k−1) penalizes extra predictors (k = number of predictors excluding the intercept); it can fall when a variable adds noise.R² = 1 − 2.4/8.8 ≈ 0.727. With n=5, k=1: adjR² = 1 − (1−0.727)(4/3) ≈ 0.636. The penalty is large here because n is tiny relative to parameters.Var(β̂) = σ²(XᵀX)⁻¹, with σ̂² = SSR/(n−k−1). SE(β̂ⱼ) is the square root of the j-th diagonal element.
t = β̂ⱼ / SE(β̂ⱼ), df = n−k−1. Tests H₀: βⱼ = 0.F = (R²/k) / ((1−R²)/(n−k−1)). Tests H₀: all slopes = 0. Use the nested form F = [(SSRᵣ−SSRᵤ)/q] / [SSRᵤ/(n−k−1)] to compare a restricted vs unrestricted model.t = 0.8/0.30 ≈ 2.67. With df large, the two-sided p-value ≈ 0.008 < 0.05, so reject H₀ — the slope is statistically distinguishable from zero.E[y|x₀]: width ∝ √(Var(ŷ₀)).√(σ² + Var(ŷ₀)).VIFⱼ = 1/(1−Rⱼ²), where Rⱼ² is from regressing xⱼ on the other predictors. VIF > 10 (Rⱼ² > 0.9) is a common red flag.y = β₁x₁ + β₂x₂ + ε and you omit x₂, the bias on β̂₁ is β₂ · δ, where δ is the slope of x₂ regressed on x₁.
2 + 3·0.5 = 3.5 — a +1.5 bias. Sign of bias = sign(β₂)·sign(δ). Example (VIF). If Rⱼ² = 0.9 then VIF = 1/0.1 = 10: the SE of that coefficient is √10 ≈ 3.2× larger than with no collinearity.| Ridge (L2) | Lasso (L1) | |
|---|---|---|
| Penalty | SSR + λΣβⱼ² | SSR + λΣ|βⱼ| |
| Closed form | β = (XᵀX + λI)⁻¹Xᵀy | No closed form (LP/coordinate descent) |
| Effect | Shrinks toward 0, rarely exactly 0 | Drives some β exactly to 0 → sparsity / feature selection |
| Best when | Many correlated predictors all matter | Few predictors truly matter |
β = Xᵀy/(XᵀX+λ). If OLS β=1.0 and XᵀX=10, then λ=10 halves it to β = 10/20 = 0.5. Lasso instead would zero it out entirely once λ exceeds the soft-threshold cutoff.log(p/(1−p)) = β₀ + β₁x, so p = 1/(1 + e^−(β₀+β₁x)) (sigmoid, always in (0,1)).e^0.7 ≈ 2.01: each unit of x roughly doubles the odds of y=1. If at some x the log-odds = 0, then p = 1/(1+e⁰) = 0.5; at log-odds = −1, p = 1/(1+e¹) ≈ 0.269.Bias² + Variance + Irreducible σ². Simple models underfit (high bias); flexible models overfit (high variance — they memorize noise). The sweet spot minimizes their sum.
| Plot | Healthy | Pattern → problem |
|---|---|---|
| Residuals vs fitted | Random band around 0 | Curve → nonlinearity; funnel → heteroskedasticity |
| Q–Q plot | Points on the line | Heavy tails / skew → non-normal errors (matters for inference) |
| Residual ACF / vs order | No autocorrelation | Waves → serial correlation (time-series) |
| Leverage / Cook's distance | All small | Outlier or high-leverage point distorting the fit |
Rᵢ − R_f = α + β(R_m − R_f) + εCov/Var → β = 1.5 and α = R̄ᵢ − β·R̄_m = 0.5%. β=1.5 means the stock moves 1.5× the market (aggressive); the 0.5% α is its average excess return unexplained by market exposure. Always report Newey–West SEs on β since return residuals are autocorrelated.Var(ε)=σ²Ω (Ω≠I) leave OLS unbiased but no longer BLUE, and the textbook SE formula is wrong.β̂_GLS=(XᵀΩ⁻¹X)⁻¹XᵀΩ⁻¹y — equivalent to OLS on data premultiplied by Ω^(−1/2) (whitening). It is BLUE when Ω is known.wᵢ=1/Var(εᵢ), downweighting noisy points. Feasible GLS estimates Ω first, then plugs in.Var(εᵢ)=σ²xᵢ²: dividing the whole equation by xᵢ gives homoskedastic errors — that transformed OLS is WLS with wᵢ=1/xᵢ². On finance return panels, the safe move is OLS + HAC SEs rather than guessing Ω.y=X₁β₁+X₂β₂+ε, the OLS estimate β̂₂ is unchanged if you instead (1) regress y on X₁ and take residuals ỹ, (2) regress X₂ on X₁ and take residuals X̃₂, then (3) regress ỹ on X̃₂.ε~N(0,σ²I), minimizing SSR is identical to maximizing the Gaussian likelihood — OLS = MLE when errors are normal, which is why OLS inherits MLE efficiency and the exact t/F distributions.call = max(S−K, 0), put = max(K−S, 0). Long option = limited loss (premium), unlimited (call) / large (put) upside; short option = capped gain (premium), large loss.
C − P = S − Ke^(−rT). Intuition: long call + short put = synthetic forward (you end up owning the stock at K regardless of S). With dividend yield q: C − P = Se^(−qT) − Ke^(−rT).
| Greek | Measures ∂V/∂ | Call | Put | Scaling |
|---|---|---|---|---|
| Δ (delta) | spot S | 0→+1 | −1→0 | — |
| Γ (gamma) | delta vs S (∂²V/∂S²) | + | + | peaks ATM, ∝ 1/(σ√T) |
| ν (vega) | vol σ | + | + | ∝ √T |
| θ (theta) | time (decay) | −* | −* | ∝ 1/√T (ATM) |
| ρ (rho) | rate r | + | − | ∝ T |
C = S·N(d₁) − Ke^(−rT)·N(d₂), with d₁ = [ln(S/K) + (r + σ²/2)T] / (σ√T) and d₂ = d₁ − σ√T. Here N(d₁) = Δ of the call; N(d₂) = risk-neutral probability of finishing ITM.
P&L ≈ ½·Γ·S²·(ΔS/S)² + θ·dt, and BS sets θ ≈ −½·Γ·σ²·S² so that breakeven occurs exactly when realized vol = implied vol.
| Structure | Construction | View |
|---|---|---|
| Straddle | long call + long put, same K | big move, direction unknown (long vol) |
| Strangle | long OTM call + OTM put | cheaper long-vol, needs bigger move |
| Bull call (vertical) | long low-K call, short high-K call | moderately bullish, capped, cheaper |
| Butterfly | +1 low / −2 mid / +1 high call | pin near mid K; short vol, defined risk |
| Calendar | short near-dated, long far-dated, same K | long term-structure / forward vol; wants stillness now |
| Risk reversal | long OTM call, short OTM put | bullish + short skew (sells expensive put) |
N_var·(RV² − K_var²) — pure exposure to realized variance, no path-dependent delta-hedging needed. It is replicated by a static, strike-weighted portfolio of OTM options (weight ∝ 1/K²) plus a dynamic futures hedge.
C_ATM ≈ 0.4·S·σ·√T (from C ≈ S·σ·√T/√(2π), and 1/√(2π) ≈ 0.399). Hugely useful for fast mental pricing and for sanity-checking quotes.
0.4·S·√T (per 100% vol; divide by 100 for per-1%-vol).0.8·S·σ·√T.C ≥ max(0, S − Ke^(−rT)) — below this, buy call + short stock + lend Ke^(−rT) for a riskless profit.C ≤ S (call never worth more than the stock); P ≤ Ke^(−rT) (European put ≤ PV of strike). American put ≤ K.| Input ↑ | Call | Put |
|---|---|---|
| S | ↑ | ↓ |
| K | ↓ | ↑ |
| σ | ↑ | ↑ |
| T | ↑* | ↑/↓* |
| r | ↑ | ↓ |
100 − 95·e^(−0.05) ≈ 100 − 90.37 = 9.63. Any market call priced below 9.63 is a locked-in arb regardless of σ.σ_fwd = √[(σ₂²·T₂ − σ₁²·T₁)/(T₂ − T₁)].√[(0.22²·2 − 0.20²·1)/(2−1)] = √[(0.0968 − 0.0400)/1] = √0.0568 ≈ 23.8% — higher than either spot vol, the cost embedded in the calendar.σ_idx² ≈ ρ̄·(Σwᵢσᵢ)² + (1−ρ̄)·Σwᵢ²σᵢ²; with many names the second term vanishes, so σ_idx ≈ √ρ̄ · Σwᵢσᵢ.ρ̄ ≈ (σ_idx/Σwᵢσᵢ)² = (0.21/0.30)² = 0.49. A dispersion book is short this 0.49; if realized correlation comes in at 0.30 the index leg was overpriced and the trade wins.P = Σ C/(1+y)ᵗ + F/(1+y)ⁿ. Price and yield move inversely and convexly.
D_mod = D_mac/(1+y) → %ΔP ≈ −D_mod·Δy.D_mod·P·0.0001.%ΔP ≈ −D_mod·Δy + ½·Cvx·(Δy)²; convexity is always +ve for vanilla bonds (helps you both ways).F = S·e^((r−q)T) (q = dividend/convenience yield; for commodities add storage cost u: F = S·e^((r+u)T)). Basis = S − F (or F − S by convention); converges to 0 at expiry.
S = (1 − D(T_n)) / Σ τᵢ·D(Tᵢ) where D = discount factors.| Order | Behavior |
|---|---|
| Market | execute now at best available price; certain fill, uncertain price (slippage) |
| Limit | execute only at limit price or better; certain price, uncertain fill |
| Stop (stop-loss) | becomes a market order once a trigger price trades; protects/exits |
| Stop-limit | becomes a limit order at trigger — won't chase through gaps |
| IOC (immediate-or-cancel) | fill what you can right now, cancel the rest |
| FOK (fill-or-kill) | fill the entire size immediately or cancel all |
P_callable = P_straight − C; price is capped near the call price as yields fall.conversion ratio = par/conversion price; parity (conversion value) = ratio × stock. Trades as max(bond floor, parity) plus option time value.F = S · (1+r_d·τ)/(1+r_f·τ) — the forward locks in the rate differential; no arbitrage if you can borrow/lend/spot freely. Higher-rate currency trades at a forward discount.| Op | list | dict | set | tuple |
|---|---|---|---|---|
| index / key get | O(1) | O(1) avg | — | O(1) |
x in c | O(n) | O(1) avg | O(1) avg | O(n) |
| append / add | O(1) amort. | O(1) avg | O(1) avg | immutable |
| insert/pop at front | O(n) | — | — | — |
| pop end | O(1) | — | — | — |
| min / max | O(n) | O(n) | O(n) | O(n) |
list: ordered, mutable, contiguous; great for stacks (append/pop).tuple: immutable, hashable (usable as dict keys / set members).dict: insertion-ordered since 3.7; keys must be hashable.collections.deque (O(1) appendleft/popleft).seen = set(big_list) # build once: O(n) hits = [x for x in queries if x in seen] # each lookup O(1) # vs. `if x in big_list` -> O(n) per query -> O(n*m) total
for-append loops (the append is done in C) and read declaratively. Use them for transform/filter; reach for a loop when there are side effects.
squares = [x*x for x in range(10)] # list
evens = [x for x in range(10) if x % 2 == 0] # filter
pairs = [(i, j) for i in range(3) for j in range(3)] # nested
price_by = {sym: px for sym, px in quotes} # dict comp
uniq = {round(x, 2) for x in samples} # set comp
# ternary goes BEFORE the for:
clip = [x if x > 0 else 0 for x in vals]
total = sum(x*x for x in data) # O(1) extra memory, no list built
yield returns a generator: it computes values on demand and holds O(1) state. Lazy = constant memory for arbitrarily long / infinite streams.
def moving_sum(stream, k):
from collections import deque
win, s = deque(), 0
for x in stream:
win.append(x); s += x
if len(win) > k:
s -= win.popleft()
if len(win) == k:
yield s
for ms in moving_sum(prices, 20): # one value at a time
...
next(gen) pulls one value; StopIteration at the end.import itertools
def naturals():
n = 0
while True:
yield n; n += 1
list(itertools.islice(naturals(), 5)) # [0,1,2,3,4]None as the sentinel and build the real default inside.
def bad(x, acc=[]): # BUG: same list reused
acc.append(x); return acc
bad(1) # [1]
bad(2) # [1, 2] <-- leaked!
def good(x, acc=None):
if acc is None:
acc = []
acc.append(x); return acc
def runs once and binds defaults eagerly. The same logic applies to default {} and to class-level mutable attributes.== compares values, is compares identity (same object). Assignment never copies — it binds another name to the same object.
import copy a = [[1, 2], [3, 4]] b = a # alias: b is a -> True c = a.copy() # shallow: c is a -> False, but c[0] is a[0] -> True d = copy.deepcopy(a) # full clone: d[0] is a[0] -> False a[0].append(99) # also shows up in c (shared inner list), NOT in d
is None / is not None — never == None.is for value equality.x = y = [] makes both names point at one list. x.append(1) changes y too.# BEFORE: pairwise returns in a Python loop (slow)
rets = []
for i in range(1, len(px)):
rets.append(px[i] / px[i-1] - 1)
# AFTER: vectorized (no Python-level loop)
import numpy as np
px = np.asarray(px)
rets = px[1:] / px[:-1] - 1 # whole-array slice arithmetic
logret = np.diff(np.log(px)) # log returns, one call
x[x < 0] = 0.np.where(cond, a, b) is a vectorized ternary.z = (x - x.mean()) / x.std()
iterrows / apply(axis=1) in hot paths — use vectorized column ops and groupby.
import pandas as pd
df = pd.DataFrame({"sym": ["A","A","B"], "px": [10.,11.,20.], "qty": [5,3,2]})
df["notional"] = df.px * df.qty # vectorized, no loop
vwap = (df.groupby("sym")
.apply(lambda g: (g.px*g.qty).sum()/g.qty.sum())) # per-group
by_sym = df.groupby("sym").agg(tot=("notional","sum"),
n=("px","count"))
df["ret"] = df.groupby("sym").px.pct_change() # group-aware diff
.loc[mask, col] for label/boolean selection and safe assignment (avoids chained-assignment warnings).merge = SQL join; concat = stack frames.shift(1) for lagging — careful not to leak future data.df[df.a>0]["b"] = 1 may silently fail (chained indexing). Use df.loc[df.a>0, "b"] = 1.px, qty = 1234.5678, 50
f"{px:.2f}" # '1234.57' (2 decimals)
f"{px:,.2f}" # '1,234.57' (thousands sep)
f"{px:10.2f}" # ' 1234.57' (width 10, right-aligned)
f"{qty:<6}|" # '50 |' (left-align in width 6)
f"{0.0734:.1%}" # '7.3%'
f"{255:#x}" # '0xff'
f"{px=}" # 'px=1234.5678' (debug: name + value, 3.8+)
f"{sym:<4} {bid:>8.2f} {ask:>8.2f}"# Two-sum: O(n) using a hash map of value -> index
def two_sum(nums, target):
seen = {}
for i, x in enumerate(nums):
if target - x in seen:
return (seen[target - x], i)
seen[x] = i
return None
# Counter: frequencies, most common
from collections import Counter
c = Counter("abracadabra")
c.most_common(2) # [('a', 5), ('b', 2)]
# Reservoir sampling: uniform sample of k from a stream of unknown length
import random
def reservoir(stream, k):
res = []
for i, x in enumerate(stream):
if i < k:
res.append(x)
else:
j = random.randint(0, i) # inclusive
if j < k:
res[j] = x
return res
i (0-indexed, i≥k) is kept with prob k/(i+1), and earlier items survive each step with the complementary prob; induction shows every item ends with prob k/n.==; use a tolerance or math.isclose.
0.1 + 0.2 == 0.3 # False 0.1 + 0.2 # 0.30000000000000004 import math math.isclose(0.1 + 0.2, 0.3, rel_tol=1e-9) # True sum([0.1]*3) == 0.3 # False (0.30000000000000004) math.fsum([0.1]*3) == 0.3 # True (exact summation)
decimal.Decimal or integer cents — not floats.float('nan') == float('nan') is False; test with math.isnan.round(x, 6).import itertools as it from collections import defaultdict, deque, Counter, OrderedDict it.accumulate([1,2,3,4]) # 1,3,6,10 (running sum -> prefix sums) it.chain([1,2], [3,4]) # 1,2,3,4 (flatten) it.combinations([1,2,3], 2) # (1,2),(1,3),(2,3) it.product([0,1], repeat=2) # (0,0),(0,1),(1,0),(1,1) it.groupby(sorted(data)) # consecutive runs (sort first!) it.pairwise([1,2,3]) # (1,2),(2,3) (3.10+) dd = defaultdict(list) # auto-creates [] on missing key dd["A"].append(1) # no KeyError dq = deque(maxlen=3) # ring buffer; oldest drops on overflow
itertools.groupby groups only consecutive equal keys — sort by the key first or you'll get fragmented groups.list.append O(1) amortized; occasional resize is O(n) but rare.list.insert(0, x) / pop(0) O(n) — shifts everything; use deque.x in list/tuple O(n); x in set/dict O(1) average, O(n) worst (hash collisions).sorted() / list.sort() O(n log n), Timsort, stable."".join(parts) O(total) — never += strings in a loop (O(n²)).heapq push/pop O(log n); heapify O(n).a[i:j] O(j−i) (copies).import heapq
def topk(stream, k):
h = []
for x in stream:
if len(h) < k:
heapq.heappush(h, x)
elif x > h[0]:
heapq.heapreplace(h, x) # pop smallest, push x
return sorted(h, reverse=True)1; a size-1 dim is stretched without copying.(3,1) and (1,4) broadcast to (3,4) — the classic outer-op pattern. Incompatible: (3,) vs (4,) → error.arr.base is not None (view) and np.shares_memory(a,b).reshape returns a view when possible; ravel may view, flatten always copies.x[:,None] - x[None,:] broadcasts (n,1)-(1,n) to an (n,n) matrix. x[1:][0]=9 mutates x (view); x[x>0][0]=9 does not (copy).NaN, not positional pairing.a + b reorders to the union of labels. a.add(b, fill_value=0) stays label-aligned but fills labels missing on one side with 0 instead of NaN; for true positional math use a.values + b.values..loc, never chain. df[m]["c"]=1 writes to a temporary and may be lost (SettingWithCopyWarning); use df.loc[m,"c"]=1..loc is label-based (endpoint inclusive); .iloc is positional (endpoint exclusive)..copy(); reindexing/filtering may return a view or copy ambiguously.NaN. Reindex both to a common calendar before combining.[lambda: i for i in range(3)] all return 2; fix with a default arg lambda i=i: i.−5..256 and short string literals are interned, so is can accidentally match == — never rely on it (identity vs value lives in the copies card).sorted(..., key=...) sorts by the key; it is stable, so equal keys keep input order — chain sorts least-significant-key first.dict preserves insertion order (3.7+); equality ignores order, iteration respects it.a += b on a list mutates in place (extend); a = a + b rebinds to a new list.sorted(t, key=lambda x:(-x.px, x.t)) — pack the key as a tuple rather than sorting twice.void by_val(int x) { x = 9; } // local copy; caller unchanged
void by_ref(int& x) { x = 9; } // edits caller's object
void by_ptr(int* p) { if (p) *p = 9; } // may be null; deref carefully
int a = 1;
by_val(a); // a == 1
by_ref(a); // a == 9
by_ptr(&a); // a == 9
const — it documents intent, enables optimization, and lets callers pass temporaries/const objects.
int sum(const std::vector<int>& v); // won't modify v; avoids a copy
const int* p; // pointer to const int (can't change *p)
int* const q = &a; // const pointer to int (can't reseat q)
const int* const r = &a; // both fixed
struct Px {
double mid() const { return (bid + ask) / 2; } // const member fn
double bid, ask;
};
const int* p = "p is a pointer to a const int".const member function may not modify members (unless mutable) and can be called on const objects.const on mid(), you can't call it through a const Px& — const-correctness is viral; add it early.#include <memory>
auto u = std::make_unique<Order>(id); // sole owner; freed at scope exit
auto s = std::make_shared<Book>(); // ref-counted shared ownership
std::weak_ptr<Book> w = s; // non-owning observer (breaks cycles)
void f() {
auto p = std::make_unique<int>(42);
may_throw(); // if it throws, p is still freed -- no leak
} // destructor runs here
| Type | Ownership | Copyable |
|---|---|---|
unique_ptr | exclusive | no (move-only) |
shared_ptr | shared (refcount) | yes |
weak_ptr | none (observes) | yes |
shared_ptrs pointing at each other never reach refcount 0 — a leak. Break the cycle with weak_ptr.map/set are red-black trees (ordered, O(log n)); unordered_* are hash tables (O(1) average).
| Container | Lookup | Insert | Ordered? |
|---|---|---|---|
vector | O(1) index / O(n) find | O(1) amort. push_back | — |
map | O(log n) | O(log n) | sorted by key |
unordered_map | O(1) avg, O(n) worst | O(1) avg | no |
set | O(log n) | O(log n) | sorted |
deque | O(1) index | O(1) both ends | — |
std::vector<int> v{3, 1, 2};
v.push_back(4);
std::unordered_map<std::string, double> px;
px["AAPL"] = 190.5; // insert or update
auto it = px.find("AAPL"); // O(1) avg; it == px.end() if absent
if (it != px.end()) use(it->second);
vector — contiguous memory is cache-friendly and beats list/map in practice even when big-O looks worse. reserve(n) to avoid reallocations.std::vector<int> v{1, 2, 3};
for (int x : v) { /* copy each element */ }
for (int& x : v) { x *= 2; } // modify in place
for (const auto& x : v){ use(x); } // read-only, no copy
for (auto it = v.begin(); it != v.end(); ++it)
use(*it); // classic iterator loop
const auto& to avoid copying non-trivial elements.erase/push_back) during iteration can invalidate iterators.erase returns the next valid iterator: it = v.erase(it);vector::push_back may reallocate and invalidate all existing iterators/pointers/references into it.template <typename T>
T max_of(const T& a, const T& b) {
return (a < b) ? b : a;
}
max_of(3, 7); // T = int
max_of(2.5, 1.5); // T = double
template <typename It>
auto mean(It first, It last) {
using V = typename std::iterator_traits<It>::value_type;
V s{};
long n = 0;
for (; first != last; ++first) { s += *first; ++n; }
return s / n;
}
concepts constrain types and give clearer diagnostics. Watch types: for an integer range V=int, so s/n is integer division (truncates) — accumulate in double for a real mean.std::vector<int> make(); // returns a big vector std::vector<int> a = make(); // move (or RVO) -- no deep copy std::vector<int> b = a; // COPY: duplicates all elements std::vector<int> c = std::move(a); // MOVE: c takes a's buffer // after move, `a` is valid-but-unspecified (typically empty)
std::move is just a cast to an rvalue reference — it enables a move, it doesn't move anything itself.std::move a return value (it can disable RVO).void set(std::string s){ name_ = std::move(s); } — one move whether caller passes lvalue or rvalue.const T& to avoid a copy; for small/cheap types (int, double, pointers) pass by value.
double price(const std::string& sym); // no copy of the string double price(int id); // cheap; by value is fine void consume(std::vector<int> v); // takes a copy you'll own void inspect(const std::vector<int>& v); // read-only, no copy
const T& binds to temporaries too, so callers can pass literals.const& for big, by-value for small. Out/in-out params: non-const reference or pointer."new/malloc) is slower and must be managed.
void f() {
int x = 5; // stack: freed automatically
std::array<int, 64> buf; // stack: fixed size, fast
auto p = std::make_unique<int[]>(n); // heap: size known at runtime
} // x, buf freed here; p frees its heap block
new/delete.int* bad(){ int x=1; return &x; } — x is gone.vector/array store data contiguously; sequential access prefetches well — node-based structures chase pointers and miss cache.std::array to avoid heap in hot paths.vectors) often beats array-of-structs when you scan one field across many records — fewer cache lines touched.#include <algorithm>
#include <vector>
struct Quote { std::string sym; double px; };
std::vector<Quote> book = /* ... */;
// sort by price descending, ties by symbol ascending
std::sort(book.begin(), book.end(),
[](const Quote& a, const Quote& b) {
if (a.px != b.px) return a.px > b.px;
return a.sym < b.sym;
});
int thresh = 100;
auto n = std::count_if(book.begin(), book.end(),
[thresh](const Quote& q){ return q.px > thresh; }); // capture by value
[x] capture by value, [&x] by reference, [=]/[&] capture all.false for equal elements, never <=.[&]) a local that outlives the lambda is a dangling-reference bug; capture by value when the callable escapes the scope.64 bytes; a load pulls the whole line. Keep hot data contiguous and small so one line serves many accesses.std::vector is contiguous → linear scans are prefetcher-friendly; std::list/node-based maps chase pointers → a miss per node.vector vs a list: same big-O O(n), but the vector can be ~10× faster purely from cache locality and prefetching — why HFT code favors flat arrays.return T{...};).return std::move(local); on a return statement is an anti-pattern — it blocks elision and forces a move. Just return local;.T&& in a deduced template binds to lvalues and rvalues; std::forward<T>(x) preserves the original value category.std::move is just a cast to rvalue ref — it moves nothing by itself; the move happens in the chosen overload.template<class T> void push(T&& x){ v.emplace_back(std::forward<T>(x)); } — an lvalue arg copies, an rvalue arg moves, one function handles both with no extra temporary.| Container | Insert invalidates | Erase invalidates |
|---|---|---|
| vector | all if capacity grows, else from insert point | at/after erased |
| deque | all iterators (refs to ends survive) | all unless an end |
| list | none | only the erased |
| map/set | none | only the erased |
| unordered_* | iterators on rehash; refs survive | only the erased |
vector::reserve(n) up front avoids growth reallocations and keeps iterators stable.for(auto it=v.begin(); it!=v.end();) it = v.erase(it); — capture the returned iterator; never ++it after erasing it.Search, filter by domain, click to flip. Drill out loud — narrate reasoning, don't recite. ◆ reported = appears in real quant interviews.
Build intuition by visualizing and simulating — an option-payoff explorer, a distribution explorer, a market-making game, a linear-regression simulator, and an in-browser Python sandbox. Switch tools below.
Interactive option-payoff explorer — pick a structure or build your own legs, tune spot / vol / time, read breakeven, max-profit & max-loss live. Toggle net P&L (folds in the Black–Scholes premium) vs payoff only (gross intrinsic value — the textbook diagram).
Make it yours — add your own subjects, flashcards, and cheatsheet cards. Saved in this browser; they appear instantly across Browse / Flashcards / Cheatsheets.
Creates a new subject in both the cheatsheet and flashcard sections — they share one subject list.