Meetup summary
2025-07-11 - Intro to probability - part 2
Recommended reading:
None.
Agenda:
The meetup date and agenda are both tentative, but the idea is to pick up the intro to probability stuff from last time.
- Expectation
  - Formalization of a “mean”
  - Definition for discrete RV
  - Definition for continuous RV
  - Single-RV linearity (multiply/add constants)
  - LOTUS
  - Multiple-RV linearity (requires joint distribution; invoke LOTUS)
  - Conditional expectation (should be thought of as a function of the conditioned RV)
  - Law of total expectation (sketch after the agenda)
  - Of independent RVs
- Variance
  - Definition
  - Expansion due to linearity of expectation
  - Law of total variance (sketch after the agenda)
  - Of independent RVs (derived via the general case of covariance)
- Covariance
  - Definition
  - Linear expansion
  - Rules for sums/scales of covariance (this gives the variance of a sum, whether or not the RVs are independent; sketch after the agenda)
  - Covariance inequality (relies on Cauchy-Schwarz)
- Correlation (just standardize and take covariance)
- Derived RVs
  - Single-variable method of transformations
  - Multivariate MOT (analogous but uses the Jacobian)
  - Special case: sum of random variables (convolution). Works for both discrete and continuous RVs. (Sketch after the agenda.)
- Foundational processes/distributions
  - Bernoulli process/RV, binomial RV, geometric RV, negative binomial RV, etc.
  - Multinomial (categorical) process, multinoulli.
  - Poisson process (limiting case of binomial; sketch after the agenda), Poisson distribution, exponential distribution, Erlang/Gamma distribution
  - Gaussian distribution (a different limiting case of the binomial, but the derivation is long and we won’t get into it today; also arises from the CLT)
- Moment generating functions
  - Equivalent to a two-sided Laplace transform, so it does not exist when the RV doesn’t have finite moments. (Sketch after the agenda.)
- Characteristic functions
  - Equivalent to a Fourier transform, so it always exists, but it cannot be used to easily recover moments from the series expansion.
- Basic estimators
  - Definition
  - Estimator bias
  - Estimator variance
  - Estimator MSE
  - Sample mean
  - “Naive” sample variance
  - Unbiased sample variance (bias sketch after the agenda)
- Foundational inequalities
  - Union bound
  - Markov inequality
  - Chebyshev inequality (sketch after the agenda)
  - Cauchy-Schwarz inequality (for expectations, by analogy with the coordinate-free vector version)
  - Jensen’s inequality (I don’t know a general proof—this will just be an intuitive argument)
  - Gibbs’ inequality (preview for information theory—won’t drill into entropy yet)
- Inference preview (not planning to go deep here; will need to dedicate future sessions to particular areas)
  - Classical vs Bayesian perspective in a nutshell (raw MLE vs explicit priors; underlying parameter is an (unknown) constant vs an RV)
  - Conjugate priors (e.g., beta-binomial; sketch after the agenda)
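The sketches referenced in the agenda follow. First, a minimal simulation check of linearity of expectation and the law of total expectation, assuming Python with NumPy; the particular distributions are arbitrary illustrations, not anything from the session.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Two dependent RVs: X ~ Uniform(0, 1), Y = X + noise.
x = rng.uniform(0, 1, n)
y = x + rng.normal(0, 0.5, n)

# Linearity: E[aX + bY + c] = a E[X] + b E[Y] + c, with no independence assumption.
a, b, c = 2.0, -3.0, 5.0
print(np.mean(a * x + b * y + c))           # direct
print(a * np.mean(x) + b * np.mean(y) + c)  # via linearity

# Law of total expectation: E[Y] = E[ E[Y | X] ], conditioning on a coarse binning of X.
bins = np.digitize(x, np.linspace(0, 1, 21))
cond_means = np.array([y[bins == k].mean() for k in np.unique(bins)])
weights = np.array([(bins == k).mean() for k in np.unique(bins)])
print(np.mean(y), np.sum(weights * cond_means))  # should agree closely
```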
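A simulation check of the variance expansion Var(X) = E[X²] − (E[X])² and the law of total variance, using a made-up two-component mixture as the conditioning variable.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

# Mixture: pick a component Z, then draw X | Z from that component.
z = rng.integers(0, 2, n)  # Z in {0, 1}, each with probability 1/2
x = np.where(z == 0, rng.normal(0, 1, n), rng.normal(3, 2, n))

# Var(X) = E[X^2] - (E[X])^2  (expansion via linearity of expectation)
print(np.var(x), np.mean(x**2) - np.mean(x) ** 2)

# Law of total variance: Var(X) = E[Var(X|Z)] + Var(E[X|Z])
cond_var = np.array([x[z == k].var() for k in (0, 1)])
cond_mean = np.array([x[z == k].mean() for k in (0, 1)])
w = np.array([(z == k).mean() for k in (0, 1)])
print(w @ cond_var + (w @ cond_mean**2 - (w @ cond_mean) ** 2))
```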
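A sketch of the covariance identities: Cov(X, Y) = E[XY] − E[X]E[Y], the variance of a sum, and correlation as the covariance of standardized variables. The dependence between X and Y here is an arbitrary construction.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

x = rng.normal(0, 1, n)
y = 0.6 * x + rng.normal(0, 1, n)  # correlated with x by construction

cov = np.mean(x * y) - np.mean(x) * np.mean(y)  # Cov(X,Y) = E[XY] - E[X]E[Y]

# Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y), independent or not.
print(np.var(x + y), np.var(x) + np.var(y) + 2 * cov)

# Correlation: standardize, then take covariance.
xs = (x - x.mean()) / x.std()
ys = (y - y.mean()) / y.std()
print(np.mean(xs * ys), np.corrcoef(x, y)[0, 1])  # should agree
```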
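The discrete convolution special case: the PMF of a sum of independent RVs is the convolution of their PMFs. Two fair dice are used as the standard toy example.

```python
import numpy as np

# PMF of one fair die: index i holds P(X = i), so faces 1..6 each get 1/6.
die = np.zeros(7)
die[1:] = 1 / 6

# Sum of two independent dice: the PMF is the convolution of the two PMFs.
two_dice = np.convolve(die, die)
for total in range(2, 13):
    print(total, two_dice[total])  # e.g. P(sum = 7) = 6/36
```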
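A check that Binomial(n, λ/n) approaches Poisson(λ) as n grows, using only the standard library.

```python
import math

lam = 3.0

def binom_pmf(n, p, k):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(lam, k):
    return math.exp(-lam) * lam**k / math.factorial(k)

# Binomial(n, lam/n) converges to Poisson(lam) as n -> infinity.
for n in (10, 100, 10_000):
    print(n, [round(binom_pmf(n, lam / n, k), 4) for k in range(5)])
print("poisson", [round(poisson_pmf(lam, k), 4) for k in range(5)])
```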
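A sketch of recovering moments from an MGF, assuming SymPy is available; the standard normal's MGF, M(t) = exp(t²/2), is used since it has a simple closed form.

```python
import sympy as sp

t = sp.symbols("t")
mgf = sp.exp(t**2 / 2)  # MGF of a standard normal: M(t) = E[e^{tX}]

# The n-th raw moment is the n-th derivative of the MGF evaluated at t = 0.
for n in range(1, 5):
    print(n, sp.diff(mgf, t, n).subs(t, 0))  # 0, 1, 0, 3
```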
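Bias of the “naive” sample variance versus the unbiased (Bessel-corrected) version, checked by simulation; the normal population and small sample size here are arbitrary choices that make the bias visible.

```python
import numpy as np

rng = np.random.default_rng(3)
true_var = 4.0
n, trials = 5, 200_000

samples = rng.normal(0, np.sqrt(true_var), (trials, n))
naive = samples.var(axis=1, ddof=0)     # divide by n ("naive")
unbiased = samples.var(axis=1, ddof=1)  # divide by n - 1 (Bessel's correction)

# The naive estimator is biased low by a factor of (n - 1)/n.
print(naive.mean(), true_var * (n - 1) / n)  # both about 3.2
print(unbiased.mean(), true_var)             # both about 4.0
```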
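Numerical checks of the Markov and Chebyshev inequalities against an exponential distribution (an arbitrary choice with known mean and variance).

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.exponential(1.0, 1_000_000)  # nonnegative, mean 1, variance 1

# Markov: P(X >= a) <= E[X] / a, for nonnegative X.
for a in (2, 5, 10):
    print(a, (x >= a).mean(), x.mean() / a)

# Chebyshev: P(|X - mu| >= k * sigma) <= 1 / k^2.
mu, sigma = x.mean(), x.std()
for k in (2, 3):
    print(k, (np.abs(x - mu) >= k * sigma).mean(), 1 / k**2)
```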
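The beta-binomial conjugate update in a few lines; the prior pseudo-counts and the observed data are made up for illustration.

```python
# Beta-binomial conjugacy: with prior Beta(a, b) on the coin bias p and
# an observed count of h heads in n flips, the posterior is
# Beta(a + h, b + n - h) -- no integration needed.
a, b = 2.0, 2.0  # prior pseudo-counts (hypothetical choice)
h, n = 7, 10     # observed: 7 heads in 10 flips (made-up data)

a_post, b_post = a + h, b + (n - h)
print("posterior:", (a_post, b_post))
print("posterior mean:", a_post / (a_post + b_post))  # shrunk toward the prior mean 0.5
```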
Notes
We went through expectation, variance, and covariance. This was review for a few people, but we wanted to get everybody up to speed. We skipped ahead to the Cauchy-Schwarz inequality in order to derive the covariance inequality, starting with the discrete dot product in 2D/3D and then abstracting to higher dimensions and finally to continuous vector spaces. It can of course be demonstrated more directly in the special case where you define the inner product to be the expected product of two random variables, ⟨X, Y⟩ = E[XY], over some joint distribution; a numerical check of that special case follows.
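A quick numerical illustration of that special case, assuming Python with NumPy: center the variables, apply Cauchy-Schwarz with ⟨U, V⟩ = E[UV], and you get |Cov(X, Y)| ≤ σ_X σ_Y. The joint distribution here is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1_000_000

x = rng.gamma(2.0, 1.0, n)
y = np.sqrt(x) + rng.normal(0, 0.3, n)  # dependent on x by construction

# Center, then apply Cauchy-Schwarz with <U, V> = E[UV]:
# |E[UV]| <= sqrt(E[U^2] E[V^2])  =>  |Cov(X, Y)| <= sigma_X * sigma_Y.
u, v = x - x.mean(), y - y.mean()
print(abs(np.mean(u * v)), np.sqrt(np.mean(u**2) * np.mean(v**2)))
```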
tags: