Meetup summary

2025-07-11 - Intro to probability - part 2


Agenda:

The meetup date and agenda are both tentative, but the idea is to pick up the intro to probability stuff from last time.

  • Expectation
    • Formalization of a “mean”
    • Definition for discrete RV
    • Definition for continuous RV
    • Single RV linearity (multiply/add constants)
    • LOTUS (law of the unconscious statistician; checked numerically in a sketch after this list)
    • Multiple RV linearity (requires joint distribution; invoke LOTUS)
    • Conditional expectation (should be thought of as a function of the RV being conditioned on)
      • Law of total expectation
    • Of independent RVs
  • Variance
    • Definition
    • Expansion due to linearity of expectation
    • Law of total variance
    • Of independent RVs (derived via the general case of covariance)
  • Covariance
    • Definition
    • Linear expansion
    • Rules for covariance of sums and scaled RVs. (These give the variance of a sum, whether or not the summands are independent.)
    • Covariance inequality (relies on Cauchy-Schwarz)
    • Correlation (just standardize and take covariance)
  • Derived RVs
    • Single-variable method of transformations
    • Multivariate MOT (analogous but uses Jacobian)
    • Special case: sum of independent random variables (convolution). Works for both discrete and continuous RVs; dice sketch after this list.
  • Foundational processes/distributions
    • Bernoulli process/RV, binomial RV, geometric RV, negative binomial RV, etc.
    • Categorical (multinoulli) process and the multinomial RV (multi-category analogues of the Bernoulli process and binomial RV).
    • Poisson process (limiting case of binomial), Poisson distribution, exponential distribution, Erlang/Gamma distribution
    • Gaussian distribution (different limiting case of binomial, but the derivation is long and we won’t get into it today; also arises from the CLT)
  • Moment generating functions
    • Equivalent to a two-sided Laplace transform, so it need not exist; in particular it can't exist when the RV lacks finite moments. When it does exist, the moments are its derivatives at zero (MGF sketch after this list).
  • Characteristic functions
    • Equivalent to a Fourier transform, so it always exists, but moments can't always be recovered from its series expansion (the moments themselves may not exist).
  • Basic estimators
    • Definition
    • Estimator bias
    • Estimator variance
    • Estimator MSE
    • Sample mean
    • “Naive” sample variance
    • Unbiased sample variance (bias comparison with the naive version in a sketch after this list)
  • Foundational inequalities
    • Union bound
    • Markov inequality
    • Chebyshev inequality (checked numerically alongside Markov in a sketch after this list)
    • Cauchy-Schwarz inequality (for expectations, by analogy with the coordinate-free vector version)
    • Jensen’s inequality (I don’t know a general proof—this will just be an intuitive argument)
    • Gibbs’ inequality (preview for information theory; won’t drill into entropy yet)
  • Inference preview (not planning to go deep here; will need to dedicate future sessions to particular areas)
    • Classical vs Bayesian perspective in a nutshell (raw MLE vs explicit priors, underlying parameter is an (unknown) constant vs RV)
    • Conjugate priors (e.g., beta-binomial)
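
A few quick numerical sketches follow as companions to the agenda items that reference them. They are illustrative only (the distributions, parameters, and seeds are arbitrary choices, not from the session) and are written as small Python snippets.

First, linearity of expectation and LOTUS, checked by Monte Carlo against an Exponential(1) RV, whose exact mean is 1 and whose exact second moment is 2:

    # Sketch: check E[aX + b] = a*E[X] + b, and estimate E[g(X)] by averaging
    # g over samples of X (the Monte Carlo version of LOTUS).
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.exponential(scale=1.0, size=1_000_000)   # X ~ Exp(1), so E[X] = 1

    a, b = 3.0, 2.0
    print(np.mean(a * x + b), a * np.mean(x) + b)    # linearity: both ~ 5.0

    # LOTUS with g(x) = x**2: E[X^2] = 2 exactly for Exp(1).
    print(np.mean(x ** 2))                           # ~ 2.0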
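
Next, the sum-of-independent-RVs item: the PMF of a sum of independent discrete RVs is the convolution of the individual PMFs. Two fair dice make a convenient (arbitrary) example:

    # Sketch: PMF of the sum of two independent fair dice via convolution.
    import numpy as np

    die = np.full(6, 1 / 6)            # PMF of one die over the faces 1..6
    pmf_sum = np.convolve(die, die)    # PMF of the sum over the values 2..12
    for value, p in zip(range(2, 13), pmf_sum):
        print(value, round(p, 4))      # peaks at 7 with probability 6/36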
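
For the MGF item: when the MGF exists, the moments are its derivatives at zero. Using Exp(1) again, whose MGF is 1/(1 - t) for t < 1:

    # Sketch: recover E[X] and E[X^2] for Exp(1) by differentiating its MGF at t = 0.
    import sympy as sp

    t = sp.symbols('t')
    M = 1 / (1 - t)                        # MGF of Exp(1), valid for t < 1
    print(sp.diff(M, t, 1).subs(t, 0))     # first moment:  1
    print(sp.diff(M, t, 2).subs(t, 0))     # second moment: 2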
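
For the estimator items: a simulation of the bias of the naive sample variance versus the unbiased (n - 1) version, with a standard normal population (true variance 1) and an arbitrary sample size of 5:

    # Sketch: average the two variance estimators over many samples of size n = 5.
    import numpy as np

    rng = np.random.default_rng(1)
    n, trials = 5, 200_000
    samples = rng.normal(size=(trials, n))

    naive = samples.var(axis=1, ddof=0)        # divide by n
    unbiased = samples.var(axis=1, ddof=1)     # divide by n - 1
    print(naive.mean(), unbiased.mean())       # ~0.8 (= (n-1)/n) vs ~1.0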
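
For the Markov and Chebyshev items: a numerical check against Exp(1), which has mean 1 and variance 1 (the thresholds are arbitrary):

    # Sketch: empirical tail probabilities versus the Markov and Chebyshev bounds.
    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.exponential(size=1_000_000)        # mean 1, variance 1
    mu, var = 1.0, 1.0

    a = 3.0
    print(np.mean(x >= a), mu / a)                    # Markov:    P(X >= a) <= E[X]/a
    k = 2.0
    print(np.mean(np.abs(x - mu) >= k), var / k**2)   # Chebyshev: P(|X-mu| >= k) <= Var/k^2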
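
Finally, the conjugate-prior item: with a Beta(alpha, beta) prior on a coin's bias and k heads in n flips, the posterior is Beta(alpha + k, beta + n - k). The prior and data below are made up.

    # Sketch: closed-form beta-binomial posterior update (no integration needed).
    alpha, beta = 2.0, 2.0                        # prior pseudo-counts
    k, n = 7, 10                                  # observed heads, total flips

    alpha_post, beta_post = alpha + k, beta + (n - k)
    print(alpha_post, beta_post)                  # Beta(9, 5)
    print(alpha_post / (alpha_post + beta_post))  # posterior mean ~ 0.643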

Notes

We went through expectation, variance, and covariance. This was review for a few people, but we wanted to get everybody up to speed. We skipped ahead to the Cauchy-Schwarz inequality in order to derive the covariance inequality, starting from the discrete dot product in 2D/3D, then abstracting to higher dimensions and finally to continuous vector spaces. It can of course be shown more directly in the special case where the inner product of two random variables is defined as the expectation of their product under their joint distribution.
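
For reference, a compact version of that last step (notation mine, not from the session): Cauchy-Schwarz for the expectation inner product, applied to centered variables, gives the covariance inequality.

    % Sketch: Cauchy-Schwarz with <U, V> = E[UV], then substitute centered variables.
    \[
      \mathbb{E}[UV]^2 \le \mathbb{E}[U^2]\,\mathbb{E}[V^2]
      \qquad \text{(Cauchy-Schwarz for } \langle U, V\rangle := \mathbb{E}[UV]\text{)}
    \]
    \[
      U = X - \mathbb{E}[X],\quad V = Y - \mathbb{E}[Y]
      \;\Longrightarrow\;
      \operatorname{Cov}(X,Y)^2 \le \operatorname{Var}(X)\,\operatorname{Var}(Y),
      \qquad \lvert \operatorname{Corr}(X,Y) \rvert \le 1.
    \]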