Meetup summary

2025-06-27 - Intro to probability

Agenda:

This should serve as the basic prerequisite for future probability-oriented sessions. (For example, most of it is prerequisite material for information theory and machine learning applications.) The current list is provisional and will have to be trimmed down based on interest. We’ll define important terms and give a convincing derivation/“physicist’s proof” for interesting results. Short Python sketches for several of the items below appear after the agenda.

  • Event space
  • Axioms of probability
  • Conditional probability
  • Bayes’ Rule
  • Chain rule of probability
  • Conditional chain rule, conditional Bayes’ rule, etc. (Every identity above still holds when conditioned on an additional event; you should be able to move conditionals back and forth freely.)
  • Independence
    • Of two events. (Defined as P(A ∩ B) = P(A)P(B) rather than via the conditional form, to avoid division by zero; the two are equivalent when the conditioning event has positive probability.)
    • Pairwise independence of a set of events
    • Mutual independence of a set (this is typically what is meant by “independence” if not otherwise stated)
    • Conditional independence
  • Law of total probability (partitioning)
  • Reminder: counting forms the basis from which probability generalizes. Start with the uniform distribution and count outcomes (which implies equal weighting). We won’t go into counting rules today, as we’re already experts in that area. 😎
  • Discrete RVs
    • PMF
    • CDF
  • Continuous/mixed RVs
    • PDF (defined as a limiting case: essentially the derivative of the CDF)
    • CDF (defined the same way as in the discrete case)
    • Can always express a mixed RV as a PDF by using the Dirac delta “distribution”.
    • Rule of thumb: when deriving some complex/non-intuitive property or trying to prove something, start with the CDF, which lets you reason in “probability space”. You can always differentiate to get PDFs.
  • Multivariate RVs
    • Joint distribution (PMF/PDF/CDF)
    • Marginal PDF
    • Marginal CDF
  • Expectation
    • Formalization of a “mean”
    • Definition for discrete RV
    • Definition for continuous RV
    • Single RV linearity (multiply/add constants)
    • LOTUS
    • Multiple RV linearity (requires joint distribution; invoke LOTUS)
    • Conditional expectation (should be thought of as a function of the conditioned RV)
      • Law of total expectation
  • Variance
    • Definition
    • Expansion due to linearity of expectation
  • Covariance
    • Definition
    • Linear expansion
    • Rules for sums/scalings of covariance. (These give the variance of a sum, whether or not the terms are independent.)
  • Derived RVs
    • Single-variable method of transformations
    • Multivariate MOT (analogous, but uses the Jacobian determinant)
    • Special case: sum of random variables (convolution). Works for both discrete and continuous RVs.
  • Foundational processes/distributions
    • Bernoulli process/RV, binomial RV, geometric RV, negative binomial RV, etc.
    • Categorical (“multinoulli”) process/RV; multinomial RV.
    • Poisson process (limiting case of binomial), Poisson distribution, exponential distribution, Erlang/Gamma distribution
    • Gaussian distribution (different limiting case of binomial, but the derivation is long and we won’t get into it today; also arises from the CLT)
  • Moment generating functions
    • Equivalent to a two-sided Laplace transform, so it does not exist when the RV lacks finite moments.
  • Characteristic functions
    • Equivalent to a Fourier transform, so it always exists, but moments cannot be as easily recovered from its series expansion.
  • Basic estimators
    • Definition
    • Estimator bias
    • Estimator variance
    • Estimator MSE
    • Sample mean
    • “Naive” sample variance
    • Unbiased sample variance
  • Foundational inequalities
    • Union bound
    • Markov inequality
    • Chebyshev inequality
    • Cauchy-Schwarz inequality (for expectations, by analogy with the coordinate-free vector version)
    • Jensen’s inequality (I don’t know a general proof; this will just be an intuitive argument)
    • Gibbs’ inequality (preview for information theory; won’t drill into entropy/information yet)
  • Inference preview (not planning to go deep here; will need to dedicate future sessions to particular areas)
    • Foundational inequalities (union bound, Markov, Chebyshev)
    • Classical vs Bayesian perspective in a nutshell (raw MLE vs explicit priors, underlying parameter is an (unknown) constant vs RV)
    • Conjugate priors (e.g., beta-binomial)
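
Sketches:

For Bayes’ rule and the law of total probability, a minimal numeric sketch; the diagnostic-test numbers below are made up for illustration:

```python
# All numbers are hypothetical, chosen for illustration.
p_d = 0.01                # prior: P(disease)
p_pos_given_d = 0.95      # sensitivity: P(test+ | disease)
p_pos_given_not_d = 0.05  # false-positive rate: P(test+ | no disease)

# Law of total probability over the partition {disease, no disease}:
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)

# Bayes' rule: P(disease | test+) = P(test+ | disease) P(disease) / P(test+)
p_d_given_pos = p_pos_given_d * p_d / p_pos
print(f"P(disease | test+) = {p_d_given_pos:.3f}")  # ~0.161: still unlikely
```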
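
For the “start with the CDF” rule of thumb: differentiating a CDF numerically recovers the PDF. A quick check against the standard normal, assuming NumPy and SciPy are available:

```python
import numpy as np
from scipy.stats import norm

# Differentiate the standard normal CDF numerically and compare to the PDF.
x = np.linspace(-4, 4, 2001)
pdf_from_cdf = np.gradient(norm.cdf(x), x)  # difference quotient of the CDF
print(np.max(np.abs(pdf_from_cdf - norm.pdf(x))))  # small discretization error
```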
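
For linearity of expectation and LOTUS, a Monte Carlo check; the Exp(scale=2) choice and the dependent pair are arbitrary illustrations:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=1_000_000)  # E[X] = 2
y = x + rng.normal(size=x.size)                 # Y is dependent on X

# Linearity of expectation holds regardless of dependence.
print((x + y).mean(), x.mean() + y.mean())

# LOTUS: estimate E[X^2] by averaging g(x) = x^2 over samples of X,
# never deriving the distribution of X^2. For Exp(scale=2), E[X^2] = 8.
print(np.mean(x**2))
```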
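
For the covariance sum rule, a sketch verifying Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y) on deliberately correlated samples (the identity is exact even for sample moments when the same normalization is used throughout):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=1_000_000)
y = 0.5 * x + rng.normal(size=x.size)  # deliberately correlated with x

# Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y), independent or not.
lhs = np.var(x + y)
rhs = np.var(x) + np.var(y) + 2 * np.cov(x, y, bias=True)[0, 1]
print(lhs, rhs)  # agree up to floating-point noise
```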
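
For the sum-of-RVs special case, convolving two discrete PMFs directly:

```python
import numpy as np

die = np.full(6, 1 / 6)            # PMF of one fair die on {1, ..., 6}
two_dice = np.convolve(die, die)   # PMF of the sum, on {2, ..., 12}
print(two_dice[7 - 2])             # P(sum = 7) = 6/36
```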
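
For the Poisson-as-limit-of-binomial claim, a numeric check holding n·p fixed at λ while n grows (SciPy assumed; λ = 3 is arbitrary):

```python
import numpy as np
from scipy.stats import binom, poisson

lam = 3.0
k = np.arange(10)
for n in (10, 100, 10_000):
    p = lam / n  # hold n*p fixed at lambda
    print(n, np.max(np.abs(binom.pmf(k, n, p) - poisson.pmf(k, lam))))
```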
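
For estimator bias, comparing the “naive” and unbiased sample variances over many small samples:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
samples = rng.normal(size=(200_000, n))  # true variance is 1

naive = samples.var(axis=1, ddof=0)      # divides by n
unbiased = samples.var(axis=1, ddof=1)   # divides by n - 1

print(naive.mean())     # ~ (n - 1)/n = 0.8: biased low
print(unbiased.mean())  # ~ 1.0
```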
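
For Markov, Chebyshev, and Jensen, empirical checks on an exponential sample (the thresholds a and k are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.exponential(size=1_000_000)  # nonnegative, mean 1

# Markov: P(X >= a) <= E[X] / a for nonnegative X.
a = 3.0
print((x >= a).mean(), x.mean() / a)

# Chebyshev: P(|X - E[X]| >= k*sigma) <= 1/k^2.
k = 2.0
print((np.abs(x - x.mean()) >= k * x.std()).mean(), 1 / k**2)

# Jensen with convex g(t) = t^2: E[g(X)] >= g(E[X]).
print(np.mean(x**2) >= x.mean() ** 2)
```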
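
For the conjugate-prior preview, the beta-binomial update reduces to adding counts; prior and data values below are illustrative:

```python
# Beta(a, b) prior on a coin's heads probability; observing h heads and
# t tails gives a Beta(a + h, b + t) posterior. Numbers are illustrative.
a, b = 2.0, 2.0  # prior pseudo-counts
h, t = 7, 3      # observed flips

a_post, b_post = a + h, b + t
print(f"posterior mean = {a_post / (a_post + b_post):.3f}")  # 9/14 ~ 0.643
```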