📊 Random Variables & Distributions

Random variables and probability distributions are the mathematical tools we use to model data and predict outcomes. They allow us to move from simple events to complex, data-driven simulations.


🟢 Level 1: Random Variables ($X$)

A random variable is a function that maps the outcomes of a random process to real numbers. We categorize them into two main types:

1. Discrete Random Variables

These take on a finite or countably infinite set of distinct values (e.g., the number of requests to a server in a minute, the number of heads in 10 coin flips).

  • Probability Mass Function (PMF): $P(X = x)$.
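As a small illustration of a PMF, the sketch below tabulates $P(X = k)$ for the coin-flip example above, assuming the coin is fair so that every sequence of 10 flips is equally likely. The function name `coin_flip_pmf` is made up for this example.

```python
from math import comb


def coin_flip_pmf(k: int, n: int = 10) -> float:
    """P(X = k), where X counts heads in n flips of a fair coin (assumed fair)."""
    # comb(n, k) sequences contain exactly k heads, out of 2**n equally likely sequences.
    return comb(n, k) / 2**n


# Tabulate the PMF over every possible value of X.
pmf = {k: coin_flip_pmf(k) for k in range(11)}
total = sum(pmf.values())

print(f"P(X = 5) = {pmf[5]:.4f}")
print(f"Sum over all k = {total:.4f}")
```

A valid PMF assigns a non-negative probability to each value, and those probabilities sum to 1, which the last line checks.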

2. Continuous Random Variables

These can take on any value within a given range or interval (e.g., the time it takes for a page to load, the weight of a package).

  • Probability Density Function (PDF): $f(x)$, where $P(a \le X \le b) = \int_a^b f(x)\,dx$.
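To make the integral concrete, here is a minimal sketch that approximates $P(a \le X \le b)$ numerically. The density $f(x) = 2x$ on $[0, 1]$ is a hypothetical choice, picked only because its integral has the simple closed form $b^2 - a^2$; the helper names are also illustrative.

```python
def pdf(x: float) -> float:
    """Hypothetical density: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def prob_between(a: float, b: float, steps: int = 10_000) -> float:
    """Approximate P(a <= X <= b) by integrating the density with the midpoint rule."""
    width = (b - a) / steps
    return sum(pdf(a + (i + 0.5) * width) for i in range(steps)) * width


approx = prob_between(0.25, 0.75)
exact = 0.75**2 - 0.25**2  # closed form for this density: b^2 - a^2

print(f"numeric: {approx:.6f}, exact: {exact:.6f}")
print(f"density at 0.9: {pdf(0.9):.1f}")
```

Note that the density itself can exceed 1 (here $f(0.9) = 1.8$). Only areas under the curve are probabilities, and the total area is 1.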

🟡 Level 2: Essential Distributions

In software engineering, a few key distributions appear repeatedly when modeling system behavior and data:

3. Bernoulli and Binomial

  • Bernoulli ($p$): A single trial with two outcomes (Success/Failure), where success has probability $p$.
  • Binomial ($n, p$): The number of successes in $n$ independent Bernoulli trials, each with the same success probability $p$.
    • Example: The number of successful API calls out of 100 attempts.
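The API-call example can be sketched with the standard binomial PMF, $P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}$. The per-call success probability of 0.99 is an assumed value for illustration, and the calls are assumed to be independent.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 100, 0.99  # 100 attempts; p = 0.99 is a hypothetical success rate

# Probability that every call succeeds.
p_all = binomial_pmf(n, n, p)

# Probability of at least 98 successes: add up the PMF over k = 98, 99, 100.
p_at_least_98 = sum(binomial_pmf(k, n, p) for k in range(98, n + 1))

print(f"P(all 100 succeed) = {p_all:.4f}")
print(f"P(at least 98 succeed) = {p_at_least_98:.4f}")
```

Even with a 99% per-call success rate, a fully clean run of 100 calls is far from guaranteed, which is the kind of result that is easy to misjudge without the distribution.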

4. Poisson Distribution ($\lambda$)

Models the number of events occurring in a fixed interval of time or space, assuming events occur independently of one another at a constant average rate $\lambda$.

  • Example: The number of logins per second on a web server.
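The login example can be sketched with the Poisson PMF, $P(X = k) = e^{-\lambda} \lambda^k / k!$. The rate of 3 logins per second is an assumed figure, not a measurement.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


lam = 3.0  # hypothetical average of 3 logins per second

# Probability of a completely idle second.
p_zero = poisson_pmf(0, lam)

# Probability of a burst of more than 5 logins: 1 - P(X <= 5).
p_more_than_5 = 1.0 - sum(poisson_pmf(k, lam) for k in range(6))

print(f"P(no logins in a second) = {p_zero:.4f}")
print(f"P(more than 5 logins in a second) = {p_more_than_5:.4f}")
```

Tail probabilities like the second one are a common use of this model, for example when deciding how much headroom a server needs above its average load.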

5. Normal (Gaussian) Distribution ($\mu, \sigma^2$)

The “bell curve” defined by its mean ($\mu$) and variance ($\sigma^2$). By the Central Limit Theorem, the suitably normalized sum of many independent, identically distributed random variables with finite variance tends toward a normal distribution. Its density is:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$

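import numpy as np

# Draw 1000 samples from a Normal distribution.
# Note: NumPy takes the standard deviation (sigma), not the variance.
mu, sigma = 0, 0.1
rng = np.random.default_rng(seed=42)  # arbitrary seed, only for reproducibility
s = rng.normal(mu, sigma, 1000)

# The sample mean should be close to mu, and the sample variance
# close to sigma**2 = 0.01. ddof=1 gives the unbiased sample variance.
print(f"Sample Mean: {np.mean(s):.4f}")
print(f"Sample Variance: {np.var(s, ddof=1):.4f}")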
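The Central Limit Theorem claim can also be checked empirically. The sketch below uses only the Python standard library and sums uniform random numbers, which are not bell-shaped on their own. The choice of 30 terms per sum, 20,000 sums, and the seed are arbitrary assumptions for the demonstration.

```python
import random
import statistics

random.seed(0)  # arbitrary seed, only for reproducibility

N_TERMS = 30        # uniform values added together per sum (assumed)
N_SAMPLES = 20_000  # number of sums to generate (assumed)

# Each entry is the sum of N_TERMS independent Uniform(0, 1) draws.
sums = [sum(random.random() for _ in range(N_TERMS)) for _ in range(N_SAMPLES)]

mean = statistics.fmean(sums)
std = statistics.pstdev(sums)

# Uniform(0, 1) has mean 1/2 and variance 1/12, so the sum of n of them
# has mean n/2 and variance n/12.
expected_mean = N_TERMS / 2
expected_std = (N_TERMS / 12) ** 0.5

# For a normal distribution, about 68% of values fall within one
# standard deviation of the mean.
within_one_std = sum(abs(s - mean) <= std for s in sums) / N_SAMPLES

print(f"mean: {mean:.3f} (theory {expected_mean:.3f})")
print(f"std:  {std:.3f} (theory {expected_std:.3f})")
print(f"fraction within one std: {within_one_std:.3f}")
```

If the sums are approximately normal, the last figure should land near 0.68, even though each individual term is uniformly distributed.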

🔴 Level 3: Expectations and Moments

6. Expected Value ($E[X]$)

The “long-run average” of a random variable. It is the center of mass of the distribution.

  • Discrete: $E[X] = \sum x P(X=x)$.
  • Continuous: $E[X] = \int x f(x)\,dx$.
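As a worked instance of the discrete formula, the sketch below computes $E[X]$ for a fair six-sided die (an assumed example) and compares it with a simulated long-run average. The number of rolls and the seed are arbitrary.

```python
import random
from fractions import Fraction

# PMF of a fair six-sided die: each face has probability 1/6.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# Discrete expectation: sum of x * P(X = x) over all values x.
expected = sum(x * p for x, p in pmf.items())

# The "long-run average" view: average many simulated rolls.
random.seed(0)  # arbitrary seed, only for reproducibility
rolls = [random.randint(1, 6) for _ in range(100_000)]
long_run_average = sum(rolls) / len(rolls)

print(f"E[X] = {expected} = {float(expected)}")
print(f"average of {len(rolls)} rolls = {long_run_average:.3f}")
```

The exact value is $7/2$, which is not a face of the die. The expected value describes the average over many trials, not a typical single outcome.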

7. Variance and Standard Deviation

  • Variance ($\text{Var}(X)$): Measures the spread of the distribution: $E[(X - E[X])^2]$.
  • Standard Deviation ($\sigma$): The square root of the variance, expressed in the same units as the data.
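The definition can be traced step by step for a fair six-sided die (an assumed example). The sketch also evaluates the equivalent identity $\text{Var}(X) = E[X^2] - (E[X])^2$, which is not stated above but follows from expanding the square.

```python
import math
from fractions import Fraction

# PMF of a fair six-sided die: each face has probability 1/6.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

mean = sum(x * p for x, p in pmf.items())

# Definition: expected squared distance from the mean.
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())

# Equivalent shortcut: E[X^2] - (E[X])^2.
variance_shortcut = sum(x**2 * p for x, p in pmf.items()) - mean**2

# Standard deviation: square root of the variance, in the units of X.
std_dev = math.sqrt(variance)

print(f"Var(X) = {variance} (shortcut gives {variance_shortcut})")
print(f"sigma = {std_dev:.4f}")
```

Using `Fraction` keeps the arithmetic exact, so the two ways of computing the variance agree exactly rather than up to rounding.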

8. Covariance and Correlation

These measure the linear relationship between two random variables $X$ and $Y$:

  • Covariance: $\text{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])]$.
  • Correlation ($\rho$): Covariance normalized by both standard deviations, $\rho = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y}$, which always lies between -1 and 1.
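Both quantities can be estimated from paired observations by replacing expectations with averages. The data below are made-up request-rate and latency figures, and the helper functions are illustrative rather than taken from any library.

```python
import math


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Population covariance: average of (x - mean_x) * (y - mean_y)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


# Hypothetical paired measurements.
requests_per_sec = [10, 20, 30, 40, 50]
latency_ms = [12, 15, 21, 24, 33]
latency_s = [y / 1000 for y in latency_ms]  # same data, different units

cov_ms = covariance(requests_per_sec, latency_ms)
cov_s = covariance(requests_per_sec, latency_s)
corr_ms = correlation(requests_per_sec, latency_ms)
corr_s = correlation(requests_per_sec, latency_s)

print(f"covariance (ms): {cov_ms:.3f}, covariance (s): {cov_s:.6f}")
print(f"correlation (ms): {corr_ms:.4f}, correlation (s): {corr_s:.4f}")
```

Changing the units of latency rescales the covariance but leaves the correlation unchanged, which is why correlation is the easier of the two to compare across datasets.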