Random Variables and Distributions
Random variables and probability distributions are the mathematical tools we use to model data and predict outcomes. They allow us to move from simple events to complex, data-driven simulations.
🟢 Level 1: Random Variables ($X$)
A random variable is a function that maps the outcomes of a random process to real numbers. We categorize them into two main types:
1. Discrete Random Variables
These take on a finite or countably infinite set of distinct values (e.g., the number of requests to a server in a minute, the number of heads in 10 coin flips).
- Probability Mass Function (PMF): $p(x) = P(X = x)$.
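As a small illustration of a PMF, the sketch below tabulates the distribution of the number of heads in 10 coin flips. It assumes the coin is fair; the helper name `heads_pmf` is made up for this example.

```python
from math import comb


def heads_pmf(k: int, n: int = 10) -> float:
    """P(X = k) for X = number of heads in n fair coin flips."""
    return comb(n, k) / 2**n


# Tabulate the PMF over every possible value of X (0 through 10 heads).
pmf = {k: heads_pmf(k) for k in range(11)}

for k, p in pmf.items():
    print(f"P(X = {k:2d}) = {p:.4f}")

# A valid PMF is non-negative everywhere and its values sum to 1.
total = sum(pmf.values())
print(f"Sum of probabilities: {total:.4f}")
```

Because a discrete variable puts positive probability on individual values, the table itself is the full description of the distribution.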
2. Continuous Random Variables
These can take on any value within a given range or interval (e.g., the time it takes for a page to load, the weight of a package).
- Probability Density Function (PDF): $f(x)$, where $P(a \le X \le b) = \int_a^b f(x)\,dx$.
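For a continuous variable, probabilities come from the area under the PDF rather than from its height at a point. The sketch below assumes, purely for illustration, that page load time follows an exponential distribution with a mean of 2 seconds; the distribution choice, the `RATE` value, and the function names are assumptions, not something the text above specifies.

```python
import math

RATE = 0.5  # hypothetical: mean load time of 1 / RATE = 2 seconds


def pdf(x: float) -> float:
    """Exponential density f(x) = RATE * exp(-RATE * x) for x >= 0."""
    return RATE * math.exp(-RATE * x) if x >= 0 else 0.0


def prob_between(a: float, b: float, steps: int = 10_000) -> float:
    """Approximate P(a <= X <= b) with a midpoint Riemann sum of the PDF."""
    width = (b - a) / steps
    return sum(pdf(a + (i + 0.5) * width) for i in range(steps)) * width


approx = prob_between(1.0, 3.0)
# Closed form for comparison: difference of the exponential CDF at 3 and 1.
exact = math.exp(-RATE * 1.0) - math.exp(-RATE * 3.0)
print(f"P(1 <= X <= 3) ~ {approx:.6f} (closed form: {exact:.6f})")
```

Note that `pdf(x)` is a density, not a probability: the probability of any single exact value is zero, and only intervals carry probability.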
🟡 Level 2: Essential Distributions
In software engineering, a few key distributions appear repeatedly when modeling system behavior and data:
3. Bernoulli and Binomial
- Bernoulli ($p$): A single trial with two outcomes (Success/Failure), where success has probability $p$.
- Binomial ($n, p$): The number of successes in $n$ independent Bernoulli trials, each with success probability $p$.
- Example: The number of successful API calls out of 100 attempts.
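The API-call example can be sketched as a simulation next to the exact binomial formula. The per-call success probability of 0.95 is a hypothetical value chosen for illustration, and the function names are made up for this example.

```python
import random
from math import comb

N_CALLS = 100      # attempts, as in the example above
P_SUCCESS = 0.95   # hypothetical per-call success probability


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def simulate_successes(n: int, p: float, rng: random.Random) -> int:
    """Count successes across n independent Bernoulli(p) trials."""
    return sum(rng.random() < p for _ in range(n))


rng = random.Random(42)  # fixed seed so the run is repeatable
runs = [simulate_successes(N_CALLS, P_SUCCESS, rng) for _ in range(2_000)]
sim_mean = sum(runs) / len(runs)

print(f"Theoretical mean n*p:  {N_CALLS * P_SUCCESS:.2f}")
print(f"Simulated mean:        {sim_mean:.2f}")
print(f"P(exactly 95 succeed): {binomial_pmf(95, N_CALLS, P_SUCCESS):.4f}")
```

Each simulated run is itself a sum of Bernoulli trials, which is exactly how the binomial distribution is defined.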
4. Poisson Distribution ($\lambda$)
Models the number of events occurring in a fixed interval of time or space, given a constant average rate $\lambda$.
- Example: The number of logins per second on a web server.
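To make the login example concrete, the sketch below evaluates the Poisson PMF, $P(X = k) = \lambda^k e^{-\lambda} / k!$, for a hypothetical average of 3 logins per second. The rate and the burst threshold of 6 are illustrative assumptions.

```python
import math

LAMBDA = 3.0  # hypothetical average of 3 logins per second


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) = lam**k * exp(-lam) / k! for X ~ Poisson(lam)."""
    return lam**k * math.exp(-lam) / math.factorial(k)


# Probability of a quiet second with no logins at all.
p_zero = poisson_pmf(0, LAMBDA)

# Probability of a burst: more than 6 logins in one second,
# computed as 1 minus the probability of 0 through 6 logins.
p_burst = 1.0 - sum(poisson_pmf(k, LAMBDA) for k in range(7))

print(f"P(X = 0) = {p_zero:.4f}")
print(f"P(X > 6) = {p_burst:.4f}")
```

A tail probability like `p_burst` is the kind of quantity one might use when sizing capacity for traffic spikes.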
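5. Normal (Gaussian) Distribution ($\mu, \sigma^2$)
The “bell curve” defined by its mean ($\mu$) and variance ($\sigma^2$). By the Central Limit Theorem, the suitably standardized sum of many independent, identically distributed random variables with finite variance tends toward a normal distribution, which is why it appears so often in practice.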
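import numpy as np
# Draw 1,000 samples from a normal distribution with mean 0 and standard deviation 0.1
mu, sigma = 0, 0.1
rng = np.random.default_rng(seed=0)  # seeded so the run is repeatable
s = rng.normal(mu, sigma, 1000)
# The sample mean should be close to mu and the sample variance close to sigma**2 = 0.01
print(f"Sample Mean: {np.mean(s):.4f}")
print(f"Sample Variance: {np.var(s, ddof=1):.4f}")  # ddof=1 gives the unbiased sample variance
🔴 Level 3: Expectations and Moments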
6. Expected Value ($E[X]$)
The “long-run average” of a random variable. It is the center of mass of the distribution.
- Discrete: $E[X] = \sum_x x \, P(X = x)$.
- Continuous: $E[X] = \int_{-\infty}^{\infty} x \, f(x) \, dx$, where $f$ is the PDF.
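The “long-run average” reading of the expected value can be checked numerically. The sketch below uses a fair six-sided die as an illustrative choice: it computes the expectation from the PMF and compares it with the mean of many simulated rolls.

```python
import random

# PMF of a fair six-sided die: each face has probability 1/6.
pmf = {face: 1 / 6 for face in range(1, 7)}

# Discrete expectation: sum of value * probability over all values.
expected = sum(x * p for x, p in pmf.items())

# The average of many simulated rolls should approach the same number.
rng = random.Random(0)  # fixed seed so the run is repeatable
rolls = [rng.randint(1, 6) for _ in range(100_000)]
sample_mean = sum(rolls) / len(rolls)

print(f"E[X] from the PMF: {expected:.4f}")
print(f"Mean of {len(rolls)} rolls: {sample_mean:.4f}")
```

The expected value of 3.5 is not a value the die can actually show, which is a useful reminder that $E[X]$ describes the center of the distribution rather than a typical outcome.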
7. Variance and Standard Deviation
- Variance ($\sigma^2$): Measures the spread of the distribution: $\mathrm{Var}(X) = E[(X - E[X])^2] = E[X^2] - (E[X])^2$.
- Standard Deviation ($\sigma$): The square root of the variance, expressed in the same units as the data.
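Variance can be computed either from its definition, $E[(X - E[X])^2]$, or from the equivalent shortcut $E[X^2] - (E[X])^2$. The sketch below does both for a fair six-sided die (an illustrative choice) and then takes the square root to get the standard deviation.

```python
import math

# PMF of a fair six-sided die.
pmf = {face: 1 / 6 for face in range(1, 7)}

mean = sum(x * p for x, p in pmf.items())

# Definition: expected squared deviation from the mean.
var_definition = sum((x - mean) ** 2 * p for x, p in pmf.items())

# Shortcut: E[X^2] - (E[X])^2.
second_moment = sum(x**2 * p for x, p in pmf.items())
var_shortcut = second_moment - mean**2

# Standard deviation is back in the original units (faces, not faces squared).
std_dev = math.sqrt(var_definition)

print(f"Var(X) by definition: {var_definition:.4f}")
print(f"Var(X) by shortcut:   {var_shortcut:.4f}")
print(f"Standard deviation:   {std_dev:.4f}")
```

The two variance values agree up to floating-point rounding; the shortcut is convenient because it needs only one pass over the data once the mean is known.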
8. Covariance and Correlation
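These measure the linear relationship between two random variables $X$ and $Y$:
- Covariance: $\mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])]$.
- Correlation ($\rho$): A normalized version of covariance that ranges from -1 to 1: $\rho = \mathrm{Cov}(X, Y) / (\sigma_X \sigma_Y)$.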
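The sketch below estimates covariance and correlation from samples, written out by hand so that each formula is visible. The data is synthetic: a hypothetical server where latency grows roughly linearly with load plus Gaussian noise. The variable names, coefficients, and noise level are all assumptions for illustration.

```python
import math
import random
from typing import Sequence


def covariance(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Sample covariance using the n - 1 denominator."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical data: requests per second, and latency in milliseconds.
load = [rng.uniform(10, 100) for _ in range(500)]
latency = [20 + 0.5 * x + rng.gauss(0, 5) for x in load]

cov = covariance(load, latency)
rho = correlation(load, latency)
# Rescaling one variable (ms to microseconds) changes covariance but not correlation.
rho_rescaled = correlation(load, [1000 * y for y in latency])

print(f"Covariance:             {cov:.2f}")
print(f"Correlation:            {rho:.3f}")
print(f"Correlation (rescaled): {rho_rescaled:.3f}")
```

The rescaling check shows why correlation is usually easier to interpret: covariance depends on the units of both variables, while correlation does not. In NumPy-based code, `np.cov` and `np.corrcoef` compute the same quantities.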