Descriptive and Inferential Statistics

📊 Statistics: From Description to Inference

Statistics is the science of collecting, analyzing, and interpreting data. It is divided into two main branches: Descriptive Statistics, which summarizes the features of a dataset, and Inferential Statistics, which allows us to draw conclusions about a population based on a sample.


🟢 Level 1: Descriptive Statistics (Summarizing Data)

Descriptive statistics provides a quick overview of the “shape” and “center” of your data.

1. Central Tendency

  • Mean ($\mu$): The average value. Highly sensitive to outliers.
  • Median: The middle value when the data is sorted. Robust to outliers (ideal for income data).
  • Mode: The most frequent value. Useful for categorical data.
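
To illustrate the mode on categorical data, here is a minimal sketch using Python's standard `statistics` module. The device labels and ratings are made-up values for demonstration only.

```python
import statistics

# Hypothetical categorical data: device type per visit.
devices = ["mobile", "desktop", "mobile", "tablet", "mobile", "desktop"]
most_common = statistics.mode(devices)  # most frequent category

# With ties, multimode returns every value that shares the top count.
ratings = [1, 2, 2, 3, 3]
tied_modes = statistics.multimode(ratings)

print(f"Most common device: {most_common}")
print(f"Tied modes: {tied_modes}")
```

A mean or median of the device labels would be meaningless, which is why the mode is the usual summary for categories.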

2. Dispersion (Spread)

  • Variance ($\sigma^2$): The average squared deviation from the mean.
  • Standard Deviation ($\sigma$): The square root of the variance. It describes how “spread out” the data is, in the original units.
  • Interquartile Range (IQR): The difference between the 75th percentile (Q3) and the 25th percentile (Q1). It is commonly used to flag outliers, for example values more than 1.5 × IQR below Q1 or above Q3.
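
import pandas as pd

# Small dataset; 100 is an outlier
data = [10, 12, 12, 13, 15, 16, 18, 20, 100]
df = pd.DataFrame(data, columns=['Value'])

# The outlier pulls the mean (24.0) well above the median (15.0)
print(f"Mean: {df['Value'].mean()}")
print(f"Median: {df['Value'].median()}")
# pandas .std() returns the sample standard deviation (ddof=1) by default
print(f"Std Dev: {df['Value'].std()}")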
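
The IQR can be put to work on the same dataset. The sketch below uses the standard `statistics` module and assumes the common 1.5 × IQR rule of thumb for the outlier fences; other multipliers and quartile conventions exist and can give slightly different results.

```python
import statistics

data = [10, 12, 12, 13, 15, 16, 18, 20, 100]

# Quartiles, interpolating linearly between data points ("inclusive" method).
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Rule of thumb: flag points more than 1.5 * IQR outside the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
print(f"Fences: [{lower_fence}, {upper_fence}]")
print(f"Outliers: {outliers}")
```

Only the value 100 falls outside the fences here, matching the outlier noted in the dataset.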

🟡 Level 2: Inferential Statistics (Making Predictions)

Inferential statistics uses probability to determine how likely it is that a pattern observed in a sample also exists in the larger population.

3. Population vs. Sample

  • Population: The entire group you want to study (e.g., all visitors to your website).
  • Sample: A subset of the population used for analysis (e.g., 1,000 random visitors).
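
The relationship between the two can be sketched with a simulation. Everything here is hypothetical: the population is generated from a normal distribution with an assumed mean of 210 ms and standard deviation of 30 ms, standing in for the load times of all visitors.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: load times (ms) for 100,000 visitors.
population = [rng.gauss(210, 30) for _ in range(100_000)]

# Simple random sample of 1,000 visitors, drawn without replacement.
sample = rng.sample(population, k=1_000)

pop_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)

print(f"Population mean: {pop_mean:.1f} ms")
print(f"Sample mean:     {sample_mean:.1f} ms")
```

The sample mean lands close to the population mean without matching it exactly. Quantifying that gap is the job of inferential statistics.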

4. Confidence Intervals (CI)

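A confidence interval provides a range of values that is likely to contain the true population parameter (e.g., “We are 95% confident the average load time is between 200ms and 220ms”). When the population standard deviation $\sigma$ is known, the interval for the mean is:

$$\text{CI} = \bar{x} \pm Z \frac{\sigma}{\sqrt{n}}$$

where $\bar{x}$ is the sample mean, $n$ is the sample size, and $Z$ is the critical value for the chosen confidence level (about 1.96 for 95%).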
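
As a rough sketch, the formula above translates directly into a few lines of Python. The function name `z_confidence_interval`, the load-time values, and `sigma=20` are all assumptions made for illustration.

```python
import math
import statistics

def z_confidence_interval(sample, sigma, confidence=0.95):
    """Return (low, high) for the mean, assuming a known population sigma."""
    n = len(sample)
    x_bar = statistics.fmean(sample)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical load times in milliseconds; sigma = 20 is an assumed value.
load_times = [205, 212, 198, 220, 215, 207, 209, 214, 201, 219]
low, high = z_confidence_interval(load_times, sigma=20)
print(f"95% CI: {low:.1f} ms to {high:.1f} ms")
```

In practice $\sigma$ is rarely known. The usual substitute is the sample standard deviation combined with a critical value from Student's t distribution, which gives a slightly wider interval for small samples.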


🔴 Level 3: Hypothesis Testing and A/B Testing

5. The Hypothesis Testing Framework

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics.

  • Null Hypothesis ($H_0$): The “status quo”. It assumes there is no effect or difference.
  • Alternative Hypothesis ($H_1$): The claim we want to test. It assumes there is an effect.

6. P-Values and Significance ($\alpha$)

The p-value is the probability of observing our results (or more extreme results) if the null hypothesis is true.

  • If $p < \alpha$ (usually 0.05), we reject the null hypothesis.
  • If $p \ge \alpha$, we fail to reject the null hypothesis.
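
To make the definition concrete, here is a small sketch that computes an exact p-value for a coin-flip experiment. The scenario (60 heads in 100 flips, with the null hypothesis that the coin is fair) and the helper name `two_sided_binomial_p` are assumptions chosen for illustration.

```python
from math import comb

def two_sided_binomial_p(heads, flips):
    """Exact two-sided p-value for H0: the coin is fair (p = 0.5)."""
    expected = flips / 2
    observed_gap = abs(heads - expected)
    # Count every outcome at least as far from the expected number of
    # heads as the one we observed; under H0 each sequence is equally likely.
    extreme = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - expected) >= observed_gap
    )
    return extreme / 2 ** flips

alpha = 0.05
p_value = two_sided_binomial_p(heads=60, flips=100)
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"p = {p_value:.4f}: {decision}")
```

Here the p-value comes out just above 0.05, so at $\alpha = 0.05$ we fail to reject the null hypothesis even though 60 heads looks lopsided.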

7. Common Tests

  • T-Test: Compares the means of two groups (e.g., does Version A of a button result in more clicks than Version B?).
  • Chi-Square Test: Used for categorical data (e.g., is there a relationship between user location and device type?).
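
Both tests can be sketched with the standard library alone. The helpers `welch_t` and `chi_square_2x2` are hypothetical names, the data is made up, and the t-test shown is Welch's variant, which does not assume equal variances. Turning the t statistic into a p-value needs Student's t distribution, which the standard library does not provide; `scipy.stats.ttest_ind(a, b, equal_var=False)` reports both the statistic and the p-value.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for two independent samples."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 table (1 d.o.f.)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Hypothetical data: seconds on page for two button versions.
version_a = [31, 28, 35, 40, 29, 33, 38, 30]
version_b = [25, 27, 30, 24, 29, 26, 31, 28]
t_stat, df = welch_t(version_a, version_b)
print(f"t = {t_stat:.2f}, df = {df:.1f}")

# Hypothetical counts: rows = user location (domestic, international),
# columns = device type (mobile, desktop).
counts = [[30, 10], [20, 40]]
chi2, p = chi_square_2x2(counts)
print(f"chi2 = {chi2:.2f}, p = {p:.5f}")
```

This sketch applies no continuity correction. `scipy.stats.chi2_contingency` applies Yates' correction to 2x2 tables by default, so its output will differ slightly from the values printed here.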