Step 1: Python for Data Science

Before building models, you must be able to load, clean, and visualize data. The three core Python libraries for this work, often called the “Holy Trinity” of data science, are NumPy, pandas, and Matplotlib.

🛠️ Code Example: Data Manipulation

This script builds a small synthetic dataset (you could load a CSV instead), performs basic analysis, and visualizes the results.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# 1. Create a synthetic dataset (or load a CSV)
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [25, 30, 35, 40, 22],
    'Salary': [50000, 60000, 120000, 80000, 45000]
}
df = pd.DataFrame(data)

# 2. Vectorized math (pandas numeric columns are backed by NumPy arrays)
# Add a 10% bonus to every salary in one operation, with no explicit loop
df['Bonus_Salary'] = df['Salary'] * 1.10

# 3. Filtering and Aggregation
high_earners = df[df['Salary'] > 70000]
avg_age = np.mean(df['Age'])

print(f"Average Age: {avg_age}")
print("High Earners:")
print(high_earners)

# 4. Simple Visualization
df.plot(kind='bar', x='Name', y='Salary', color='skyblue')
plt.title("Employee Salaries")
plt.ylabel("Salary ($)")
plt.show()
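
The script above starts from tidy data, so it never exercises the cleaning step mentioned earlier. Below is a minimal sketch of what that step might look like. The messy values are made up for illustration, and the choices (fill a missing age with the median, drop rows with no salary) are assumptions, not the only reasonable policy.

```python
import numpy as np
import pandas as pd

# Hypothetical messy data: one duplicated row, one missing age, one missing salary
raw = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 30, np.nan, 40],
    'Salary': [50000, 60000, 60000, 120000, np.nan],
})

# Remove exact duplicate rows
clean = raw.drop_duplicates().reset_index(drop=True)

# Fill the missing age with the median of the known ages
clean['Age'] = clean['Age'].fillna(clean['Age'].median())

# Drop any row that still has no salary
clean = clean.dropna(subset=['Salary']).reset_index(drop=True)

print(clean)
```

Whether to fill or drop a missing value depends on the dataset; checking `raw.isna().sum()` first shows how many gaps each column has.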

🥅 Your Goal

Install pandas, NumPy, and Matplotlib (for example, with pip install pandas numpy matplotlib).
Load a real CSV from Kaggle.
Calculate the mean and median of one column.
Plot a line graph of the data.
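
As a starting point, the last three goals could be sketched roughly as follows. The inline CSV text, the column names day and temperature, and the output file name are all placeholders invented for this example; with a real Kaggle download you would pass the file's path to pd.read_csv instead.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for a downloaded file so the example runs on its own.
# With a real dataset, use pd.read_csv("your_dataset.csv") (hypothetical name).
csv_text = """day,temperature
1,18.5
2,20.0
3,19.0
4,23.5
5,21.0
"""
df = pd.read_csv(io.StringIO(csv_text))

# Mean and median of one numeric column
mean_temp = df['temperature'].mean()
median_temp = df['temperature'].median()
print(f"Mean: {mean_temp:.2f}")
print(f"Median: {median_temp:.2f}")

# Line graph of the same column
ax = df.plot(kind='line', x='day', y='temperature', marker='o')
ax.set_title("Temperature by Day")
ax.set_ylabel("Temperature")

# Save to a file; replace with plt.show() to open an interactive window
ax.figure.savefig("temperature_line.png")
```

Comparing the mean with the median is a quick check for skew: if the two differ a lot, a few extreme values are probably pulling the mean.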