Step 1: Python for Data Science
Before building models, you must be able to load, clean, and visualize data. The "Holy Trinity" of Python libraries for this is NumPy, pandas, and Matplotlib.
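Of the three tasks, cleaning is the easiest to skip over, so here is a minimal sketch of it. The `raw` table is made up for illustration, and filling gaps with the column median is only one possible strategy; the right choice depends on your data.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with one duplicate row and one missing age
raw = pd.DataFrame({
    "Name": ["Alice", "Bob", "Bob", "Charlie"],
    "Age": [25, 30, 30, np.nan],
})

# Drop exact duplicate rows and renumber the index
clean = raw.drop_duplicates().reset_index(drop=True)

# Fill the missing age with the median of the ages that are present
clean["Age"] = clean["Age"].fillna(clean["Age"].median())

print(clean)
```

Dropping duplicates before computing the median keeps the repeated row from counting twice in the fill value.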
Code Example: Data Manipulation
This script demonstrates how to load a dataset, perform basic analysis, and visualize the results.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# 1. Create a synthetic dataset (or load a CSV)
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [25, 30, 35, 40, 22],
    'Salary': [50000, 60000, 120000, 80000, 45000]
}
df = pd.DataFrame(data)
# 2. Vectorized math (numeric pandas columns are backed by NumPy arrays)
# Add a 10% bonus to every salary in one operation, with no explicit loop
df['Bonus_Salary'] = df['Salary'] * 1.10
# 3. Filtering and Aggregation
high_earners = df[df['Salary'] > 70000]
avg_age = np.mean(df['Age'])
print(f"Average Age: {avg_age}")
print("High Earners:")
print(high_earners)
# 4. Simple Visualization
df.plot(kind='bar', x='Name', y='Salary', color='skyblue')
plt.title("Employee Salaries")
plt.ylabel("Salary ($)")
plt.show()

Your Goal
- Install pandas and numpy.
- Load a real CSV from Kaggle.
- Calculate the mean and median of one column.
- Plot a line graph of the data.
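The steps above can be sketched roughly as follows. Since no particular Kaggle dataset is specified, this sketch reads a small inline CSV as a stand-in; the `month` and `sales` columns and their values are invented for illustration. With a real download you would pass the file path to `pd.read_csv` instead.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for a downloaded file (hypothetical columns and values).
# With a real dataset: df = pd.read_csv("path/to/your_file.csv")
csv_text = """month,sales
1,120
2,135
3,150
4,110
5,400
"""
df = pd.read_csv(io.StringIO(csv_text))

# Mean and median of one column
mean_sales = df["sales"].mean()
median_sales = df["sales"].median()
print(f"Mean: {mean_sales}, Median: {median_sales}")

# Line graph of the same column
ax = df.plot(kind="line", x="month", y="sales", marker="o")
ax.set_title("Monthly Sales")
ax.set_ylabel("Sales")
plt.show()
```

In this made-up data, the single large value in month 5 pulls the mean well above the median, which is a good reason to compute both.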