📌 Day 1 Goal Breakdown
Install pandas and verify version
Create Series from list and dictionary
Build DataFrame from CSV / dict with custom index
Explore the shape, columns, and dtypes attributes
Use head(), tail(), sample() to inspect data
1️⃣ Install pandas and verify version
How to install
Open your terminal (Command Prompt, PowerShell, or terminal in VS Code/PyCharm).
```bash
pip install pandas
```
If you use Anaconda (or Jupyter from an Anaconda install):
```bash
conda install pandas
```
Verify installation & version
```python
import pandas as pd

print(pd.__version__)
```
Example output: 2.0.3 (yours may be newer)
✅ pd is the standard alias for pandas, and almost everyone uses it.
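If a script relies on features from a particular pandas release, it can check the version when it starts. The following is a minimal sketch of that idea. The helper name `require_pandas_major` and the minimum value you pass to it are illustrative assumptions, not part of pandas.

```python
import pandas as pd

def require_pandas_major(minimum: int) -> int:
    """Hypothetical helper: raise if pandas' major version is below `minimum`."""
    # pd.__version__ is a string such as "2.0.3"; the part before the first dot is the major version
    major = int(pd.__version__.split(".")[0])
    if major < minimum:
        raise RuntimeError(f"pandas >= {minimum} required, found {pd.__version__}")
    return major

print(require_pandas_major(1))
```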
2️⃣ Create Series from list and dictionary
What is a Series?
A Series is like a single column of data: it has an index (labels) and values.
Think of it as a Python list and dictionary combined. Its values keep their order like a list, and you can look them up by label like a dictionary.
From a list
```python
import pandas as pd

# List → Series (default index 0, 1, 2, ...)
s1 = pd.Series([10, 20, 30, 40])
print(s1)
```
Output:
```text
0    10
1    20
2    30
3    40
dtype: int64
```
From a dictionary (keys become the index)
```python
s2 = pd.Series({'a': 100, 'b': 200, 'c': 300})
print(s2)
```
Output:
```text
a    100
b    200
c    300
dtype: int64
```
How it works
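pandas stores each value together with an index label. Operations between Series match values by label, not by position, and this is called alignment.
If no index is provided, pandas uses 0-based integers (0, 1, 2, ...).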
Dictionary keys become the index.
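The points above can be sketched as follows. Passing `index=` gives a Series custom labels. When you do arithmetic on two Series, pandas matches their values by label rather than by position. The labels and numbers here are made up for illustration.

```python
import pandas as pd

# Custom labels instead of the default 0, 1, 2, ...
prices = pd.Series([10, 20, 30], index=['x', 'y', 'z'])

# Look up a value by its label, as with a dictionary
print(prices['y'])  # 20

# A second Series whose labels only partly overlap
extra = pd.Series({'y': 1, 'z': 2, 'w': 3})

# Addition aligns on labels: 'x' and 'w' exist in only one Series, so they become NaN
total = prices + extra
print(total)
```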
3️⃣ Build DataFrame from CSV/dict with custom index
A DataFrame is a set of Series (columns) that share the same index, much like a spreadsheet.
From dictionary
```python
data_dict = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['NYC', 'LA', 'Chicago']
}
df = pd.DataFrame(data_dict)
print(df)
```
Output:
```text
      Name  Age     City
0    Alice   25      NYC
1      Bob   30       LA
2  Charlie   35  Chicago
```
Add custom index
```python
df = pd.DataFrame(data_dict, index=['row1', 'row2', 'row3'])
print(df)
```
Output:
```text
         Name  Age     City
row1    Alice   25      NYC
row2      Bob   30       LA
row3  Charlie   35  Chicago
```
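You can check that each column is a Series sharing the DataFrame's index by selecting a column, and you can use the custom labels to select a row. This sketch rebuilds the same `data_dict` so that it runs on its own.

```python
import pandas as pd

data_dict = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['NYC', 'LA', 'Chicago'],
}
df = pd.DataFrame(data_dict, index=['row1', 'row2', 'row3'])

# Selecting one column returns a Series
ages = df['Age']
print(type(ages))

# The column keeps the DataFrame's index labels
print(list(ages.index))  # ['row1', 'row2', 'row3']

# .loc selects by the custom row label (and column name)
print(df.loc['row2', 'Name'])  # Bob
```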
From CSV file
Suppose you have data.csv:
```text
Name,Age,City
Alice,25,NYC
Bob,30,LA
Charlie,35,Chicago
```
```python
df_csv = pd.read_csv('data.csv')
print(df_csv)
```
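⚠️ A relative path like 'data.csv' is resolved from the current working directory. That is the folder you run Python from (in Jupyter, usually the notebook's folder), which is not always the folder your script is in. If the file isn't found, provide the full path.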
Set custom index while loading CSV
```python
df_csv = pd.read_csv('data.csv', index_col=0)  # first column (Name) becomes the index
```
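If you don't have a data.csv file yet, you can read the same text from memory with `io.StringIO`, because `pd.read_csv` accepts it in place of a file. The following is a minimal sketch of that approach. Here `index_col` is given a column name instead of a position; pandas accepts either.

```python
import io
import pandas as pd

# The same contents as data.csv, held in a string instead of a file
csv_text = """Name,Age,City
Alice,25,NYC
Bob,30,LA
Charlie,35,Chicago
"""

# index_col='Name' makes the Name column the row labels
df_csv = pd.read_csv(io.StringIO(csv_text), index_col='Name')
print(df_csv)

# Rows can now be looked up by name
print(df_csv.loc['Bob', 'City'])  # LA
```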
4️⃣ Explore the shape, columns, and dtypes attributes
Using the DataFrame we created:
```python
df = pd.DataFrame(data_dict)

# Number of rows and columns
print(df.shape)    # Output: (3, 3)

# Column names
print(df.columns)  # Output: Index(['Name', 'Age', 'City'], dtype='object')

# Data type of each column
print(df.dtypes)
```
Output for dtypes:
```text
Name    object
Age      int64
City    object
dtype: object
```
object = text (strings, or mixed Python objects)
int64 = 64-bit integer numbers
Why this matters
shape tells you the dataset size as (rows, columns)
columns lets you access the column names
dtypes helps you detect a numeric column that was wrongly read as text
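Here is an illustration of the dtypes point. If a numeric column contains even one stray text value, pandas reads the whole column as text (`object` in pandas 2.x). This sketch assumes you want unparseable entries turned into NaN, which `pd.to_numeric(..., errors='coerce')` does. The CSV content is made up.

```python
import io
import pandas as pd

# Hypothetical CSV where one Age value is the text "unknown"
csv_text = """Name,Age
Alice,25
Bob,unknown
Charlie,35
"""
df = pd.read_csv(io.StringIO(csv_text))

# The stray text forces the whole Age column to a non-numeric dtype
print(df.dtypes)

# Convert to numbers; values that cannot be parsed become NaN
df['Age'] = pd.to_numeric(df['Age'], errors='coerce')
print(df.dtypes)         # Age is now float64 (NaN needs a float column)
print(df['Age'].mean())  # 30.0 (NaN is skipped)
```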
5️⃣ Use head(), tail(), sample() to inspect data
Let’s make a bigger DataFrame for demonstration:
```python
import numpy as np

# Create 20 rows of dummy data
big_data = {
    'ID': range(1, 21),
    'Score': np.random.randint(50, 100, 20)  # random integers from 50 to 99
}
df_big = pd.DataFrame(big_data)
```
head() — first 5 rows (default)
```python
print(df_big.head())
```
Output (example):
```text
   ID  Score
0   1     78
1   2     92
2   3     65
3   4     88
4   5     73
```
head(10) — first 10 rows
```python
print(df_big.head(10))
```
tail() — last 5 rows
```python
print(df_big.tail())
```
sample() — random rows
```python
print(df_big.sample(3))         # 3 random rows
print(df_big.sample(frac=0.1))  # 10% of rows (2 rows here)
```
Why these are useful
head(): quick sanity check of the first rows
tail(): check the most recent/last entries
sample(): random inspection (good for large datasets, where the first and last rows may not be typical)
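Because `np.random.randint` and `sample()` give different results on every run, a fixed seed makes a demo reproducible. This minimal sketch uses the `random_state` parameter of `sample()` and NumPy's `default_rng`. The seed values are arbitrary.

```python
import numpy as np
import pandas as pd

# A seeded generator produces the same "random" scores on every run
rng = np.random.default_rng(42)
df_big = pd.DataFrame({
    'ID': range(1, 21),
    'Score': rng.integers(50, 100, size=20),  # integers from 50 to 99
})

# The same random_state makes sample() pick the same rows
first = df_big.sample(3, random_state=42)
second = df_big.sample(3, random_state=42)
print(first.equals(second))  # True

# frac=0.1 of 20 rows gives 2 rows
print(len(df_big.sample(frac=0.1, random_state=0)))  # 2
```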
🧪 Practice Exercises (try these)
1. Create a Series from [5, 10, 15, 20] with the custom index ['a', 'b', 'c', 'd'].
2. Build a DataFrame from this dictionary:
{'Product': ['A', 'B'], 'Price': [100, 200], 'Stock': [10, 20]}
Set the index to ['item1', 'item2'].
3. For that DataFrame, print:
shape
column names
data types
4. Load any small CSV (or create one) and use .head(3), .tail(2), .sample(2).
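Here is one possible solution to the first three exercises, so you can check your own attempt. Other approaches work just as well.

```python
import pandas as pd

# Exercise 1: Series with a custom index
s = pd.Series([5, 10, 15, 20], index=['a', 'b', 'c', 'd'])
print(s)

# Exercise 2: DataFrame with a custom index
products = pd.DataFrame(
    {'Product': ['A', 'B'], 'Price': [100, 200], 'Stock': [10, 20]},
    index=['item1', 'item2'],
)
print(products)

# Exercise 3: basic attributes
print(products.shape)          # (2, 3)
print(list(products.columns))  # ['Product', 'Price', 'Stock']
print(products.dtypes)
```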
❌ Common mistakes & how to avoid them
| Mistake | Fix |
|---|---|
| Forgetting import pandas as pd | Always write it at the top of your script |
| Using wrong file path for CSV | Use r'C:\data\file.csv' or os.path.join() |
| Relying on the default 0-based index when rows have natural labels | Set an index explicitly (index=... or index_col=...) |
| Confusing Series vs DataFrame | Series = 1D, DataFrame = 2D |
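The file-path row above mentions `os.path.join()`. The standard-library `pathlib` module is another option. This minimal sketch uses it to build a path relative to the current working directory and check that the file exists before loading it. The `data` folder and `file.csv` name are hypothetical.

```python
from pathlib import Path
import pandas as pd

# Hypothetical location: a 'data' folder inside the current working directory
csv_path = Path.cwd() / 'data' / 'file.csv'

if csv_path.exists():
    df = pd.read_csv(csv_path)  # read_csv accepts Path objects
    print(df.head())
else:
    # Printing the absolute path shows exactly where Python looked
    print(f"File not found: {csv_path.resolve()}")
```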
✅ Summary of Day 1
You now know how to:
Install pandas & check its version
Create a Series from a list/dict
Create a DataFrame from a dict/CSV with a custom index
Inspect a DataFrame using .shape, .columns, .dtypes
View data with .head(), .tail(), .sample()