MadiisAttendance
Data Selection & Filtering in Pandas 6 min read 1,094 words 81 sentences uni

Pandas Day 2 β€” Selecting, Slicing & Filtering Data

M Usman May 26, 2026
8 0 0 score
0%
Your Progress
0/36 sections
Reading Speed
0
words/min
Time Spent
00:00
Table of Contents

Quick Stats
Words: 1,094
Est. time: 6 min
Readability: Moderate

Word Cloud
electrocystoscope TAINTLESSLY SUCCESSARY non BOWINGLY fuchsias dhurry choree sledding affairs auletes topsmelts lwp uphasp zoolaters

πŸ“Œ Day 2: Data Selection & Filtering

Prerequisite Setup (run this first)

python
import pandas as pd
import numpy as np

# Create sample dataset for all examples
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve', 'Frank', 'Grace', 'Henry'],
    'Age': [25, 30, 35, 28, 22, 40, 33, 29],
    'City': ['NYC', 'LA', 'Chicago', 'NYC', 'LA', 'Chicago', 'NYC', 'LA'],
    'Salary': [50000, 60000, 70000, 55000, 48000, 80000, 65000, 52000],
    'Experience': [2, 5, 8, 3, 1, 12, 6, 4]
}, index=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'])

print("Original DataFrame:")
print(df)

1️⃣ Select single/multiple columns using bracket notation

Single column (returns Series)

python
# Method 1: df['column_name']
names = df['Name']
print(names)
print(type(names))  # <class 'pandas.core.series.Series'>

Output:

text
A      Alice
B        Bob
C    Charlie
D      Diana
E        Eve
F      Frank
G      Grace
H      Henry
Name: Name, dtype: object

Multiple columns (returns DataFrame)

python
# Method: df[['col1', 'col2']] - Note the double brackets!
subset = df[['Name', 'Age', 'Salary']]
print(subset)

Output:

text
      Name  Age  Salary
A    Alice   25   50000
B      Bob   30   60000
C  Charlie   35   70000
D    Diana   28   55000
E      Eve   22   48000
F    Frank   40   80000
G    Grace   33   65000
H    Henry   29   52000

Why double brackets?

  • Single bracket [] returns Series

  • Double brackets [[]] returns DataFrame

  • df['Name'] vs df[['Name']] β€” first is Series, second is 1-column DataFrame

Quick tip: Select columns by data type

python
# Select only numeric columns
numeric_cols = df.select_dtypes(include=['int64', 'float64'])
print(numeric_cols)

2️⃣ Use .loc to filter rows by label range

What is .loc?

Label-based indexing β€” uses index labels (row names/numbers) and column names.

Basic syntax: df.loc[row_labels, column_labels]

Select specific rows by label

python
# Single row
print(df.loc['C'])  # Returns Series

# Multiple rows
print(df.loc[['A', 'C', 'E']])  # Returns DataFrame

Label range (INCLUSIVE of both ends!)

python
# Rows from 'B' to 'E' (includes both B and E)
print(df.loc['B':'E'])

Output:

text
      Name  Age     City  Salary  Experience
B      Bob   30       LA   60000           5
C  Charlie   35  Chicago   70000           8
D    Diana   28      NYC   55000           3
E      Eve   22       LA   48000           1

Select specific rows and columns

python
# Rows 'A' to 'D', only 'Name' and 'Salary' columns
print(df.loc['A':'D', ['Name', 'Salary']])

Output:

text
      Name  Salary
A    Alice   50000
B      Bob   60000
C  Charlie   70000
D    Diana   55000

All rows, specific columns

python
print(df.loc[:, ['Name', 'City']])  # : means all rows

3️⃣ Use .iloc to slice rows and columns by position

What is .iloc?

Integer position-based indexing β€” works like Python list slicing.

Basic syntax: df.iloc[row_positions, column_positions]

Select by integer positions

python
# First 3 rows (positions 0,1,2)
print(df.iloc[0:3])  # Note: end exclusive (like list slicing)

Output (rows 0,1,2):

text
      Name  Age     City  Salary  Experience
A    Alice   25      NYC   50000           2
B      Bob   30       LA   60000           5
C  Charlie   35  Chicago   70000           8

Row and column slicing

python
# Rows 2-5 (positions 2 to 4 inclusive), columns 0-2 (Name, Age, City)
print(df.iloc[2:5, 0:3])

Output:

text
      Name  Age     City
C  Charlie   35  Chicago
D    Diana   28      NYC
E      Eve   22       LA

Specific row and column positions

python
# Row at position 3, column at position 4
print(df.iloc[3, 4])  # Output: 3 (Experience of Diana)

# Multiple specific rows and columns
print(df.iloc[[0, 2, 5], [0, 3]])  # Rows 0,2,5 and columns 0,3

All rows, specific column positions

python
# All rows, columns 1 and 3 (Age and Salary)
print(df.iloc[:, [1, 3]])

4️⃣ Filter DataFrame with multiple conditions (&, |)

IMPORTANT: Use & for AND, | for OR, and wrap each condition in ()

Single condition (review)

python
# Age > 30
older = df[df['Age'] > 30]
print(older)

Multiple conditions with AND (&)

python
# Age between 25 AND 35, AND Salary > 55000
filtered = df[(df['Age'] >= 25) & (df['Age'] <= 35) & (df['Salary'] > 55000)]
print(filtered)

Output:

text
      Name  Age     City  Salary  Experience
B      Bob   30       LA   60000           5
C  Charlie   35  Chicago   70000           8
G    Grace   33      NYC   65000           6

Multiple conditions with OR (|)

python
# Either from NYC OR from LA
nyc_or_la = df[(df['City'] == 'NYC') | (df['City'] == 'LA')]
print(nyc_or_la)

Complex combination (AND + OR)

python
# (Age > 30 AND Salary > 60000) OR (Experience > 5)
complex_filter = df[((df['Age'] > 30) & (df['Salary'] > 60000)) | (df['Experience'] > 5)]
print(complex_filter)

5️⃣ Use .isin() and .between() for complex filters

.isin() β€” Check if values are in a list

python
# Filter cities in a list
cities = ['NYC', 'Chicago']
filtered_cities = df[df['City'].isin(cities)]
print(filtered_cities)

Output:

text
      Name  Age     City  Salary  Experience
A    Alice   25      NYC   50000           2
C  Charlie   35  Chicago   70000           8
D    Diana   28      NYC   55000           3
G    Grace   33      NYC   65000           6

Inverse with ~ (NOT)

python
# All cities EXCEPT NYC and LA
not_nyc_la = df[~df['City'].isin(['NYC', 'LA'])]
print(not_nyc_la)  # Only Chicago

.between() β€” Range check (inclusive)

python
# Age between 25 and 35 (inclusive)
age_filter = df[df['Age'].between(25, 35)]
print(age_filter)

# Salary between 50000 and 65000
salary_filter = df[df['Salary'].between(50000, 65000)]
print(salary_filter[['Name', 'Salary']])

Combining .isin() and .between()

python
# Age 25-35 AND City is NYC or LA
result = df[
    df['Age'].between(25, 35) & 
    df['City'].isin(['NYC', 'LA'])
]
print(result)

🎯 Complete Hands-on Checklist (Run this)

python
import pandas as pd
import numpy as np

# Create dataset
df = pd.DataFrame({
    'Product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Desk', 'Chair', 'Lamp'],
    'Category': ['Electronics', 'Accessories', 'Accessories', 'Electronics', 'Furniture', 'Furniture', 'Accessories'],
    'Price': [1200, 25, 75, 300, 450, 200, 35],
    'Stock': [10, 100, 50, 15, 5, 20, 60],
    'Rating': [4.5, 4.2, 4.3, 4.7, 4.1, 4.4, 4.0]
}, index=['P1', 'P2', 'P3', 'P4', 'P5', 'P6', 'P7'])

print("="*50)
print("1. Column Selection")
print("="*50)
print("Single column (Price):")
print(df['Price'])
print("\nMultiple columns (Product, Price):")
print(df[['Product', 'Price']])

print("\n" + "="*50)
print("2. .loc Label-based Filtering")
print("="*50)
print("Rows P2 to P4:")
print(df.loc['P2':'P4'])
print("\nRows P1 to P3, columns Product and Rating:")
print(df.loc['P1':'P3', ['Product', 'Rating']])

print("\n" + "="*50)
print("3. .iloc Position-based Filtering")
print("="*50)
print("First 3 rows:")
print(df.iloc[0:3])
print("\nRows 2-4, columns 0 and 2:")
print(df.iloc[1:4, [0, 2]])

print("\n" + "="*50)
print("4. Multiple Conditions (&, |)")
print("="*50)
print("Price > 100 AND Stock < 30:")
print(df[(df['Price'] > 100) & (df['Stock'] < 30)])

print("\n" + "="*50)
print("5. .isin() and .between()")
print("="*50)
print("Category in [Electronics, Furniture]:")
print(df[df['Category'].isin(['Electronics', 'Furniture'])])
print("\nPrice between 50 and 400:")
print(df[df['Price'].between(50, 400)])

❌ Common Mistakes & Solutions

MistakeWhy it failsCorrect way
df[df['Age'] > 30 & df['Salary'] > 50000]Missing parenthesesdf[(df['Age'] > 30) & (df['Salary'] > 50000)]
df.loc[1:3] when index is not numericLabel vs position confusionUse df.iloc[1:3] for positions
df['Age', 'Name']Wrong syntax for multiple columnsdf[['Age', 'Name']]
df[df['City'] == 'NYC' or 'LA']or doesn't work with Seriesdf[df['City'].isin(['NYC', 'LA'])]
df.loc['A':'D', 'Name']Works but returns SeriesUse df.loc['A':'D', ['Name']] for DataFrame

πŸ“ Practice Exercises

  1. From the products DataFrame, select all rows where:

    • Price > 100 AND Stock > 10

  2. Use .iloc to get rows 3-5 and columns 1, 3, 4

  3. Filter products where Category is 'Accessories' using .isin()

  4. Use .between() to find products with Rating between 4.2 and 4.6

  5. Combine conditions: (Price < 100 OR Category == 'Electronics') AND Stock > 20


βœ… Summary of Day 2

MethodPurposeExample
df['col']Single columndf['Age']
df[['col1','col2']]Multiple columnsdf[['Name','Age']]
.loc[]Label-based selectiondf.loc['A':'D', 'Name']
.iloc[]Position-based selectiondf.iloc[0:3, 0:2]
&, |Multiple conditionsdf[(df['Age']>25) & (df['Salary']<60000)]
.isin()Match any in listdf[df['City'].isin(['NYC','LA'])]
.between()Range filterdf[df['Age'].between(25,35)]
Discussion (0)
Login to comment
Dictionary

Add New Word

Dictionary Words
My Notes
Highlights
Select text and click highlight to save
My Vocabulary
Quick Quiz
Settings
Reading Analytics
Today's reading: 0 min
Total read time: 0 min
Words learned: 0
Streak: 0 days
AI Summary

Generating summary...