Pandas Day 2: Column Selection, .loc, .iloc & Filtering Techniques

📌 Day 2: Data Selection & Filtering

Prerequisite Setup (run this first)

import pandas as pd
import numpy as np

# Create sample dataset for all examples
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve', 'Frank', 'Grace', 'Henry'],
    'Age': [25, 30, 35, 28, 22, 40, 33, 29],
    'City': ['NYC', 'LA', 'Chicago', 'NYC', 'LA', 'Chicago', 'NYC', 'LA'],
    'Salary': [50000, 60000, 70000, 55000, 48000, 80000, 65000, 52000],
    'Experience': [2, 5, 8, 3, 1, 12, 6, 4]
}, index=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'])

print("Original DataFrame:")
print(df)

1️⃣ Select single/multiple columns using bracket notation

Single column (returns Series)

# Method 1: df['column_name']
names = df['Name']
print(names)
print(type(names))  # <class 'pandas.core.series.Series'>

Output:

A      Alice
B        Bob
C    Charlie
D      Diana
E        Eve
F      Frank
G      Grace
H      Henry
Name: Name, dtype: object

Multiple columns (returns DataFrame)

# Method: df[['col1', 'col2']] - Note the double brackets!
subset = df[['Name', 'Age', 'Salary']]
print(subset)

Output:

      Name  Age  Salary
A    Alice   25   50000
B      Bob   30   60000
C  Charlie   35   70000
D    Diana   28   55000
E      Eve   22   48000
F    Frank   40   80000
G    Grace   33   65000
H    Henry   29   52000

Why double brackets?

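Single brackets `[]` with one column name return a Series.
Double brackets `[[]]` return a DataFrame: the inner pair is an ordinary Python list of column names.
`df['Name']` vs `df[['Name']]`: the first is a Series, the second is a one-column DataFrame.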
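To make the difference concrete, here is a minimal sketch on a cut-down copy of the sample data. The variable names `as_series` and `as_frame` are illustrative only and do not appear elsewhere in this tutorial.

```python
import pandas as pd

# Cut-down version of the sample data (assumption: three rows are enough here)
df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]},
    index=['A', 'B', 'C'],
)

as_series = df['Name']    # string key -> Series
as_frame = df[['Name']]   # list key   -> DataFrame with one column

print(type(as_series).__name__, as_series.shape)  # Series (3,)
print(type(as_frame).__name__, as_frame.shape)    # DataFrame (3, 1)
```

The shape is usually the quickest check: a Series has one dimension, a DataFrame always has two.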

Quick tip: Select columns by data type

# Select only numeric columns
numeric_cols = df.select_dtypes(include=['int64', 'float64'])
print(numeric_cols)
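A slightly more general variant, sketched on a small stand-in frame: the `'number'` alias matches every numeric dtype rather than only `int64` and `float64`, and `exclude` selects the complement. The column set and the names `numeric_part` and `other_part` are assumptions made for this illustration.

```python
import pandas as pd

# Small stand-in frame with text, integer and float columns
df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30],
    'Rating': [4.5, 4.2],
})

numeric_part = df.select_dtypes(include='number')  # all numeric dtypes
other_part = df.select_dtypes(exclude='number')    # everything else

print(list(numeric_part.columns))  # ['Age', 'Rating']
print(list(other_part.columns))    # ['Name']
```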

2️⃣ Use `.loc` to filter rows by label range

What is `.loc`?

Label-based indexing — uses index labels (row names/numbers) and column names.

Basic syntax: `df.loc[row_labels, column_labels]`

Select specific rows by label

# Single row
print(df.loc['C'])  # Returns Series

# Multiple rows
print(df.loc[['A', 'C', 'E']])  # Returns DataFrame
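The same Series-versus-DataFrame rule applies to rows. The sketch below, on a cut-down copy of the sample data, shows the three return types you can get from `.loc`; the names `row`, `frame` and `cell` are illustrative.

```python
import pandas as pd

# Cut-down copy of the sample data
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Salary': [50000, 60000, 70000],
}, index=['A', 'B', 'C'])

row = df.loc['C']             # single label -> Series indexed by column name
frame = df.loc[['C']]         # list of labels -> DataFrame, even with one row
cell = df.loc['C', 'Salary']  # row label + column label -> scalar value

print(row['Name'])   # Charlie
print(frame.shape)   # (1, 2)
print(cell)          # 70000
```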

Label range (INCLUSIVE of both ends!)

# Rows from 'B' to 'E' (includes both B and E)
print(df.loc['B':'E'])

Output:

      Name  Age     City  Salary  Experience
B      Bob   30       LA   60000           5
C  Charlie   35  Chicago   70000           8
D    Diana   28      NYC   55000           3
E      Eve   22       LA   48000           1

Select specific rows and columns

# Rows 'A' to 'D', only 'Name' and 'Salary' columns
print(df.loc['A':'D', ['Name', 'Salary']])

Output:

      Name  Salary
A    Alice   50000
B      Bob   60000
C  Charlie   70000
D    Diana   55000

All rows, specific columns

print(df.loc[:, ['Name', 'City']])  # : means all rows

3️⃣ Use `.iloc` to slice rows and columns by position

What is `.iloc`?

Integer position-based indexing — works like Python list slicing.

Basic syntax: `df.iloc[row_positions, column_positions]`

Select by integer positions

# First 3 rows (positions 0,1,2)
print(df.iloc[0:3])  # Note: end exclusive (like list slicing)

Output (rows 0,1,2):

      Name  Age     City  Salary  Experience
A    Alice   25      NYC   50000           2
B      Bob   30       LA   60000           5
C  Charlie   35  Chicago   70000           8
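The end-exclusive rule is the main difference from the `.loc` label slices in the previous section. A minimal side-by-side sketch, using a one-column stand-in frame and illustrative variable names:

```python
import pandas as pd

# Stand-in frame: only the index labels matter for this comparison
df = pd.DataFrame({'Age': [25, 30, 35, 28]}, index=['A', 'B', 'C', 'D'])

by_label = df.loc['A':'C']   # label slice: the end label 'C' is included
by_position = df.iloc[0:3]   # position slice: stops before position 3
one_short = df.iloc[0:2]     # 'C' sits at position 2, so it is left out

print(list(by_label.index))     # ['A', 'B', 'C']
print(list(by_position.index))  # ['A', 'B', 'C']
print(list(one_short.index))    # ['A', 'B']
```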

Row and column slicing

# Rows at positions 2-4 (the end, 5, is exclusive), columns at positions 0-2 (Name, Age, City)
print(df.iloc[2:5, 0:3])

Output:

      Name  Age     City
C  Charlie   35  Chicago
D    Diana   28      NYC
E      Eve   22       LA

Specific row and column positions

# Row at position 3, column at position 4
print(df.iloc[3, 4])  # Output: 3 (Experience of Diana)

# Multiple specific rows and columns
print(df.iloc[[0, 2, 5], [0, 3]])  # Rows 0,2,5 and columns 0,3

All rows, specific column positions

# All rows, columns 1 and 3 (Age and Salary)
print(df.iloc[:, [1, 3]])
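Counting column positions by hand gets error-prone on wide tables. One way around it, sketched here on the sample data, is to look the position up with `df.columns.get_loc`. Negative positions also work, counting from the end as in list slicing. The names `salary_pos`, `salaries` and `last_two` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve', 'Frank', 'Grace', 'Henry'],
    'Age': [25, 30, 35, 28, 22, 40, 33, 29],
    'City': ['NYC', 'LA', 'Chicago', 'NYC', 'LA', 'Chicago', 'NYC', 'LA'],
    'Salary': [50000, 60000, 70000, 55000, 48000, 80000, 65000, 52000],
    'Experience': [2, 5, 8, 3, 1, 12, 6, 4]
}, index=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'])

salary_pos = df.columns.get_loc('Salary')  # integer position of the column
salaries = df.iloc[:, salary_pos]          # same data as df['Salary']
last_two = df.iloc[-2:]                    # last two rows, by position

print(salary_pos)            # 3
print(list(last_two.index))  # ['G', 'H']
```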

4️⃣ Filter DataFrame with multiple conditions (&, |)

IMPORTANT: Use `&` for AND, `|` for OR, and wrap each condition in `()`
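The parentheses matter because `&` binds more tightly than `>` in Python, so the unparenthesized form is parsed as `df['Age'] > (30 & df['Salary']) > 50000`. The sketch below shows what happens on the sample data; the names `error_type` and `correct` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve', 'Frank', 'Grace', 'Henry'],
    'Age': [25, 30, 35, 28, 22, 40, 33, 29],
    'Salary': [50000, 60000, 70000, 55000, 48000, 80000, 65000, 52000],
}, index=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'])

# Without parentheses: the chained comparison needs one True/False value,
# which a Series cannot provide, so pandas raises ValueError.
error_type = None
try:
    df[df['Age'] > 30 & df['Salary'] > 50000]
except ValueError as exc:
    error_type = type(exc).__name__

# With parentheses: each comparison yields a boolean Series, then & combines them.
correct = df[(df['Age'] > 30) & (df['Salary'] > 50000)]

print(error_type)           # ValueError
print(list(correct.index))  # ['C', 'F', 'G']
```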

Single condition (review)

# Age > 30
older = df[df['Age'] > 30]
print(older)

Multiple conditions with AND (`&`)

# Age between 25 AND 35, AND Salary > 55000
filtered = df[(df['Age'] >= 25) & (df['Age'] <= 35) & (df['Salary'] > 55000)]
print(filtered)

Output:

      Name  Age     City  Salary  Experience
B      Bob   30       LA   60000           5
C  Charlie   35  Chicago   70000           8
G    Grace   33      NYC   65000           6

Multiple conditions with OR (`|`)

# Either from NYC OR from LA
nyc_or_la = df[(df['City'] == 'NYC') | (df['City'] == 'LA')]
print(nyc_or_la)

Complex combination (AND + OR)

# (Age > 30 AND Salary > 60000) OR (Experience > 5)
complex_filter = df[((df['Age'] > 30) & (df['Salary'] > 60000)) | (df['Experience'] > 5)]
print(complex_filter)
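Nested parentheses get hard to read quickly. One option, shown here as a sketch rather than a required style, is to give each condition a name and combine the named masks. The mask names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve', 'Frank', 'Grace', 'Henry'],
    'Age': [25, 30, 35, 28, 22, 40, 33, 29],
    'Salary': [50000, 60000, 70000, 55000, 48000, 80000, 65000, 52000],
    'Experience': [2, 5, 8, 3, 1, 12, 6, 4]
}, index=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'])

# Each mask is a boolean Series aligned with df's index
older_and_well_paid = (df['Age'] > 30) & (df['Salary'] > 60000)
long_tenure = df['Experience'] > 5

# Same logic as the one-line version, with the intent spelled out
complex_filter = df[older_and_well_paid | long_tenure]

print(list(complex_filter.index))  # ['C', 'F', 'G']
```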

5️⃣ Use `.isin()` and `.between()` for complex filters

`.isin()` — Check if values are in a list

# Filter cities in a list
cities = ['NYC', 'Chicago']
filtered_cities = df[df['City'].isin(cities)]
print(filtered_cities)

Output:

      Name  Age     City  Salary  Experience
A    Alice   25      NYC   50000           2
C  Charlie   35  Chicago   70000           8
D    Diana   28      NYC   55000           3
F    Frank   40  Chicago   80000          12
G    Grace   33      NYC   65000           6

Inverse with `~` (NOT)

# All cities EXCEPT NYC and LA
not_nyc_la = df[~df['City'].isin(['NYC', 'LA'])]
print(not_nyc_la)  # Only Chicago
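`.isin()` is shorthand for a chain of `==` comparisons joined with `|`. The sketch below checks that the two masks match on the sample `City` column; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve', 'Frank', 'Grace', 'Henry'],
    'City': ['NYC', 'LA', 'Chicago', 'NYC', 'LA', 'Chicago', 'NYC', 'LA'],
}, index=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'])

with_or = (df['City'] == 'NYC') | (df['City'] == 'LA')  # explicit OR chain
with_isin = df['City'].isin(['NYC', 'LA'])              # same mask, shorter

print(with_or.equals(with_isin))      # True
print(list(df[~with_isin]['Name']))   # ['Charlie', 'Frank']
```

The longer the list of allowed values, the more `.isin()` pays off over the OR chain.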

`.between()` — Range check (inclusive)

# Age between 25 and 35 (inclusive)
age_filter = df[df['Age'].between(25, 35)]
print(age_filter)

# Salary between 50000 and 65000
salary_filter = df[df['Salary'].between(50000, 65000)]
print(salary_filter[['Name', 'Salary']])
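Both endpoints are included by default. If you need to exclude them, `.between()` takes an `inclusive` argument. This sketch assumes pandas 1.3 or newer, where the argument accepts the strings `'both'`, `'neither'`, `'left'` and `'right'`. Variable names are illustrative.

```python
import pandas as pd

ages = pd.Series(
    [25, 30, 35, 28, 22, 40, 33, 29],
    index=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
    name='Age',
)

closed = ages[ages.between(25, 35)]                      # 25 and 35 both match
open_ = ages[ages.between(25, 35, inclusive='neither')]  # 25 and 35 are dropped

print(list(closed.index))  # ['A', 'B', 'C', 'D', 'G', 'H']
print(list(open_.index))   # ['B', 'D', 'G', 'H']
```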

Combining `.isin()` and `.between()`

# Age 25-35 AND City is NYC or LA
result = df[
    df['Age'].between(25, 35) & 
    df['City'].isin(['NYC', 'LA'])
]
print(result)
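A boolean mask can also go in the row slot of `.loc`, which lets you filter rows and pick columns in one step. A minimal sketch on the sample data, with illustrative names `mask` and `result`:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve', 'Frank', 'Grace', 'Henry'],
    'Age': [25, 30, 35, 28, 22, 40, 33, 29],
    'City': ['NYC', 'LA', 'Chicago', 'NYC', 'LA', 'Chicago', 'NYC', 'LA'],
    'Salary': [50000, 60000, 70000, 55000, 48000, 80000, 65000, 52000],
}, index=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'])

# Same conditions as above: age 25-35 AND city is NYC or LA
mask = df['Age'].between(25, 35) & df['City'].isin(['NYC', 'LA'])

# Rows where mask is True, restricted to two columns
result = df.loc[mask, ['Name', 'City']]

print(list(result.index))    # ['A', 'B', 'D', 'G', 'H']
print(list(result.columns))  # ['Name', 'City']
```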

🎯 Complete Hands-on Checklist (Run this)

import pandas as pd
import numpy as np

# Create dataset
df = pd.DataFrame({
    'Product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Desk', 'Chair', 'Lamp'],
    'Category': ['Electronics', 'Accessories', 'Accessories', 'Electronics', 'Furniture', 'Furniture', 'Accessories'],
    'Price': [1200, 25, 75, 300, 450, 200, 35],
    'Stock': [10, 100, 50, 15, 5, 20, 60],
    'Rating': [4.5, 4.2, 4.3, 4.7, 4.1, 4.4, 4.0]
}, index=['P1', 'P2', 'P3', 'P4', 'P5', 'P6', 'P7'])

print("="*50)
print("1. Column Selection")
print("="*50)
print("Single column (Price):")
print(df['Price'])
print("\nMultiple columns (Product, Price):")
print(df[['Product', 'Price']])

print("\n" + "="*50)
print("2. .loc Label-based Filtering")
print("="*50)
print("Rows P2 to P4:")
print(df.loc['P2':'P4'])
print("\nRows P1 to P3, columns Product and Rating:")
print(df.loc['P1':'P3', ['Product', 'Rating']])

print("\n" + "="*50)
print("3. .iloc Position-based Filtering")
print("="*50)
print("First 3 rows:")
print(df.iloc[0:3])
print("\nRows at positions 1-3, columns at positions 0 and 2:")
print(df.iloc[1:4, [0, 2]])

print("\n" + "="*50)
print("4. Multiple Conditions (&, |)")
print("="*50)
print("Price > 100 AND Stock < 30:")
print(df[(df['Price'] > 100) & (df['Stock'] < 30)])

print("\n" + "="*50)
print("5. .isin() and .between()")
print("="*50)
print("Category in [Electronics, Furniture]:")
print(df[df['Category'].isin(['Electronics', 'Furniture'])])
print("\nPrice between 50 and 400:")
print(df[df['Price'].between(50, 400)])

❌ Common Mistakes & Solutions

Mistake	Why it fails	Correct way
`df[df['Age'] > 30 & df['Salary'] > 50000]`	`&` binds tighter than `>`, so without parentheses pandas raises a ValueError	`df[(df['Age'] > 30) & (df['Salary'] > 50000)]`
`df.loc[1:3]` when the index is not numeric	`.loc` expects labels, not positions	Use `df.iloc[1:3]` for positions
`df['Age', 'Name']`	Looks up one column keyed by the tuple `('Age', 'Name')` and raises a KeyError	`df[['Age', 'Name']]`
`df[df['City'] == 'NYC' or 'LA']`	`or` needs a single True/False, which a Series cannot provide (ValueError)	`df[df['City'].isin(['NYC', 'LA'])]`
`df.loc['A':'D', 'Name']`	Works, but returns a Series	Use `df.loc['A':'D', ['Name']]` for a DataFrame
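To see two of these mistakes fail for yourself, the sketch below runs them on a cut-down copy of the sample data and records which exception each raises. The helper `error_name` is hypothetical, written only for this demonstration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'Age': [25, 30, 35, 28],
    'City': ['NYC', 'LA', 'Chicago', 'NYC'],
}, index=['A', 'B', 'C', 'D'])


def error_name(make_selection):
    """Run a selection and return the raised exception's class name, or None."""
    try:
        make_selection()
    except Exception as exc:
        return type(exc).__name__
    return None


# Mistake: comma-separated names are treated as one tuple key
tuple_key = error_name(lambda: df['Age', 'Name'])
# Mistake: Python's `or` asks the Series for a single truth value
python_or = error_name(lambda: df[df['City'] == 'NYC' or 'LA'])

print(tuple_key)  # KeyError
print(python_or)  # ValueError

# Correct forms from the table
two_cols = df[['Age', 'Name']]
nyc_or_la = df[df['City'].isin(['NYC', 'LA'])]
print(list(nyc_or_la.index))  # ['A', 'B', 'D']
```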

📝 Practice Exercises

1. From the products DataFrame, select all rows where Price > 100 AND Stock > 10
2. Use .iloc to get rows 3-5 and columns 1, 3, 4
3. Filter products where Category is 'Accessories' using .isin()
4. Use .between() to find products with Rating between 4.2 and 4.6
5. Combine conditions: (Price < 100 OR Category == 'Electronics') AND Stock > 20

✅ Summary of Day 2

Method	Purpose	Example
`df['col']`	Single column	`df['Age']`
`df[['col1','col2']]`	Multiple columns	`df[['Name','Age']]`
`.loc[]`	Label-based selection	`df.loc['A':'D', 'Name']`
`.iloc[]`	Position-based selection	`df.iloc[0:3, 0:2]`
`&`, `|`	Multiple conditions	`df[(df['Age']>25) & (df['Salary']<60000)]`
`.isin()`	Match any in list	`df[df['City'].isin(['NYC','LA'])]`
`.between()`	Range filter	`df[df['Age'].between(25,35)]`
