CBSERanker

Class Notes: Data Handling Using Pandas – I

What is Pandas?

Pandas is a super useful Python library for working with data—like a supercharged Excel for coders. It helps you:

  • Organize data neatly (like tables).
  • Clean messy data (handling missing values, duplicates, etc.).
  • Slice, filter, and analyze data quickly.
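As a quick preview of the cleaning step mentioned above, here is a minimal sketch. The table contents and the variable names raw and clean are made up for illustration, and it uses a DataFrame, which is introduced later in these notes.

```python
import pandas as pd

# Hypothetical messy data: one missing name and one repeated row.
raw = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Bob', None],
    'Age': [25, 30, 30, 41],
})

# dropna() removes rows that contain a missing value;
# drop_duplicates() removes rows that repeat an earlier row exactly.
clean = raw.dropna().drop_duplicates()

print(clean)
```

Both calls return new objects, so raw itself is left unchanged.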

Pandas Data Structures

Pandas has two main data containers:

  1. Series: A single column of data with labels (like a list with a name tag).
  2. DataFrame: A full table with rows and columns (like an Excel sheet).

1. Series

  • What? A 1D labeled array (e.g., [10, 15, 18] with index labels [0, 1, 2]).
  • Features:
    • Data can be changed (mutable), but size can’t (immutable).
    • Index labels make data easy to access.

How to Create a Series:

import pandas as pd  
data = [10, 15, 18, 22]  
s = pd.Series(data, index=['a', 'b', 'c', 'd'])  
print(s)  

Output:

a    10
b    15
c    18
d    22
dtype: int64
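The two Series features listed earlier (changeable values and label-based access) can be sketched as follows. This reuses the same data and labels; the variable names b_value and first_value are only for illustration.

```python
import pandas as pd

s = pd.Series([10, 15, 18, 22], index=['a', 'b', 'c', 'd'])

# Access a value by its index label.
b_value = s['b']

# Values are mutable: assigning to an existing label updates the data.
s['a'] = 99

# Position-based access goes through .iloc.
first_value = s.iloc[0]

print(b_value, first_value, len(s))
```

The length stays at 4 here because the assignment targets a label that already exists.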

Cool Tricks with Series:

  • Math ops: s * 2 (multiplies all values by 2).
  • Filtering: s[s > 15] (keeps only values greater than 15).
  • Head/Tail: s.head(3) (first 3 rows) or s.tail(2) (last 2 rows).
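A small sketch of these tricks on the Series created earlier. The threshold of 15 is an assumption picked here so that the filter actually drops some values.

```python
import pandas as pd

s = pd.Series([10, 15, 18, 22], index=['a', 'b', 'c', 'd'])

doubled = s * 2           # element-wise multiplication
large = s[s > 15]         # boolean filter: keeps labels 'c' and 'd'
first_three = s.head(3)   # first 3 entries
last_two = s.tail(2)      # last 2 entries

print(doubled)
print(large)
print(first_three)
print(last_two)
```

Each operation returns a new Series and keeps the original index labels attached to the values.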

2. DataFrame

  • What? A 2D table (rows + columns).
  • Features:
    • Columns can hold different data types (numbers, text, etc.).
    • Size and data can be changed (mutable).

How to Create a DataFrame:

import pandas as pd
data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30]}
df = pd.DataFrame(data)
print(df)

Output:

    Name  Age
0  Alice   25
1    Bob   30
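The DataFrame features listed earlier (mixed column types, changeable size and data) can be checked with a short sketch. The extra row for 'Carol' is a made-up example, and adding a row through .loc is just one of several ways to grow a table.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Each column has its own data type (text for Name, integers for Age).
print(df.dtypes)
age_is_int = pd.api.types.is_integer_dtype(df['Age'])

shape_before = df.shape      # (rows, columns)

# Data is mutable: change a single cell by row label and column name.
df.loc[0, 'Age'] = 26

# Size is mutable: assigning to a new row label adds a row.
df.loc[2] = ['Carol', 35]

print(shape_before, df.shape)
```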

Working with DataFrames:

  • Add a column: df['Salary'] = [5000, 6000]
  • Delete a column: del df['Age'] (changes df in place) or df = df.drop('Age', axis=1) (drop returns a new DataFrame, so assign the result)
  • Select data:
    • Single column: df['Name']
    • Multiple columns: df[['Name', 'Salary']]
    • Rows: df.loc[0:2] (by label, end label included) or df.iloc[0:2] (by position, end position excluded)
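The operations above can be tried together in one short sketch. It rebuilds the two-row table from earlier; the slice 0:1 is chosen here (instead of 0:2) so the difference between loc and iloc is visible on such a small table.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Add a column (modifies df in place).
df['Salary'] = [5000, 6000]

# drop() returns a new DataFrame; df itself still has 'Age'.
without_age = df.drop('Age', axis=1)

# Column selection.
names = df['Name']                  # one column -> Series
subset = df[['Name', 'Salary']]     # list of columns -> DataFrame

# Row selection.
by_label = df.loc[0:1]       # labels 0 through 1, end label included
by_position = df.iloc[0:1]   # position 0 only, end position excluded

print(without_age)
print(len(by_label), len(by_position))
```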

DataFrame Operations

  • Filtering: df[df['Age'] > 25] (people older than 25).
  • Math: df['Salary'].sum() (total salary).
  • Merge/Join: Combine two DataFrames (like SQL joins).

df1.merge(df2, on='ID', how='inner')  # Keeps matching rows only.
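Here is a runnable sketch of the three operations. The notes do not define df1 and df2, so the two small tables below (with an 'ID' column plus made-up 'Name' and 'City' values) are assumptions for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30],
    'Salary': [5000, 6000],
})

# Filtering: keep rows where the condition is True.
older = df[df['Age'] > 25]

# Math: aggregate a single column.
total = df['Salary'].sum()

# Hypothetical tables that share an 'ID' column.
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Carol']})
df2 = pd.DataFrame({'ID': [2, 3, 4], 'City': ['Delhi', 'Mumbai', 'Pune']})

# Inner join: only IDs present in both tables survive.
merged = df1.merge(df2, on='ID', how='inner')

print(older)
print(total)
print(merged)
```

Changing how to 'left', 'right', or 'outer' keeps unmatched rows from one or both tables, with missing values filled in where there is no match.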

Reading/Writing CSV Files

  • Read CSV: data = pd.read_csv('file.csv')
  • Save to CSV: df.to_csv('new_file.csv')
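A round trip (write, then read back) can be sketched as below. It writes into a temporary folder so it runs anywhere; the tempfile usage and the index=False argument are choices made for this example, not something the notes prescribe.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'new_file.csv')

    # index=False leaves the row labels (0, 1) out of the file.
    df.to_csv(path, index=False)

    # Read the file back into a new DataFrame.
    loaded = pd.read_csv(path)

print(loaded)
```

Without index=False, the row labels are written as an extra unnamed first column, which then shows up as a column when the file is read back.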

Key Takeaways

  • Series = 1D labeled data (single column).
  • DataFrame = 2D table (rows + columns).
  • Use loc/iloc to access data.
  • Pandas is your best friend for data cleaning, analysis, and quick lookups!
