Data Visualization with Matplotlib in Python
Introduction to Data Visualization
Data visualization presents data in graphical form, such as charts and graphs, which makes patterns, trends, and correlations easier to spot. Matplotlib is a widely used Python library for creating mainly 2D visualizations, and its Pyplot submodule (matplotlib.pyplot) provides a simple, function-based interface for plotting.
Installation and Setup
Installation Methods:
• Anaconda: Matplotlib comes pre-installed.
• Standard Installation:
python -m pip install -U pip
python -m pip install -U matplotlib
Importing Pyplot:
import matplotlib.pyplot as plt # Common convention
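A quick way to confirm the installation worked is to import the package and print its version and the active backend (the rendering engine Pyplot will draw with). A minimal check:

```python
import matplotlib
import matplotlib.pyplot as plt  # the conventional alias used throughout this guide

# Both values depend on your environment, so no fixed output is shown here
print('Matplotlib version:', matplotlib.__version__)
print('Active backend:', matplotlib.get_backend())
```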
Types of Charts
• Line Chart: Displays data points connected by lines.
• Bar Chart: Represents data with rectangular bars.
• Pie Chart: Shows proportional data as slices of a circle.
• Histogram: Displays frequency distribution.
• Scatter Plot: Shows relationships between two variables.
• Box Plot: Visualizes data distribution through quartiles.
Line Chart
Basic Line Chart:
import matplotlib.pyplot as plt
x = [1, 2, 3]
y = [2, 3.5, 5]
plt.plot(x, y)
plt.show()
Customizing Line Charts:
Labels and Title:
plt.xlabel('Overs')
plt.ylabel('Runs Scored')
plt.title('Over wise Runs Scored')
Line Style:
plt.plot(x, y, 'r', linewidth=4, linestyle='dashed')  # Red, thick, dashed line
Markers:
plt.plot(x, y, marker='+', markersize=10, markeredgecolor='red')
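The customization snippets can be combined into one runnable script. A minimal sketch reusing the sample x and y from the basic example; passing color='r' as a keyword is equivalent to the 'r' format string above, and the line variable is kept only so the styling can be inspected:

```python
import matplotlib.pyplot as plt

x = [1, 2, 3]
y = [2, 3.5, 5]

# plt.plot returns a list of Line2D objects; unpack the single line it draws
(line,) = plt.plot(
    x, y,
    color='r',             # same as the 'r' format string
    linewidth=4,
    linestyle='dashed',
    marker='+',
    markersize=10,
    markeredgecolor='red',
)
plt.xlabel('Overs')
plt.ylabel('Runs Scored')
plt.title('Over wise Runs Scored')
plt.show()
```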
Bar Chart
Vertical Bar Chart:
categories = ['1-10', '11-20', '21-30']
values = [65, 55, 70]
plt.bar(categories, values, width=0.3, color=['r', 'g', 'b'])
plt.xlabel('Over Interval')
plt.ylabel('Runs Scored')
plt.title('Scoring Chart')
plt.show()
Horizontal Bar Chart:
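cities = ['City A', 'City B', 'City C']  # sample data (hypothetical)
temperatures = [32, 30, 34]
plt.barh(cities, temperatures)
plt.xlabel('Temperature')
plt.ylabel('Cities')
plt.show()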
Multiple Bar Chart:
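import numpy as np
team_a = [30, 42, 25, 38, 45]  # sample scores (hypothetical)
team_b = [28, 35, 40, 33, 50]
x = np.linspace(1, 5, 5)  # bar positions 1.0, 2.0, ..., 5.0
plt.bar(x, team_a, width=0.3, label='Team A')
plt.bar(x + 0.3, team_b, width=0.3, label='Team B')  # shift right by one bar width
plt.legend()
plt.show()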
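With the second series offset by 0.3, the automatic tick marks follow Team A's bar positions rather than the centre of each pair. One common refinement, sketched here with hypothetical scores for team_a and team_b and hypothetical interval labels, is to move the ticks to the middle of each pair with plt.xticks:

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical runs per 5-over interval for two teams
team_a = [30, 42, 25, 38, 45]
team_b = [28, 35, 40, 33, 50]
intervals = ['1-5', '6-10', '11-15', '16-20', '21-25']

width = 0.3
x = np.arange(1, 6)  # 1, 2, 3, 4, 5 (same positions as np.linspace(1, 5, 5))

bars_a = plt.bar(x, team_a, width=width, label='Team A')
bars_b = plt.bar(x + width, team_b, width=width, label='Team B')

# Place one tick label halfway between the two bars of each group
plt.xticks(x + width / 2, intervals)
plt.xlabel('Over Interval')
plt.ylabel('Runs Scored')
plt.legend()
plt.show()
```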
Pie Chart
Basic Pie Chart:
slices = [50, 20, 15, 10]
departments = ['Sales', 'HR', 'Finance', 'Production']
plt.pie(slices, labels=departments, autopct='%1.1f%%', shadow=True)
plt.title('Department Distribution')
plt.show()
Exploded Pie Chart:
explode = [0, 0.2, 0, 0]  # Pull out the 'HR' slice
plt.pie(slices, explode=explode, labels=departments)
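A self-contained version of the exploded chart. Note that the sample slices add up to 95, not 100; plt.pie normalizes the values, so autopct reports each slice's share of the total (Sales appears as 52.6%, not 50%). The startangle=90 argument is an optional addition that starts the first wedge at the top of the circle:

```python
import matplotlib.pyplot as plt

slices = [50, 20, 15, 10]
departments = ['Sales', 'HR', 'Finance', 'Production']
explode = [0, 0.2, 0, 0]  # offset only the second wedge ('HR'), by 0.2 of the radius

# With autopct set, plt.pie returns (wedges, label_texts, percentage_texts)
wedges, texts, autotexts = plt.pie(
    slices,
    explode=explode,
    labels=departments,
    autopct='%1.1f%%',
    startangle=90,
)
plt.title('Department Distribution')
plt.show()

print([t.get_text() for t in autotexts])  # ['52.6%', '21.1%', '15.8%', '10.5%']
```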
Histogram
Frequency Distribution:
ages = [22, 32, 35, 45, 55, 14, 26]
bins = [0, 10, 20, 30, 40, 50, 60]
plt.hist(ages, bins=bins, color='magenta', edgecolor='black')
plt.xlabel('Employee Age')
plt.ylabel('Number of Employees')
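plt.hist also returns the counts it plotted, which is a handy way to check a chart against the raw data. A sketch using the same ages and bins; each bin includes its left edge and excludes its right edge, except the last bin, which includes both:

```python
import matplotlib.pyplot as plt

ages = [22, 32, 35, 45, 55, 14, 26]
bins = [0, 10, 20, 30, 40, 50, 60]

# counts[i] = number of ages in [bins[i], bins[i+1]); the last bin also includes 60
counts, edges, patches = plt.hist(ages, bins=bins, color='magenta', edgecolor='black')
plt.xlabel('Employee Age')
plt.ylabel('Number of Employees')
plt.show()

for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f'{left:.0f}-{right:.0f}: {int(n)}')
# 0-10: 0, 10-20: 1, 20-30: 2, 30-40: 2, 40-50: 1, 50-60: 1
```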
Step Histogram (outline only):
plt.hist(ages, bins=bins, histtype='step')  # draws only the outline of the bars, unfilled
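histtype='step' draws the outline of the bars. A frequency polygon in the textbook sense instead joins the midpoint of each bin at the height of its count. A minimal sketch that uses numpy.histogram (an addition not used elsewhere in this guide) to get the counts and plt.plot to join the midpoints, with the step outline drawn underneath for reference:

```python
import matplotlib.pyplot as plt
import numpy as np

ages = [22, 32, 35, 45, 55, 14, 26]
bins = [0, 10, 20, 30, 40, 50, 60]

counts, edges = np.histogram(ages, bins=bins)  # same binning rule as plt.hist
midpoints = (edges[:-1] + edges[1:]) / 2       # 5, 15, 25, ..., 55

plt.hist(ages, bins=bins, histtype='step', color='grey')  # outline for reference
plt.plot(midpoints, counts, marker='o')                   # the frequency polygon
plt.xlabel('Employee Age')
plt.ylabel('Number of Employees')
plt.show()
```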
Box Plot
Basic Box Plot:
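val1, val2, val3 = [1, 2, 3, 4, 10], [2, 4, 6, 8], [5, 5, 6, 7, 9]  # sample data (hypothetical)
data = [val1, val2, val3]
plt.boxplot(data, labels=['Series1', 'Series2', 'Series3'])  # keyword renamed tick_labels in Matplotlib 3.9+
plt.show()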
Customized Box Plot:
plt.boxplot(data, patch_artist=True, notch=True)
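Each box marks the median and the first and third quartiles; by default the whiskers reach the furthest data point within 1.5 × IQR of the box, and anything beyond is drawn as an outlier. A runnable sketch with made-up series (val1, val2 and val3 are hypothetical). It labels the boxes with plt.xticks, which works whether your Matplotlib version expects labels or the newer tick_labels keyword, and colours the filled boxes that patch_artist=True produces:

```python
import matplotlib.pyplot as plt

# Hypothetical sample series; val1 contains one clear outlier (10)
val1 = [1, 2, 3, 4, 10]
val2 = [2, 4, 6, 8]
val3 = [5, 5, 6, 7, 9]
data = [val1, val2, val3]

# boxplot returns a dict of artists: 'boxes', 'medians', 'whiskers', 'caps', 'fliers'
result = plt.boxplot(data, patch_artist=True)
for box, colour in zip(result['boxes'], ['lightblue', 'lightgreen', 'pink']):
    box.set_facecolor(colour)  # boxes are PathPatch objects when patch_artist=True

plt.xticks([1, 2, 3], ['Series1', 'Series2', 'Series3'])
plt.show()

# Each median line is horizontal, so its y-data holds the median value
for name, median_line in zip(['Series1', 'Series2', 'Series3'], result['medians']):
    print(name, median_line.get_ydata()[0])
```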
Scatter Plot
Basic Scatter Plot:
plt.scatter(x, y, color='red', marker='x')
plt.xlabel('Age')
plt.ylabel('Number of Employees')
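The scatter snippet reuses x and y from earlier. A self-contained sketch with hypothetical age and headcount values, which also sets the marker size through the s parameter (an area in points squared):

```python
import matplotlib.pyplot as plt

# Hypothetical data: number of employees at each age
ages = [22, 25, 30, 35, 40, 45]
employees = [5, 8, 12, 9, 6, 3]

# plt.scatter returns a PathCollection holding one offset (x, y) per point
points = plt.scatter(ages, employees, color='red', marker='x', s=60)
plt.xlabel('Age')
plt.ylabel('Number of Employees')
plt.title('Employees by Age')
plt.show()
```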
Saving Plots
plt.savefig('path/to/file.pdf')  # Format follows the extension (PNG, PDF, SVG, ...); call before plt.show()
plt.show()
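plt.savefig infers the file format from the extension, and it should run before plt.show(): with an interactive backend the figure is discarded once its window closes, so saving afterwards can produce a blank image. A minimal sketch that writes one chart in three formats into a temporary directory (the file name runs.* and the dpi and bbox_inches values are illustrative choices):

```python
import os
import tempfile

import matplotlib.pyplot as plt

plt.plot([1, 2, 3], [2, 3.5, 5])
plt.title('Over wise Runs Scored')

out_dir = tempfile.mkdtemp()
paths = []
for ext in ('png', 'pdf', 'svg'):
    path = os.path.join(out_dir, f'runs.{ext}')
    # dpi matters for raster output such as PNG; bbox_inches='tight' trims extra whitespace
    plt.savefig(path, dpi=150, bbox_inches='tight')
    paths.append(path)

plt.show()  # display only after every file has been written
print(paths)
```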
Key Takeaways
• Use plt.plot() for line charts.
• Customize charts with labels (xlabel, ylabel), titles (title), and legends (legend).
• Bar charts (bar, barh) are ideal for comparisons.
• Pie charts (pie) show proportions.
• Histograms (hist) display distributions.
• Box plots (boxplot) summarize a distribution's median, quartiles, and outliers.
• Scatter plots (scatter) reveal relationships between variables.