


Python Lesson 4 Collected Knowledge
by Apolinario "Sam" Ortega, founder IN-V-BAT-AI

How to use Python pandas to read data from a CSV file

⭐ Learn how to import Python libraries

 
# import the Python libraries used in this lesson
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

📘 Learn how to read data from a CSV file

 
print('air_quality.csv is located at /Users/xxxxx/projects/air_quality.csv\n')

print('Remember the directory path of your file (for example, air_quality.csv); otherwise you will get a FileNotFoundError.\n')
print('The air quality measured here is the pollutant nitrogen dioxide (NO2).\n')
air_quality = pd.read_csv('/Users/xxxxx/projects/air_quality.csv')
air_quality.head()
# press Shift + Enter to run this cell

☑ Output of Python code in Jupyter

 
air_quality.csv is located at /Users/xxxxx/projects/air_quality.csv

Remember the directory path of your file (for example, air_quality.csv); otherwise you will get a FileNotFoundError.

The air quality measured here is the pollutant nitrogen dioxide (NO2).

datetime	station_antwerp	station_paris	station_london
0	5/7/2019 2:00	NaN	NaN	23.0
1	5/7/2019 3:00	50.5	25.0	19.0
2	5/7/2019 4:00	45.0	27.7	19.0
3	5/7/2019 5:00	NaN	50.4	16.0
4	5/7/2019 6:00	NaN	61.9	NaN
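If you don't have the CSV file at hand, you can still try `pd.read_csv` by giving it a file-like object. In this sketch the five rows shown above are typed into a string (the name `csv_text` is hypothetical) and wrapped in `io.StringIO`, which `pd.read_csv` reads just like a file on disk. Empty fields become `NaN`, as in the output above.

```python
import io

import pandas as pd

# Hypothetical stand-in for /Users/xxxxx/projects/air_quality.csv:
# the first five rows from the output above, stored as an in-memory CSV string.
csv_text = """datetime,station_antwerp,station_paris,station_london
5/7/2019 2:00,,,23.0
5/7/2019 3:00,50.5,25.0,19.0
5/7/2019 4:00,45.0,27.7,19.0
5/7/2019 5:00,,50.4,16.0
5/7/2019 6:00,,61.9,
"""

# pd.read_csv accepts any file-like object, so io.StringIO behaves like a file.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.head())
print(sample.shape)  # (rows, columns)
```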

📘 Learn how to add an empty line (blank space) to printed output

 
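print("This is Line 1")
print()   # adds a blank line
print("This is Line 2")
print()   # adds a blank line
print("This with \n to add empty space below Line 1\n")   # the middle \n splits the text; the trailing \n adds a blank line
print("Line 2")
print()   # adds a blank line
print("Line 1\n\n")   # two \n plus print's own newline give two blank lines
print("Line 2")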

☑ Output of Python code in Jupyter

This is Line 1

This is Line 2

This with 
 to add empty space below Line 1

Line 2

Line 1


Line 2

📘 Learn how to display the column data types with .info()

 
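# index_col=0, parse_dates=True turn the datetime column into a DatetimeIndex (explained later in this lesson);
# the .info() output below was produced with these options
air_quality = pd.read_csv('/Users/xxxxx/projects/air_quality.csv', index_col=0, parse_dates=True)
air_quality.info()
# press Shift + Enter to run this cell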

☑ Output of Python code in Jupyter.

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1035 entries, 2019-05-07 02:00:00 to 2019-06-21 02:00:00
Data columns (total 3 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   station_antwerp  95 non-null     float64
 1   station_paris    1004 non-null   float64
 2   station_london   969 non-null    float64
dtypes: float64(3)
memory usage: 32.3 KB
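The "Non-Null Count" column tells you how many readings are missing: 1035 − 95 = 940 for station_antwerp, 1035 − 1004 = 31 for station_paris, and 1035 − 969 = 66 for station_london. `isna().sum()` gives these missing counts directly. The sketch below uses the five rows shown earlier, typed in by hand as a hypothetical stand-in for the full file.

```python
import numpy as np
import pandas as pd

# The five rows shown earlier, typed in by hand as a stand-in for the full file.
sample = pd.DataFrame(
    {
        "station_antwerp": [np.nan, 50.5, 45.0, np.nan, np.nan],
        "station_paris": [np.nan, 25.0, 27.7, 50.4, 61.9],
        "station_london": [23.0, 19.0, 19.0, 16.0, np.nan],
    },
    index=pd.DatetimeIndex(
        ["2019-05-07 02:00", "2019-05-07 03:00", "2019-05-07 04:00",
         "2019-05-07 05:00", "2019-05-07 06:00"],
        name="datetime",
    ),
)

non_null = sample.notna().sum()   # same idea as the "Non-Null Count" column of .info()
missing = sample.isna().sum()     # missing (NaN) values per column
print(pd.DataFrame({"non_null": non_null, "missing": missing}))

# Every cell is either present or missing, so the two counts add up to the row count.
assert (non_null + missing == len(sample)).all()
```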

📘 Learn how to display the maximum value in the station_paris column with .max()

 
air_quality = pd.read_csv('/Users/xxxxx/projects/air_quality.csv')
air_quality["station_paris"].max()
# press Shift + Enter to run this cell

☑ Output of Python code in Jupyter.

97.0
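`.max()` tells you the largest value but not when it happened. `idxmax()` returns the index label of that maximum: a timestamp when the file is read with `index_col=0, parse_dates=True` (shown later in this lesson), or a row number with the default 0, 1, 2, ... index. The Series `paris` below holds a few made-up readings for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical readings for one station, indexed by timestamp.
paris = pd.Series(
    [25.0, 27.7, 50.4, 61.9, np.nan],
    index=pd.DatetimeIndex(
        ["2019-05-07 03:00", "2019-05-07 04:00", "2019-05-07 05:00",
         "2019-05-07 06:00", "2019-05-07 07:00"]
    ),
    name="station_paris",
)

peak_value = paris.max()    # largest reading; NaN is skipped by default
peak_time = paris.idxmax()  # index label (here a timestamp) of that reading
print(f"Highest reading {peak_value} at {peak_time}")
```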

📘 Learn how to display descriptive statistics with .describe()

 
air_quality = pd.read_csv('/Users/xxxxx/projects/air_quality.csv')
air_quality.describe()
# press Shift + Enter to run this cell

☑ Output of Python code in Jupyter.

	station_antwerp	station_paris	station_london
count	95.000000	1004.000000	969.000000
mean	25.778947	27.740538	24.777090
std	12.682019	15.285746	11.214377
min	7.500000	0.000000	0.000000
25%	16.750000	16.500000	19.000000
50%	23.000000	24.150000	25.000000
75%	34.500000	35.925000	31.000000
max	74.500000	97.000000	97.000000

📘 Learn how to display the top 5 rows and all the columns using .head()

 
air_quality = pd.read_csv('/Users/xxxxx/projects/air_quality.csv')
air_quality.head()

☑ Output of Python code in Jupyter. Notice that the index runs from 0 to 4. How do you replace it with the datetime column?

	datetime	station_antwerp	station_paris	station_london
0	5/7/2019 2:00	NaN	NaN	23.0
1	5/7/2019 3:00	50.5	25.0	19.0
2	5/7/2019 4:00	45.0	27.7	19.0
3	5/7/2019 5:00	NaN	50.4	16.0
4	5/7/2019 6:00	NaN	61.9	NaN

📘 Learn how to replace the 0-based index column with the datetime column

 
# parse_dates=True converts the index column (datetime) to Timestamp objects
# index_col=0 uses the first column (datetime) as the index instead of 0, 1, 2, ...
print("parse_dates=True converts the dates to Timestamp objects")
print()
print("index_col=0 uses the datetime column as the index instead of 0, 1, 2, ...")
print()
air_quality = pd.read_csv('/Users/xxxxx/projects/air_quality.csv', index_col=0, parse_dates=True)
air_quality.head()
# press Shift + Enter to run this cell

☑ Output of Python code in Jupyter

parse_dates=True converts the dates to Timestamp objects

index_col=0 uses the datetime column as the index instead of 0, 1, 2, ...

station_antwerp	station_paris	station_london
datetime			
2019-05-07 02:00:00	NaN	NaN	23.0
2019-05-07 03:00:00	50.5	25.0	19.0
2019-05-07 04:00:00	45.0	27.7	19.0
2019-05-07 05:00:00	NaN	50.4	16.0
2019-05-07 06:00:00	NaN	61.9	NaN
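The same DatetimeIndex can also be built in two steps with `pd.to_datetime` and `set_index`, which is what the single-line-graph cell later in this lesson does. A DatetimeIndex also lets you select rows with date strings. This sketch uses the five sample rows as a hypothetical in-memory CSV (`csv_text`) instead of the real file; the explicit `format` string is an assumption based on the month/day/year values shown above.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file: the first five rows shown earlier.
csv_text = """datetime,station_antwerp,station_paris,station_london
5/7/2019 2:00,,,23.0
5/7/2019 3:00,50.5,25.0,19.0
5/7/2019 4:00,45.0,27.7,19.0
5/7/2019 5:00,,50.4,16.0
5/7/2019 6:00,,61.9,
"""

# Option 1: let read_csv build the DatetimeIndex (as in the cell above).
a = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

# Option 2: read first, then convert the column and make it the index yourself.
b = pd.read_csv(io.StringIO(csv_text))
b["datetime"] = pd.to_datetime(b["datetime"], format="%m/%d/%Y %H:%M")
b = b.set_index("datetime")

print(type(a.index).__name__)       # DatetimeIndex
print((a.index == b.index).all())   # both options give the same timestamps

# A DatetimeIndex lets you select rows with date strings (both ends included).
window = a.loc["2019-05-07 03:00":"2019-05-07 05:00"]
print(window)
```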

📘 Learn how to plot three line graphs

 
# A quick look at the air quality at the three stations using line graphs.
# A line graph is the default kind for .plot().
air_quality.plot(figsize=(12, 4))
# press Shift + Enter to run this cell

import matplotlib.pyplot as plt
air_quality.plot(figsize=(12, 4))
# Save the plot as a PDF
plt.savefig("air_quality_plot.pdf", format="pdf")
# Optional: show the plot
plt.show()

☑ Output of Python code in Jupyter



📘 Learn how to plot the three stations as a stacked area chart and add a title and y-axis label

 
from datetime import datetime
import matplotlib.pyplot as plt

# Current timestamp
now = datetime.now().strftime("%Y-%m-%d %H:%M:%S")

# Create a single figure and axis
fig, axs = plt.subplots(figsize=(12, 6))

# Stacked area chart: each station is drawn on top of the previous ones
# (pandas fills missing values with 0 for area plots)
air_quality.plot.area(ax=axs)

# Label the y-axis
axs.set_ylabel("NO$_2$ concentration")

# Add a global title ABOVE the figure
plt.suptitle(
    "Air Quality Stations (NO₂ Concentration)\n"
    "The air quality measured here is Nitrogen Dioxide (NO₂)\n"
    f"Plot generated on {now}",
    fontsize=16,
    y=1.01   # pushes title above the figure
)

# Adjust layout so title does not overlap
plt.tight_layout()

# Save the figure; bbox_inches="tight" keeps the title above the axes inside the saved image
fig.savefig("no2_concentration.png", bbox_inches="tight")

plt.show()

☑ Output of Python code in Jupyter



📘 Learn how to plot a single line graph

 
from datetime import datetime

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# 1. Read your CSV file
air_quality = pd.read_csv('/Users/xxxxx/projects/air_quality.csv')

# 2. Convert the datetime column to actual datetime objects
air_quality['datetime'] = pd.to_datetime(air_quality['datetime'])

# 3. Set datetime as the index
air_quality = air_quality.set_index('datetime')

# 4. Get the current date and time for the title
now = datetime.now().strftime("%Y-%m-%d %H:%M:%S")

# 5. Plot the Paris station once, with datetime on the x-axis
#    (x_compat=True keeps standard matplotlib dates so the formatter below applies)
ax = air_quality["station_paris"].plot(figsize=(12, 4), x_compat=True)

# 6. Format the x-axis for readability
ax.xaxis.set_major_locator(mdates.AutoDateLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d %H:%M"))
plt.xticks(rotation=45, ha="right")

# 7. Add the generation time to the title
ax.set_title(f"Paris Air Quality (NO2), plot generated on {now}", fontsize=16)

plt.tight_layout()
plt.show()

☑ Output of Python code in Jupyter



📘 Learn how to compare pollution levels between London and Paris

 
# Compare the pollution levels at London and Paris with a scatter plot
air_quality.plot.scatter(x="station_london", y="station_paris", alpha=0.5, figsize=(12, 6))
# Add the generation time to the title (uses `now` from an earlier cell)
plt.title(f"Paris vs. London Air Quality, plot generated on {now}", fontsize=16)
# press Shift + Enter to run this cell

☑ Output of Python code in Jupyter



📘 Learn how to compare the three air quality stations using a box plot

 
# Compare the three air quality stations using a box plot
air_quality.plot.box(figsize=(12, 6))
plt.title(f"Three Air Quality Stations Using a Box Plot, plot generated on {now}", fontsize=16)
# press Shift + Enter to run this cell

☑ Output of Python code in Jupyter



📘 Learn how to display each air quality station in its own subplot with aligned timestamps

 
from datetime import datetime
import matplotlib.pyplot as plt

# Get current timestamp
now = datetime.now().strftime("%Y-%m-%d %H:%M:%S")

# Create subplots for each station
axs = air_quality.plot.area(figsize=(12, 6), subplots=True)

# Add individual titles for each subplot
for ax, col in zip(axs, air_quality.columns):
    ax.set_title(f"{col} — timestamps aligned", fontsize=14)

# Add a global figure title (optional)
plt.suptitle(f"Air Quality Stations — Plot generated on {now}", fontsize=16)

plt.tight_layout()
plt.show()

☑ Output of Python code in Jupyter



📘 Learn how to display the NO₂ concentration density using pandas' built-in KDE

 
# Compare the NO₂ distributions of the three stations with KDE curves (requires scipy)

# KDE density plot for all stations on ONE axis
axs = air_quality.plot.density(figsize=(12, 4), subplots=False)

# Add x-axis label
axs.set_xlabel("NO₂ concentration")

# Add global title
plt.suptitle(f"Air Quality Stations, plot generated on {now}", fontsize=16)

# press Shift + Enter to run this cell

☑ Output of Python code in Jupyter



📘 Learn how to display the NO₂ density (KDE) for station_paris

 
from scipy.stats import gaussian_kde
import numpy as np
import matplotlib.pyplot as plt

# Create figure and axis
fig, axs = plt.subplots(figsize=(12, 6))

# Extract Paris station data
data = air_quality["station_paris"].dropna()

# Compute KDE
kde = gaussian_kde(data)

# Custom x-axis range: -50 to 150
xs = np.linspace(-50, 150, 400)

# Evaluate KDE
ys = kde(xs)

# Plot KDE curve on the axis, in ORANGE
axs.plot(xs, ys, color="orange", linewidth=1)

# Axis labels and title
axs.set_xlabel("NO₂ concentration")
axs.set_ylabel("Density estimate")
axs.set_title("KDE Density Curve for Paris Station (x-axis: -50 to 150)")

plt.show()

☑ Output of Python code in Jupyter



📘 Learn how to display the NO₂ density (KDE) for station_london

 
from scipy.stats import gaussian_kde
import numpy as np
import matplotlib.pyplot as plt

# Create figure and axis
fig, axs = plt.subplots(figsize=(12, 6))

# Extract London station data
data = air_quality["station_london"].dropna()

# Compute KDE
kde = gaussian_kde(data)

# Custom x-axis range: -50 to 150
xs = np.linspace(-50, 150, 400)

# Evaluate KDE
ys = kde(xs)

# Plot KDE curve on the axis, in GREEN
axs.plot(xs, ys, color="green", linewidth=1)

# Axis labels and title
axs.set_xlabel("NO₂ concentration")
axs.set_ylabel("Density estimate")
axs.set_title("KDE Density Curve for London Station (x-axis: -50 to 150)")

plt.show()

☑ Output of Python code in Jupyter



📘 Learn how to display the NO₂ density (KDE) for station_antwerp

 
from scipy.stats import gaussian_kde
import numpy as np
import matplotlib.pyplot as plt

# Create figure and axis
fig, axs = plt.subplots(figsize=(12, 6))

# Extract Antwerp station data
data = air_quality["station_antwerp"].dropna()

# Compute KDE
kde = gaussian_kde(data)

# Custom x-axis range: -50 to 150
xs = np.linspace(-50, 150, 400)

# Evaluate KDE
ys = kde(xs)

# Plot KDE curve on the axis, in BLUE
axs.plot(xs, ys, color="blue", linewidth=1)

# Axis labels and title
axs.set_xlabel("NO₂ concentration")
axs.set_ylabel("Density estimate")
axs.set_title("KDE Density Curve for Antwerp Station (x-axis: -50 to 150)")

plt.show()

☑ Output of Python code in Jupyter



⭐ What does the **400** mean in `np.linspace(-50, 150, 400)`?

In this line:
xs = np.linspace(-50, 150, 400)
Here `np.linspace()` is called with three arguments:
• start value: -50
• end value: 150
• number of points: 400

So the **400** means:

👉 “Generate 400 evenly spaced x-values from −50 to 150, including both endpoints.”

⭐ Why 400 is important for KDE

You use these `xs` values to evaluate your KDE:
ys = kde(xs)
The **more points** you use:
• the smoother the KDE curve looks
• the more detailed the shape of the distribution

If you used fewer points, like:
xs = np.linspace(-50, 150, 50)
the curve would look more jagged and less precise.

If you used more points, like 1000, the curve would be ultra‑smooth but slightly slower to compute.
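To see what the third argument does, compare the spacing between neighbouring x-values over the same −50 to 150 range. The step is (150 − (−50)) / (n − 1), so it is about 4.08 for 50 points, about 0.50 for 400 points, and about 0.20 for 1000 points. A smaller step means the KDE is evaluated at more places, so the plotted line follows the curve more closely.

```python
import numpy as np

# Spacing between neighbouring x-values for different resolutions over the same range.
for n_points in (50, 400, 1000):
    xs = np.linspace(-50, 150, n_points)
    step = xs[1] - xs[0]   # equals (150 - (-50)) / (n_points - 1)
    print(f"{n_points:5d} points: first={xs[0]}, last={xs[-1]}, step={step:.3f}")
```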

⭐ Summary

In `np.linspace(-50, 150, 400)`:
-50 = start of x-axis
150 = end of x-axis
400 = number of x-values (resolution of the curve)

The **400** controls how smooth and detailed your KDE line appears.

⭐ Understanding `.plot.density()` in Pandas

When you call `.plot.density()` on a DataFrame, pandas does **not** use density values stored in your CSV or table. Instead, it computes a **Kernel Density Estimate (KDE)** internally to create a smooth probability curve.

⭐ What KDE Means

KDE is a statistical method that estimates the probability distribution of your data. It is calculated **on the fly**, using your numeric values, but the density values are **not stored** in your DataFrame.

⭐ What `.plot.density()` Actually Does

When you run:
axs = air_quality.plot.density(figsize=(12, 4), subplots=False)
Pandas performs these internal steps:
• Takes each numeric column
• Drops NaN values
• Computes a Gaussian KDE curve
• Generates smooth x/y values
• Plots the density curve

None of these density values appear in your DataFrame — they are computed temporarily for plotting.
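The steps above can be sketched by hand. Based on reading the pandas source (an assumption worth re-checking for your pandas version), `.plot.density()` evaluates `scipy.stats.gaussian_kde` on 1000 evenly spaced x-values that extend half the data range beyond the minimum and maximum. The helper name `density_like_pandas` and the random sample are hypothetical. Because the result is a probability density, the area under the curve should come out close to 1.

```python
import numpy as np
import pandas as pd
from scipy.stats import gaussian_kde


def density_like_pandas(series: pd.Series, n_points: int = 1000):
    """Approximate the x/y values that Series.plot.density() draws by default.

    Assumption: pandas evaluates scipy's gaussian_kde on n_points evenly spaced
    x-values extending half the data range beyond the minimum and maximum.
    """
    values = series.dropna().to_numpy()        # drop NaN values
    kde = gaussian_kde(values)                 # Gaussian KDE, Scott's rule bandwidth
    spread = values.max() - values.min()
    xs = np.linspace(values.min() - 0.5 * spread,
                     values.max() + 0.5 * spread, n_points)   # smooth x grid
    ys = kde(xs)                               # density estimate at each x
    return xs, ys


# Hypothetical NO2-like readings with gaps (not the lesson's data).
rng = np.random.default_rng(0)
readings = pd.Series(rng.normal(27, 15, size=500))
readings.iloc[::25] = np.nan

xs, ys = density_like_pandas(readings)
area = ys.sum() * (xs[1] - xs[0])   # Riemann-sum approximation of the area under the curve
print(f"{len(xs)} points, area under the curve is about {area:.3f}")
```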

⭐ Why You Might Not See the Density Curve

Several reasons can make the KDE curve appear “invisible”:
• Your NO₂ values may be tightly clustered
• The KDE bandwidth may be too wide or too narrow
• Multiple stations overlap on the same axis
• The y‑axis scale may be very small

Try plotting one station at a time:
air_quality["station_paris"].plot.density(figsize=(12, 4))

⭐ How to Adjust KDE Smoothness

You can control the KDE bandwidth:
air_quality["station_paris"].plot.density(bw_method=0.2, figsize=(12, 4))
Lower values give a more jagged curve; higher values give a smoother one.
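What `bw_method` changes: pandas passes it on to `scipy.stats.gaussian_kde`, which sets the kernel width to factor × (sample standard deviation). With the default (Scott's rule) the factor is n^(−1/5) for one-dimensional data, which is about 0.25 for roughly 1000 readings such as station_paris; a number passed as `bw_method` is used as the factor directly. So `bw_method=0.2` is only a little narrower than the default for this station. The random sample below is a hypothetical stand-in.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical sample standing in for air_quality["station_paris"].dropna().
rng = np.random.default_rng(1)
data = rng.normal(27, 15, size=1000)

default_kde = gaussian_kde(data)                # bw_method=None -> Scott's rule
narrow_kde = gaussian_kde(data, bw_method=0.2)  # a scalar is used directly as the factor

# Kernel width (standard deviation of each Gaussian bump) = factor * sample std.
kernel_width = np.sqrt(narrow_kde.covariance[0, 0])

print(f"Scott factor (n ** -0.2): {default_kde.factor:.3f}")
print(f"Custom factor:            {narrow_kde.factor}")
print(f"Kernel width with bw_method=0.2: {kernel_width:.2f}")
```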

⭐ How to See the Actual Density Values

If you want the **actual x/y density numbers**, compute KDE manually:
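from scipy.stats import gaussian_kde
import numpy as np
import matplotlib.pyplot as plt

data = air_quality["station_paris"].dropna()
kde = gaussian_kde(data)

xs = np.linspace(data.min(), data.max(), 200)   # 200 x-values across the observed range
ys = kde(xs)                                    # density estimate at each x-value

plt.plot(xs, ys)
plt.show()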

⭐ Summary

`.plot.density()`:
• computes KDE internally
• does not use density values from your table
• generates smooth curves dynamically
• may appear flat depending on your data

This gives you a statistical view of your NO₂ distribution, not a direct plot of your raw values.

⭐ Why an Air‑Pollution Expert Looks at Air Density

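Air‑pollution scientists don’t look only at the “density” (distribution) of pollutant concentrations, like the KDE plots above; they also look at the density of the air itself, meaning the mass of air per unit volume. These are two different quantities. Air density matters because it controls how pollutants behave, how they spread, how they dilute, and how dangerous they become.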

Below are the five major scientific reasons air density matters.

⭐ 1 — Air density controls pollutant dispersion

When air is dense (cold, high pressure), it sinks and traps pollutants near the ground. When air is less dense (warm, low pressure), it rises and allows pollutants to disperse upward. This is why winter mornings often have:
• higher NO₂
• higher PM₂.₅
• more smog

Dense air = poor dispersion
Light air = better dispersion
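A worked example of the dense/light distinction: for dry air the ideal gas law gives ρ = p / (R_d · T), with R_d ≈ 287.05 J/(kg·K) and T in kelvin. At standard sea-level pressure (101,325 Pa) this gives about 1.34 kg/m³ at −10 °C, 1.225 kg/m³ at 15 °C, and 1.16 kg/m³ at 30 °C. The helper name `dry_air_density` is hypothetical and the sketch ignores humidity. It only covers the density part of the story; whether pollutants are actually trapped also depends on the vertical temperature profile and wind, which this does not model.

```python
# Dry-air density from the ideal gas law: rho = p / (R_d * T)
R_DRY_AIR = 287.05  # specific gas constant for dry air, J/(kg*K)
SEA_LEVEL_PRESSURE = 101_325  # Pa


def dry_air_density(pressure_pa: float, temperature_c: float) -> float:
    """Return dry-air density in kg/m^3 (hypothetical helper; humidity is ignored)."""
    temperature_k = temperature_c + 273.15   # convert Celsius to kelvin
    return pressure_pa / (R_DRY_AIR * temperature_k)


for label, temp_c in [("cold morning", -10.0), ("mild day", 15.0), ("hot afternoon", 30.0)]:
    rho = dry_air_density(SEA_LEVEL_PRESSURE, temp_c)
    print(f"{label:15s} {temp_c:6.1f} °C -> {rho:.3f} kg/m^3")
```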

⭐ 2 — Air density affects chemical reaction rates

Pollutants like NO₂, O₃, SO₂, and VOCs react differently depending on:
• temperature
• pressure
• humidity

All three influence air density. Example:
• Low‑density warm air → faster photochemical reactions → more ozone (O₃)
• High‑density cold air → slower reactions → pollutants accumulate

⭐ 3 — Air density determines how far pollutants travel

Light, warm air can carry pollutants long distances. Dense, cold air keeps pollutants localized. This matters for:
• wildfire smoke transport
• industrial plume modeling
• urban smog forecasting

⭐ 4 — Air density is required for atmospheric models

Every professional air‑quality model uses air density:
• AERMOD
• CALPUFF
• WRF‑Chem
• CMAQ

These models need air density to compute:
• plume rise
• vertical mixing
• turbulence
• inversion layers

Without air density, the model cannot simulate pollution movement correctly.

⭐ 5 — Air density affects human exposure

Dense air holds pollutants closer to breathing height. This increases exposure for:
• children
• elderly
• people with asthma
• people near highways

Air‑pollution experts track density to predict health‑risk periods.

⭐ Summary

Air‑pollution experts look at air density because it directly controls:
• how pollutants spread
• how they dilute
• how they chemically transform
• how far they travel
• how much people breathe

Air density is one of the core variables in atmospheric science — as important as temperature, wind speed, and humidity.

⭐ Why Density Plots (KDE) Help Air‑Quality Analysis

A density plot (KDE — Kernel Density Estimate) shows the **shape of the distribution** of pollutant concentrations. Unlike a histogram, KDE gives a **smooth, continuous curve** that reveals patterns an air‑quality expert cannot see from raw time‑series data alone.

⭐ 1 — KDE shows the *typical* pollution levels

Air‑quality data is noisy and fluctuates from hour to hour. A KDE curve reveals the **most common concentration levels** by showing where the curve peaks.
• A tall peak → pollutant often stays around that value
• A wide curve → pollutant varies a lot
• Multiple peaks → different pollution regimes (traffic vs. background)

This helps experts understand what “normal” looks like for NO₂.
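The peak of a KDE curve (its mode) can be located numerically: evaluate the KDE on a grid and take the x-value where the curve is highest. The sketch uses made-up data drawn from two hypothetical regimes, a "background" level near 20 and a "traffic" level near 55, so the curve has two peaks; with the lesson's data you would use `air_quality["station_paris"].dropna()` instead. Counting points higher than both neighbours is only a rough peak finder.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Made-up readings from two hypothetical regimes: background (~20) and traffic (~55).
rng = np.random.default_rng(2)
data = np.concatenate([rng.normal(20, 5, 700), rng.normal(55, 8, 300)])

kde = gaussian_kde(data)
xs = np.linspace(data.min(), data.max(), 500)
ys = kde(xs)

# The x-value where the curve is highest is the most common concentration (the KDE mode).
most_common = xs[np.argmax(ys)]
print(f"Most common concentration is about {most_common:.1f}")

# Rough peak finder: interior points higher than both neighbours.
is_peak = (ys[1:-1] > ys[:-2]) & (ys[1:-1] > ys[2:])
peaks = xs[1:-1][is_peak]
print("Peak locations:", np.round(peaks, 1))
```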

⭐ 2 — KDE exposes extreme pollution events

Time‑series plots can bury rare spikes among many readings. KDE highlights them because the curve stretches toward high values.
• Long right tail → occasional high‑pollution episodes
• Short tail → stable, low‑risk environment

This is crucial for health‑risk assessment.
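The area under a KDE curve to the right of a threshold estimates how often readings exceed it, and `gaussian_kde.integrate_box_1d(low, high)` computes that area directly. Both the right-skewed lognormal sample and the cut-off of 60 are made up for illustration; the threshold is not an official limit. Comparing the KDE estimate with the plain fraction of readings above the threshold is a quick sanity check.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical right-skewed readings standing in for one station's NO2 values.
rng = np.random.default_rng(3)
data = rng.lognormal(mean=3.2, sigma=0.5, size=1000)

kde = gaussian_kde(data)

threshold = 60.0  # illustrative cut-off only
p_kde = kde.integrate_box_1d(threshold, np.inf)   # area under the KDE to the right of the threshold
p_raw = np.mean(data > threshold)                 # plain fraction of readings above it

print(f"KDE estimate of P(value > {threshold:g}): {p_kde:.3f}")
print(f"Observed fraction above {threshold:g}:    {p_raw:.3f}")
```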

⭐ 3 — KDE makes it easy to compare multiple stations

When you overlay KDE curves for Paris, London, and Antwerp:
• You instantly see which city has higher baseline NO₂
• You see which city has more variability
• You see which city experiences more extreme spikes

This comparison is much harder with raw time‑series plots.
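A minimal sketch of the overlay described above, assuming the three stations are columns of one DataFrame. It reuses the colors from the single-station KDE cells (orange for Paris, green for London, blue for Antwerp) and also prints each curve's peak position and each station's standard deviation, two numbers that back up the visual comparison. The DataFrame `stations` is randomly generated stand-in data, not the lesson's measurements.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.stats import gaussian_kde

# Hypothetical stand-in for air_quality: three stations with different levels and spreads.
rng = np.random.default_rng(5)
stations = pd.DataFrame({
    "station_antwerp": rng.normal(26, 12, 300),
    "station_paris": rng.normal(28, 15, 300),
    "station_london": rng.normal(25, 11, 300),
})
stations.iloc[::10, 0] = np.nan   # mimic missing readings in the Antwerp column

colors = {"station_paris": "orange", "station_london": "green", "station_antwerp": "blue"}
xs = np.linspace(-50, 150, 400)

fig, ax = plt.subplots(figsize=(12, 6))
summary = {}
for column, color in colors.items():
    values = stations[column].dropna()
    ys = gaussian_kde(values)(xs)                   # one KDE curve per station
    ax.plot(xs, ys, color=color, linewidth=1.5, label=column)
    summary[column] = {"peak_at": xs[np.argmax(ys)], "std": values.std()}

ax.set_xlabel("NO₂ concentration")
ax.set_ylabel("Density estimate")
ax.legend()
print(pd.DataFrame(summary).T.round(1))
plt.show()
```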

⭐ 4 — KDE smooths out noise and reveals structure

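Air‑quality sensors produce jagged, irregular data. KDE smooths out this noise and reveals the **underlying distribution**. A single KDE over all readings ignores the time of day, so patterns like the ones below show up most clearly when you compute separate KDEs for subsets of the data (for example, night vs. day hours, or one month vs. another). This helps experts detect: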
• traffic‑related peaks
• nighttime accumulation
• morning dispersion
• seasonal shifts
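A sketch of the subset idea: split an hourly series by hour of day and compute one KDE per group. The series `no2` is synthetic, with night hours (22:00 to 05:59, an arbitrary choice) deliberately set higher, so the two peaks should land near 40 and 22. With the lesson's data you could use `air_quality["station_paris"].dropna()`, whose DatetimeIndex provides the same `.hour` attribute.

```python
import numpy as np
import pandas as pd
from scipy.stats import gaussian_kde

# Synthetic hourly series for 30 days; night hours are deliberately set higher.
index = pd.date_range("2019-05-07", periods=24 * 30, freq="h")
is_night = (index.hour < 6) | (index.hour >= 22)   # 22:00 to 05:59 (arbitrary choice)
rng = np.random.default_rng(4)
values = np.where(is_night, 40.0, 22.0) + rng.normal(0, 6, len(index))
no2 = pd.Series(values, index=index, name="hypothetical_no2")

xs = np.linspace(0, 80, 400)
peaks = {}
for label, mask in [("night", is_night), ("day", ~is_night)]:
    ys = gaussian_kde(no2[mask])(xs)    # one KDE per subset of hours
    peaks[label] = xs[np.argmax(ys)]    # where that subset's curve peaks

for label, peak in peaks.items():
    print(f"{label}: KDE peak near {peak:.1f}")
```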

⭐ 5 — KDE is essential for modeling and forecasting

Air‑quality modeling and forecasting (for example, checking AERMOD, CMAQ, or WRF‑Chem output against measurements) often compares the **probability distribution** of pollutant concentrations. KDE provides:
• smooth probability curves
• realistic input distributions
• better uncertainty estimates

This can help evaluate and improve forecasts.

⭐ Summary

Density plots (KDE) help air‑quality experts because they:
• reveal the true distribution of pollutant levels
• highlight typical values and extreme events
• smooth out noise to show underlying structure
• allow easy comparison between stations
• support modeling, forecasting, and health‑risk analysis

KDE is one of the most powerful tools for understanding NO₂ behavior beyond simple time‑series charts.

📘 Learn how to display the NO₂ frequency distribution in a histogram

 
# Compare the NO₂ frequency distributions of the three stations on one axis

# Histogram for all stations on ONE axis (alpha makes overlapping bars visible)
axs = air_quality.plot.hist(figsize=(12, 6), subplots=False, alpha=0.5)

# Add x-axis label
axs.set_xlabel("NO₂ concentration")

# press Shift + Enter to run this cell

☑ Output of Python code in Jupyter



📘 Learn how to display a separate histogram for each station

 
# Show a separate histogram for each station, one subplot per station

# Histogram for each station in separate subplots
axs = air_quality.plot.hist(figsize=(12, 6), subplots=True)

# Add x-axis label to EACH subplot
for ax in axs:
    ax.set_xlabel("NO₂ concentration")

# press Shift + Enter to run this cell

☑ Output of Python code in Jupyter



📘 Learn how to display the histogram and KDE overlay for station_paris

 
# ---------------------------------------
# Histogram + KDE Overlay for NO₂ Concentration
# Using the given air_quality DataFrame
# ---------------------------------------

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

# Extract NO₂ data from the Paris station
data = air_quality["station_paris"].dropna()

# Create figure + axis
fig, ax = plt.subplots(figsize=(12, 6))

# ---------------------------
# 1. Histogram (scaled to density instead of raw counts)
# ---------------------------
ax.hist(
    data,
    bins=30,
    density=True,      # Match KDE scale
    alpha=0.4,
    color="skyblue",
    edgecolor="black"
)

# ---------------------------
# 2. KDE (smooth density)
# ---------------------------
kde = gaussian_kde(data)

# Custom x-axis range for KDE
xs = np.linspace(-50, 150, 400)
ys = kde(xs)

# Plot KDE curve
ax.plot(xs, ys, color="orange", linewidth=2)

# ---------------------------
# Labels + Title
# ---------------------------
ax.set_xlabel("NO₂ concentration")
ax.set_ylabel("Density")
ax.set_title("NO₂ Distribution, Station Paris: Histogram + KDE Overlay")

plt.show()

☑ Output of Python code in Jupyter



📘 Learn how to display the histogram and KDE overlay for station_london

 
# ---------------------------------------
# Histogram + KDE Overlay for NO₂ Concentration
# Using the given air_quality DataFrame
# ---------------------------------------

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

# Extract NO₂ data from the London station
data = air_quality["station_london"].dropna()

# Create figure + axis
fig, ax = plt.subplots(figsize=(12, 6))

# ---------------------------
# 1. Histogram (scaled to density instead of raw counts)
# ---------------------------
ax.hist(
    data,
    bins=30,
    density=True,      # Match KDE scale
    alpha=0.4,
    color="skyblue",
    edgecolor="black"
)

# ---------------------------
# 2. KDE (smooth density)
# ---------------------------
kde = gaussian_kde(data)

# Custom x-axis range for KDE
xs = np.linspace(-50, 150, 400)
ys = kde(xs)

# Plot KDE curve
ax.plot(xs, ys, color="green", linewidth=2)

# ---------------------------
# Labels + Title
# ---------------------------
ax.set_xlabel("NO₂ concentration")
ax.set_ylabel("Density")
ax.set_title("NO₂ Distribution, Station London: Histogram + KDE Overlay")

plt.show()

☑ Output of Python code in Jupyter



📘 Learn how to display the histogram and KDE overlay for station_antwerp

 
# ---------------------------------------
# Histogram + KDE Overlay for NO₂ Concentration
# Using the given air_quality DataFrame
# ---------------------------------------

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

# Extract NO₂ data from the Antwerp station
data = air_quality["station_antwerp"].dropna()

# Create figure + axis
fig, ax = plt.subplots(figsize=(12, 6))

# ---------------------------
# 1. Histogram (scaled to density instead of raw counts)
# ---------------------------
ax.hist(
    data,
    bins=30,
    density=True,      # Match KDE scale
    alpha=0.4,
    color="skyblue",
    edgecolor="black"
)

# ---------------------------
# 2. KDE (smooth density)
# ---------------------------
kde = gaussian_kde(data)

# Custom x-axis range for KDE
xs = np.linspace(-50, 150, 400)
ys = kde(xs)

# Plot KDE curve
ax.plot(xs, ys, color="blue", linewidth=2)

# ---------------------------
# Labels + Title
# ---------------------------
ax.set_xlabel("NO₂ concentration")
ax.set_ylabel("Density")
ax.set_title("NO₂ Distribution, Station Antwerp: Histogram + KDE Overlay")

plt.show()

☑ Output of Python code in Jupyter






Copyright 2026
Never Forget Again with IN-V-BAT-AI
INVenting Brain Assistant Tools using Artificial Intelligence
(IN-V-BAT-AI)

Since
April 27, 2009