Python Lesson 5 Collected Knowledge
by Apolinario "Sam" Ortega, founder IN-V-BAT-AI

How to use Python pandas to read data from a CSV file

⭐ Learn how to import Python libraries

 
import pandas as pd
import matplotlib.pyplot as plt
air_quality = pd.read_csv('/Users/xxxxx/projects/air_quality.csv')
air_quality.head()

📘 Learn how to read data from a CSV file

 
print('The air_quality.csv file is located at /Users/xxxxx/projects/air_quality.csv\n')

print('Remember the full path to your file, for example air_quality.csv. Otherwise you will get an error message.\n')
print('The air quality measured here is the pollutant nitrogen dioxide, NO2\n')
air_quality = pd.read_csv('/Users/xxxxx/projects/air_quality.csv')
air_quality.head()
# comment # press Shift + Enter to run the cell

☑ Output of Python code in Jupyter

 
The air_quality.csv file is located at /Users/xxxxx/projects/air_quality.csv

Remember the full path to your file, for example air_quality.csv. Otherwise you will get an error message.

The air quality measured here is the pollutant nitrogen dioxide, NO2

datetime	station_antwerp	station_paris	station_london
0	5/7/2019 2:00	NaN	NaN	23.0
1	5/7/2019 3:00	50.5	25.0	19.0
2	5/7/2019 4:00	45.0	27.7	19.0
3	5/7/2019 5:00	NaN	50.4	16.0
4	5/7/2019 6:00	NaN	61.9	NaN
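The warning about the file path can be turned into an explicit check. The sketch below is only an illustration: `read_air_quality` is a hypothetical helper (not part of pandas), and a temporary file stands in for the real air_quality.csv.

```python
import tempfile
from pathlib import Path

import pandas as pd


def read_air_quality(csv_path):
    """Hypothetical helper: fail early with a clear message if the file is missing."""
    path = Path(csv_path)
    if not path.is_file():
        raise FileNotFoundError(f"CSV file not found: {path}")
    return pd.read_csv(path)


# A temporary directory stands in for /Users/xxxxx/projects in this demonstration.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "air_quality.csv"
    demo_path.write_text(
        "datetime,station_antwerp,station_paris,station_london\n"
        "5/7/2019 2:00,,,23.0\n"
        "5/7/2019 3:00,50.5,25.0,19.0\n"
    )
    demo = read_air_quality(demo_path)

    # Reading a file that does not exist raises the error described above.
    try:
        read_air_quality(Path(tmp) / "missing.csv")
        missing_error = None
    except FileNotFoundError as exc:
        missing_error = str(exc)

print(demo)
print(missing_error)
```

`pd.read_csv` raises `FileNotFoundError` on its own as well; the helper only makes the check and the message explicit.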

📘 Learn how to express NO2 in mg/m^3 (milligram per cubic meter) then add this new column

 
# comment : I want to express NO2 in mg/m^3 (milligram per cubic meter) then add this new column
#(If we assume temperature of 25 degrees Celsius and pressure of 1013 hPa, the conversion factor is 1.882)
air_quality["london_mg_per_cubic"] = air_quality["station_london"] * 1.882
air_quality.head()
# comment # do shift + enter

☑ Output of Python code in Jupyter

	datetime	station_antwerp	station_paris	station_london	london_mg_per_cubic
0	5/7/2019 2:00	NaN	NaN	23.0	43.286
1	5/7/2019 3:00	50.5	25.0	19.0	35.758
2	5/7/2019 4:00	45.0	27.7	19.0	35.758
3	5/7/2019 5:00	NaN	50.4	16.0	30.112
4	5/7/2019 6:00	NaN	61.9	NaN	NaN

📘 I want to add a new column to the air_quality table. "ratio_paris_antwerp" is the new column name

 
# comment : I want to add a new column to the air_quality table. "ratio_paris_antwerp" is the new column name
air_quality["ratio_paris_antwerp"] =  air_quality["station_paris"] / air_quality["station_antwerp"]
air_quality.head()
# comment # do shift + enter

☑ Output of Python code in Jupyter.

datetime	station_antwerp	station_paris	station_london	london_mg_per_cubic	ratio_paris_antwerp
0	5/7/2019 2:00	NaN	NaN	23.0	43.286	NaN
1	5/7/2019 3:00	50.5	25.0	19.0	35.758	0.495050
2	5/7/2019 4:00	45.0	27.7	19.0	35.758	0.615556
3	5/7/2019 5:00	NaN	50.4	16.0	30.112	NaN
4	5/7/2019 6:00	NaN	61.9	NaN	NaN	NaN
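Both new columns come from element-wise arithmetic, and any row with a missing input stays NaN. A minimal sketch that rebuilds the five rows shown above by hand (the `sample` DataFrame is an illustration, not the full file):

```python
import numpy as np
import pandas as pd

# The first five rows of the table above, typed in by hand.
sample = pd.DataFrame(
    {
        "station_antwerp": [np.nan, 50.5, 45.0, np.nan, np.nan],
        "station_paris": [np.nan, 25.0, 27.7, 50.4, 61.9],
        "station_london": [23.0, 19.0, 19.0, 16.0, np.nan],
    }
)

# Multiplying a column by a number is applied to every row at once.
sample["london_mg_per_cubic"] = sample["station_london"] * 1.882

# Dividing two columns pairs the values row by row; NaN in either input gives NaN.
sample["ratio_paris_antwerp"] = sample["station_paris"] / sample["station_antwerp"]

print(sample.round(6))
```

No loop over rows is needed, which is why the same one-line pattern works for both new columns.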

📘 I want to rename the column labels

 
# comment : I want to rename the column labels
air_quality_renamed = air_quality.rename( columns={"station_antwerp": "BETR801", "station_paris": "FR04014",
                                                    "station_london": "London Westminster"})
air_quality_renamed.head()
# comment # do shift + enter

☑ Output of Python code in Jupyter.

	datetime	BETR801	FR04014	London Westminster	london_mg_per_cubic	ratio_paris_antwerp
0	5/7/2019 2:00	NaN	NaN	23.0	43.286	NaN
1	5/7/2019 3:00	50.5	25.0	19.0	35.758	0.495050
2	5/7/2019 4:00	45.0	27.7	19.0	35.758	0.615556
3	5/7/2019 5:00	NaN	50.4	16.0	30.112	NaN
4	5/7/2019 6:00	NaN	61.9	NaN	NaN	NaN
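One detail worth checking is that `rename` returns a new DataFrame and leaves the original untouched, which is why the result is stored in `air_quality_renamed`. A small sketch with a one-row stand-in table (`stations` is illustrative):

```python
import pandas as pd

# One-row stand-in for the air_quality table.
stations = pd.DataFrame(
    {"station_antwerp": [50.5], "station_paris": [25.0], "station_london": [19.0]}
)

# rename() maps old labels to new labels and returns a new DataFrame.
renamed = stations.rename(
    columns={
        "station_antwerp": "BETR801",
        "station_paris": "FR04014",
        "station_london": "London Westminster",
    }
)

print(list(stations.columns))  # original labels are still there
print(list(renamed.columns))   # new labels live only in the returned copy
```

This also means later code has to use the label that matches the DataFrame it works on: `station_paris` for `air_quality`, `FR04014` for `air_quality_renamed`.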

📘 I want to show the descriptive statistics

 
air_quality_renamed.describe()

☑ Output of Python code in Jupyter.

	BETR801	FR04014	London Westminster	london_mg_per_cubic	ratio_paris_antwerp
count	95.000000	1004.000000	969.000000	969.000000	95.000000
mean	25.778947	27.740538	24.777090	46.630483	1.304051
std	12.682019	15.285746	11.214377	21.105457	0.643986
min	7.500000	0.000000	0.000000	0.000000	0.301176
25%	16.750000	16.500000	19.000000	35.758000	0.868843
50%	23.000000	24.150000	25.000000	47.050000	1.208889
75%	34.500000	35.925000	31.000000	58.342000	1.655238
max	74.500000	97.000000	97.000000	182.554000	4.100000
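The `count` row differs between columns because `describe()` skips NaN values. A minimal sketch on the five sample rows shown earlier (not the full file, so the numbers differ from the table above):

```python
import numpy as np
import pandas as pd

# The first five rows of the three station columns, typed in by hand.
sample = pd.DataFrame(
    {
        "station_antwerp": [np.nan, 50.5, 45.0, np.nan, np.nan],
        "station_paris": [np.nan, 25.0, 27.7, 50.4, 61.9],
        "station_london": [23.0, 19.0, 19.0, 16.0, np.nan],
    }
)

stats = sample.describe()

# count is the number of non-missing values per column, not the number of rows.
print(stats.loc["count"])
# mean, std, min, quartiles and max are computed on the non-missing values only.
print(stats.loc["mean"])
```

Here Antwerp has a count of 2 out of 5 rows, the same effect that gives BETR801 a count of 95 in the full table.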

📘 I want a quick look at the air quality at the three stations by plotting a line graph.

 
# comment : I want a quick look at the air quality at the three stations by plotting a line graph.
# comment : a line graph is the default of the .plot() function
air_quality.plot(figsize=(12, 6))
# comment # do shift + enter

☑ Output of Python code in Jupyter



⭐ How to Remove the Index Numbers from `air_quality_renamed`

Pandas always prints the index by default. To hide or remove the index, choose one of the methods below depending on your goal.

⭐ 1 — Remove index when *displaying* the DataFrame
Use `.to_string(index=False)`:
print(air_quality_renamed.to_string(index=False))
This prints the table **without** index numbers.

⭐ 2 — Remove index when exporting (CSV, HTML, Excel)
Example for CSV:
air_quality_renamed.to_csv("clean.csv", index=False)
Example for HTML:
air_quality_renamed.to_html("clean.html", index=False)
Example for Excel:
air_quality_renamed.to_excel("clean.xlsx", index=False)
All of these remove the index column.

⭐ 3 — Reset the index (if you want a clean 0..N index)
If your index is messy (timestamps, strings, etc.):
air_quality_renamed = air_quality_renamed.reset_index(drop=True)
This gives you a fresh, clean index.

⭐ Summary

• Use to_string(index=False) to hide index when printing
• Use index=False when exporting
• Use reset_index(drop=True) to rebuild the index
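The three methods can be compared side by side on a tiny table. This is a sketch with two hand-typed rows; `table` is illustrative, and `to_csv()` is called without a file name so that it returns the text instead of writing a file.

```python
import pandas as pd

table = pd.DataFrame(
    {"datetime": ["5/7/2019 3:00", "5/7/2019 4:00"], "FR04014": [25.0, 27.7]}
)

# 1. Hide the index when printing.
printed = table.to_string(index=False)

# 2. Leave the index out when exporting (returned as text here).
csv_with_index = table.to_csv()
csv_without_index = table.to_csv(index=False)

# 3. Rebuild a 0..N index, for example after filtering rows.
filtered = table[table["FR04014"] > 26]
rebuilt = filtered.reset_index(drop=True)

print(printed)
print(csv_with_index)
print(csv_without_index)
print(list(filtered.index), list(rebuilt.index))
```

With the index kept, the CSV header starts with an empty first field; with `index=False` it starts directly with `datetime`.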

📘 Learn how to remove the index number and print the first 10 rows

 
# Print the first 10 rows without the index numbers
print(air_quality_renamed.head(10).to_string(index=False))

☑ Output of Python code in Jupyter

    datetime  BETR801  FR04014  London Westminster  london_mg_per_cubic  ratio_paris_antwerp
 5/7/2019 2:00      NaN      NaN                23.0               43.286                  NaN
 5/7/2019 3:00     50.5     25.0                19.0               35.758             0.495050
 5/7/2019 4:00     45.0     27.7                19.0               35.758             0.615556
 5/7/2019 5:00      NaN     50.4                16.0               30.112                  NaN
 5/7/2019 6:00      NaN     61.9                 NaN                  NaN                  NaN
 5/7/2019 7:00      NaN     72.4                26.0               48.932                  NaN
 5/7/2019 8:00      NaN     77.7                32.0               60.224                  NaN
 5/7/2019 9:00      NaN     67.9                32.0               60.224                  NaN
5/7/2019 10:00      NaN     56.0                28.0               52.696                  NaN
5/7/2019 11:00      NaN     34.5                21.0               39.522                  NaN
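The datetime values in the table above are still plain text. A minimal sketch of turning them into real timestamps and using them as the index, assuming the text is month/day/year (the explicit `format` string is that assumption; `frame` is a hand-typed stand-in):

```python
import pandas as pd

frame = pd.DataFrame(
    {
        "datetime": ["5/7/2019 2:00", "5/7/2019 3:00", "5/7/2019 10:00"],
        "FR04014": [None, 25.0, 56.0],
    }
)

# Assumption: month first, then day, so "5/7/2019" means May 7, 2019.
frame["datetime"] = pd.to_datetime(frame["datetime"], format="%m/%d/%Y %H:%M")

# Use the timestamps as the index so plots get a real time axis.
frame = frame.set_index("datetime")

print(frame.index)
print(frame)
```

If the file used day/month/year instead, the format string would need to be `"%d/%m/%Y %H:%M"`.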

📘 Convert the datetime column (which contains strings) into actual datetime objects and use it as the index

 
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from datetime import datetime

# 1. Convert the "datetime" column (text such as "5/7/2019 2:00") into datetime objects
#    and use it as the index. The CSV was read without index_col, so the index was 0..N.
air_quality_renamed.index = pd.to_datetime(air_quality_renamed["datetime"])

# 2. Plot Paris station values (column "FR04014" after the rename) with datetime on x-axis
fig, ax = plt.subplots(figsize=(12, 4))
air_quality_renamed["FR04014"].plot(ax=ax)

# 3. Format the x-axis for readability
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d %H:%M"))
ax.xaxis.set_major_locator(mdates.AutoDateLocator())

plt.xticks(rotation=45, ha="right")
plt.tight_layout()

# 4. Add timestamp title
now = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
ax.set_title(f"Paris Air Quality (NO₂) — Plot generated on {now}", fontsize=16)

plt.show()

☑ Output of Python code in Jupyter



📘 I want to see a plot.area()

 
# comment : I want to see a plot.area()
axs = air_quality.plot.area(figsize=(12, 6), subplots=True)
# comment # do shift + enter

☑ Output of Python code in Jupyter



⭐ Is `plot.area()` doing internal calculations?

Yes — when you run:
 
axs = air_quality.plot.area(figsize=(12, 6), subplots=True)
pandas performs several **internal operations** before drawing the area plot.

🔥 Here is what happens internally:

• **1. The columns already share one index** Every column of a DataFrame uses the same index, so the stations are aligned row by row even where a station has no reading for a timestamp.

• **2. Pandas handles NaN values** For an area plot, pandas replaces missing values with 0 before drawing, so a gap appears as a zero-height region; it is not interpolated.

• **3. Pandas computes stacked cumulative values** Area plots are stacked by default (`stacked=True`). For each timestamp, pandas internally computes:
 
stacked_y = col1 + col2 + col3 + ...
This is why the colored areas appear layered on top of each other when the columns share one axis.

• **4. When `subplots=True`, pandas creates one subplot per column** So `axs` becomes an array of Axes objects, and each area starts from zero in its own panel.

• **5. Matplotlib's `fill_between()` is used under the hood** Pandas wraps Matplotlib and performs the stacking math itself before filling each region.

⭐ What `plot.area()` does *not* do:

• It does not smooth data
• It does not compute KDE
• It does not normalize values
• It does not modify your DataFrame

It only prepares the values so Matplotlib can draw the stacked filled regions.

⭐ Summary

`plot.area()` **does internal stacking calculations**, aligns timestamps, and handles NaN values — but it does **not** change your data. It simply computes what is needed to draw the stacked area visualization.
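The stacking arithmetic can be reproduced by hand without drawing anything. The sketch below assumes missing values count as zero and computes the upper edge of each layer as a running sum across the columns; `sample` and `stacked_top` are illustrative names.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame(
    {
        "station_antwerp": [np.nan, 50.5, 45.0],
        "station_paris": [np.nan, 25.0, 27.7],
        "station_london": [23.0, 19.0, 19.0],
    }
)
before = sample.copy()

# Upper edge of each layer: running sum from left to right, with NaN counted as 0.
stacked_top = sample.fillna(0).cumsum(axis=1)

print(stacked_top)
# The original table is untouched; the stacked values exist only for drawing.
print(sample.equals(before))
```

In the second row the layers end at 50.5, 75.5 and 94.5, so the top of the London layer is the sum of all three stations.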

📘 I want to group them together using a single scale

 
# comment : I want to group them together using a single scale and then save the graph as "no2_concentrations.png"
fig, axs = plt.subplots(figsize=(12, 6));
air_quality.plot.area(ax=axs);
axs.set_ylabel("NO$_2$ concentration");
fig.savefig("no2_concentrations.png")
# comment # do shift + enter

☑ Output of Python code in Jupyter



📘 I want to compare the NO2 distributions with density (KDE) curves on one axis

 
# comment : I want to compare the NO2 distributions with density (KDE) curves drawn together on one axis
axs = air_quality.plot.density(figsize=(12, 6), subplots=False)
# comment # do shift + enter

☑ Output of Python code in Jupyter



📘 I want to see one density (KDE) subplot per column

 
# comment : I want to see the density (KDE) curve of each column in its own subplot
axs = air_quality.plot.density(figsize=(12, 6), subplots=True)
# comment # do shift + enter

☑ Output of Python code in Jupyter



📘 I want to compare the NO2 distributions with histograms on one axis

 
# comment : I want to compare the NO2 distributions with histograms drawn together on one axis
axs = air_quality.plot.hist(figsize=(12, 6), subplots=False)
# comment # do shift + enter

☑ Output of Python code in Jupyter



📘 I want to see one histogram subplot per column

 
# comment : I want to see the histogram of each column in its own subplot
axs = air_quality.plot.hist(figsize=(12, 6), subplots=True)
# comment # do shift + enter

☑ Output of Python code in Jupyter



📘 Learn how to display the NO₂ density curve (KDE) for station_paris

 
from scipy.stats import gaussian_kde
import numpy as np
import matplotlib.pyplot as plt

# Create figure and axis
fig, axs = plt.subplots(figsize=(12, 6))

# Extract Paris station data
data = air_quality["station_paris"].dropna()

# Compute KDE
kde = gaussian_kde(data)

# Custom x-axis range: -50 to 150
xs = np.linspace(-50, 150, 400)

# Evaluate KDE
ys = kde(xs)

# Plot KDE curve on the axis, in ORANGE
axs.plot(xs, ys, color="orange", linewidth=1)

# Axis labels and title
axs.set_xlabel("NO₂ concentration")
axs.set_ylabel("Density estimate")
axs.set_title("KDE Density Curve for Paris Station (x-axis: -50 to 150)")

plt.show()

☑ Output of Python code in Jupyter



📘 Learn how to display the NO₂ density curve (KDE) for station_london

 
from scipy.stats import gaussian_kde
import numpy as np
import matplotlib.pyplot as plt

# Create figure and axis
fig, axs = plt.subplots(figsize=(12, 6))

# Extract London station data
data = air_quality["station_london"].dropna()

# Compute KDE
kde = gaussian_kde(data)

# Custom x-axis range: -50 to 150
xs = np.linspace(-50, 150, 400)

# Evaluate KDE
ys = kde(xs)

# Plot KDE curve on the axis, in GREEN
axs.plot(xs, ys, color="green", linewidth=1)

# Axis labels and title
axs.set_xlabel("NO₂ concentration")
axs.set_ylabel("Density estimate")
axs.set_title("KDE Density Curve for London Station (x-axis: -50 to 150)")

plt.show()

☑ Output of Python code in Jupyter



📘 Learn how to display the NO₂ density curve (KDE) for station_antwerp

 
from scipy.stats import gaussian_kde
import numpy as np
import matplotlib.pyplot as plt

# Create figure and axis
fig, axs = plt.subplots(figsize=(12, 6))

# Extract Antwerp station data
data = air_quality["station_antwerp"].dropna()

# Compute KDE
kde = gaussian_kde(data)

# Custom x-axis range: -50 to 150
xs = np.linspace(-50, 150, 400)

# Evaluate KDE
ys = kde(xs)

# Plot KDE curve on the axis, in BLUE
axs.plot(xs, ys, color="blue", linewidth=1)

# Axis labels and title
axs.set_xlabel("NO₂ concentration")
axs.set_ylabel("Density estimate")
axs.set_title("KDE Density Curve for Antwerp Station (x-axis: -50 to 150)")

plt.show()

☑ Output of Python code in Jupyter



⭐ Understanding `.plot.density()` in Pandas

When you call `.plot.density()` on a DataFrame, pandas does **not** use density values stored in your CSV or table. Instead, it computes a **Kernel Density Estimate (KDE)** internally to create a smooth probability curve.

⭐ What KDE Means

KDE is a statistical method that estimates the probability distribution of your data. It is calculated **on the fly**, using your numeric values, but the density values are **not stored** in your DataFrame.

⭐ What `.plot.density()` Actually Does

When you run:
axs = air_quality.plot.density(figsize=(12, 4), subplots=False)
Pandas performs these internal steps:
• Takes each numeric column
• Drops NaN values
• Computes a Gaussian KDE curve
• Generates smooth x/y values
• Plots the density curve

None of these density values appear in your DataFrame — they are computed temporarily for plotting.
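The steps listed above can be sketched by hand with SciPy. This is an illustration on the nine Paris values from the 10-row table earlier on this page, not the exact code pandas runs; `paris`, `values` and `total_area` are illustrative names.

```python
import numpy as np
import pandas as pd
from scipy.stats import gaussian_kde

# Paris readings from the first 10 rows (the first one is missing).
paris = pd.Series([np.nan, 25.0, 27.7, 50.4, 61.9, 72.4, 77.7, 67.9, 56.0, 34.5])

# Take the numeric column and drop NaN values.
values = paris.dropna().to_numpy()

# Compute a Gaussian KDE.
kde = gaussian_kde(values)

# Generate smooth x/y values for the curve.
xs = np.linspace(values.min(), values.max(), 200)
ys = kde(xs)

# A density curve encloses a total area of 1 (it is a probability density, not a count).
total_area = kde.integrate_box_1d(-np.inf, np.inf)

print(len(values), ys.shape, round(total_area, 6))
print(paris.isna().sum())  # the original Series still contains its NaN
```

Because the area is fixed at 1, the y-values become small when the data spread over a wide range, which is one reason a density curve can look flat.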

⭐ Why You Might Not See the Density Curve

Several reasons can make the KDE curve appear “invisible”:
• Your NO₂ values may be tightly clustered
• The KDE bandwidth may be too wide or too narrow
• Multiple stations overlap on the same axis
• The y‑axis scale may be very small

Try plotting one station at a time:
air_quality["station_paris"].plot.density(figsize=(12, 4))

⭐ How to Adjust KDE Smoothness

You can control the KDE bandwidth:
air_quality["station_paris"].plot.density(bw_method=0.2, figsize=(12, 4))
Lower values = more jagged
Higher values = smoother
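To see what the bandwidth setting changes, the sketch below builds two SciPy KDE objects on the nine Paris values from the 10-row table and compares their bandwidth factors. It assumes that `bw_method` is handed through to `scipy.stats.gaussian_kde`, where a plain number is used directly as the factor.

```python
import numpy as np
from scipy.stats import gaussian_kde

values = np.array([25.0, 27.7, 50.4, 61.9, 72.4, 77.7, 67.9, 56.0, 34.5])

# Default bandwidth (Scott's rule): factor = n ** (-1/5) for one-dimensional data.
default_kde = gaussian_kde(values)

# A scalar bw_method is used as the factor itself.
narrow_kde = gaussian_kde(values, bw_method=0.2)

print(default_kde.factor)  # about 0.64 for 9 values
print(narrow_kde.factor)   # 0.2
```

The factor scales the width of the Gaussian bump placed on each data point, so a factor of 0.2 gives narrower bumps and a more jagged curve than the default.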

⭐ How to See the Actual Density Values

If you want the **actual x/y density numbers**, compute KDE manually:
from scipy.stats import gaussian_kde
import numpy as np
import matplotlib.pyplot as plt

data = air_quality["station_paris"].dropna()
kde = gaussian_kde(data)

xs = np.linspace(data.min(), data.max(), 200)
ys = kde(xs)

plt.plot(xs, ys)
plt.show()

⭐ Summary

`.plot.density()`:
• computes KDE internally
• does not use density values from your table
• generates smooth curves dynamically
• may appear flat depending on your data

This gives you a statistical view of your NO₂ distribution, not a direct plot of your raw values.

⭐ Why an Air‑Pollution Expert Looks at Air Density

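Air‑pollution scientists look not only at pollutant concentrations but also at the density of the air itself. This physical air density (mass of air per unit volume) is a different concept from the statistical density (KDE) curves plotted above. It matters because air density influences how pollutants behave, how they spread, how they dilute, and how dangerous they become.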

Below are the five major scientific reasons air density matters.

⭐ 1 — Air density controls pollutant dispersion

When air is dense (cold, high pressure), it sinks and traps pollutants near the ground. When air is less dense (warm, low pressure), it rises and allows pollutants to disperse upward. This is why winter mornings often have:
• higher NO₂
• higher PM₂.₅
• more smog

Dense air = poor dispersion
Light air = better dispersion

⭐ 2 — Air density affects chemical reaction rates

Pollutants like NO₂, O₃, SO₂, and VOCs react differently depending on:
• temperature
• pressure
• humidity

All three influence air density. Example:
• Low‑density warm air → faster photochemical reactions → more ozone (O₃)
• High‑density cold air → slower reactions → pollutants accumulate

⭐ 3 — Air density determines how far pollutants travel

Light, warm air can carry pollutants long distances. Dense, cold air keeps pollutants localized. This matters for:
• wildfire smoke transport
• industrial plume modeling
• urban smog forecasting

⭐ 4 — Air density is required for atmospheric models

Widely used professional air‑quality models take air density into account, for example:
• AERMOD
• CALPUFF
• WRF‑Chem
• CMAQ

These models need air density to compute:
• plume rise
• vertical mixing
• turbulence
• inversion layers

Without air density, the model cannot simulate pollution movement correctly.

⭐ 5 — Air density affects human exposure

Dense air holds pollutants closer to breathing height. This increases exposure for:
• children
• elderly
• people with asthma
• people near highways

Air‑pollution experts track density to predict health‑risk periods.

⭐ Summary

Air‑pollution experts look at air density because it directly controls:
• how pollutants spread
• how they dilute
• how they chemically transform
• how far they travel
• how much people breathe

Air density is one of the core variables in atmospheric science — as important as temperature, wind speed, and humidity.

⭐ Why Density Plots (KDE) Help Air‑Quality Analysis

A density plot (KDE — Kernel Density Estimate) shows the **shape of the distribution** of pollutant concentrations. Unlike a histogram, KDE gives a **smooth, continuous curve** that reveals patterns an air‑quality expert cannot see from raw time‑series data alone.

⭐ 1 — KDE shows the *typical* pollution levels

Air‑quality data is noisy and, in this hourly dataset, fluctuates from one hour to the next. A KDE curve reveals the **most common concentration levels** by showing where the curve peaks.
• A tall peak → pollutant often stays around that value
• A wide curve → pollutant varies a lot
• Multiple peaks → different pollution regimes (traffic vs. background)

This helps experts understand what “normal” looks like for NO₂.

⭐ 2 — KDE exposes extreme pollution events

A time‑series plot shows individual spikes, but it is hard to judge from it how often they occur. A KDE curve shows this as a tail that stretches toward high values.
• Long right tail → occasional high‑pollution episodes
• Short tail → stable, low‑risk environment

This is crucial for health‑risk assessment.

⭐ 3 — KDE makes it easy to compare multiple stations

When you overlay KDE curves for Paris, London, and Antwerp:
• You instantly see which city has higher baseline NO₂
• You see which city has more variability
• You see which city experiences more extreme spikes

This comparison is much harder with raw time‑series plots.

⭐ 4 — KDE smooths out noise and reveals structure

Air‑quality sensors produce jagged, irregular data. KDE removes noise and reveals the **underlying distribution**. This helps experts detect:
• traffic‑related peaks
• nighttime accumulation
• morning dispersion
• seasonal shifts

⭐ 5 — KDE is essential for modeling and forecasting

Air‑quality models (AERMOD, CMAQ, WRF‑Chem) rely on understanding the **probability distribution** of pollutants. KDE provides:
• smooth probability curves
• realistic input distributions
• better uncertainty estimates

This improves forecasting accuracy.

⭐ Summary

Density plots (KDE) help air‑quality experts because they:
• reveal the true distribution of pollutant levels
• highlight typical values and extreme events
• smooth out noise to show underlying structure
• allow easy comparison between stations
• support modeling, forecasting, and health‑risk analysis

KDE is one of the most powerful tools for understanding NO₂ behavior beyond simple time‑series charts.
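What an overlay of KDE curves suggests can be cross-checked with a few numbers. A small sketch on the Paris and London values from the 10-row table earlier on this page (a hand-typed sample, so it says nothing about the full dataset):

```python
import numpy as np
import pandas as pd

stations = pd.DataFrame(
    {
        "station_paris": [np.nan, 25.0, 27.7, 50.4, 61.9, 72.4, 77.7, 67.9, 56.0, 34.5],
        "station_london": [23.0, 19.0, 19.0, 16.0, np.nan, 26.0, 32.0, 32.0, 28.0, 21.0],
    }
)

# median: a typical level (roughly where a KDE curve has most of its mass)
# max: how far the right tail reaches
# std: how wide the curve is
summary = stations.agg(["median", "max", "std"])

print(summary.round(2))
```

In these ten rows Paris has both the higher median and the wider spread, which would show up as a KDE curve shifted to the right and flatter than London's.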

📘 Learn how to display NO₂ frequency in a histogram

 
# comment : I want to see how often each NO₂ concentration range occurs at the stations

# Histogram for all stations on ONE axis
axs = air_quality.plot.hist(figsize=(12, 6), subplots=False)

# Add x-axis label
axs.set_xlabel("NO₂ concentration")

# comment # do shift + enter

☑ Output of Python code in Jupyter



📘 Learn how to display the NO₂ frequency of each station in its own histogram

 
# comment : I want to see how often each NO₂ concentration range occurs, one subplot per column

# Histogram for each station in separate subplots
axs = air_quality.plot.hist(figsize=(12, 6), subplots=True)

# Add x-axis label to EACH subplot
for ax in axs:
    ax.set_xlabel("NO₂ concentration")

# comment # do shift + enter

☑ Output of Python code in Jupyter


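The numbers behind a histogram can also be computed directly. A minimal sketch with NumPy on the nine Paris values from the 10-row table, using 10 equal-width bins (the default number of bins for `plot.hist()`):

```python
import numpy as np
import pandas as pd

paris = pd.Series([np.nan, 25.0, 27.7, 50.4, 61.9, 72.4, 77.7, 67.9, 56.0, 34.5])

# Drop missing values, then count how many readings fall into each bin.
values = paris.dropna().to_numpy()
counts, edges = np.histogram(values, bins=10)

print(counts)           # height of each bar
print(edges.round(2))   # 11 edges for 10 bins, from the minimum to the maximum
```

Unlike a density curve, the bar heights are plain counts, so they add up to the number of non-missing readings.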

Copyright 2026
Never Forget Again with IN-V-BAT-AI
INVenting Brain Assistant Tools using Artificial Intelligence
(IN-V-BAT-AI)

Since
April 27, 2009