Digital Signal Processing Teaching Platform

DSP Teaching Lab — Graduate Level

Covering Hilbert Spaces, Distribution Theory, Z-Transform, MUSIC/ESPRIT, Wavelet Analysis, Wigner-Ville Distribution
Communications · Radar · Imaging · Biomedical — Four Major Engineering Application Areas

Learning Path

Not sure where to start? Choose a path based on your goal:

🎯 Core Required Path (Recommended for All)

2A.1 Fourier Series → 2A.2 CTFT → 2B.1 Discrete Signals → 2B.2 LTI & Convolution → 3A.2 DFT/FFT → 2C Sampling Theorem → 3B Window Functions → 7.1 STFT → 🔬 Lab

📡 Communications Engineering Path

Core → 2B.4 Z-Transform → 5A Decimation/Interp → 4A FIR → 4B IIR → 9C OFDM

🔬 Biomedical Signal Path

Core → 5A Decimation/Interp → 3.2 Welch → 4.1 Hilbert → 5.4 CWT → 6.9 EEG/ECG

⚙️ Vibration/Mechanical Path

Core → 5A Decimation/Interp → 3.2 Welch → 4.1 Hilbert → 4.2 Envelope → 6.10 Vibration

📡 Radar/Array Path

Core → 5B Polyphase → 3.4 MUSIC → 6.7 Radar → 6.8 Array

📐

Rigorous Mathematics

L² Spaces · Distribution Theory · Full Derivations

🔬

Advanced Theory

MUSIC · Wavelets · Wigner-Ville · EMD

⚡

Four Major Applications

Communications OFDM · Radar · Imaging · Biomedical

📊 Real-World Datasets

Don't limit yourself to synthetic sine-wave practice. Below are public datasets you can download to run a complete DSP pipeline on real signals.

🫀 Biomedical Signals

PhysioNet (physionet.org): clinical ECG, EEG, EMG, PPG data
MIT-BIH Arrhythmia DB: classic ECG arrhythmia dataset
Sleep-EDF: EEG sleep staging
Suitable for: m6-1 Hilbert (R-peak detection), m7-1 STFT (EEG bands), m9-9 EEG/ECG

🔊 Audio

ESC-50: 50 classes of environmental sounds (5 sec each)
UrbanSound8K: urban sound classification
LibriSpeech: 1000 hours of speech corpus
Suitable for: m3b-1 windowing, m7-1 STFT, m4-* filter design

⚙ Mechanical Vibration

Case Western Bearing Data: classic bearing fault dataset
NASA IMS Bearing Dataset: run-to-failure bearing data
MFPT Bearing Fault: multiple rotation-speed conditions
Suitable for: m6-2 envelope spectrum, m9-10 vibration analysis, Phase 1 BPFO computation

📡 Communications / Radar

RadioML 2018: IQ samples across many modulation schemes
FMCW Radar Dataset: autonomous-driving radar data
GNU Radio Tutorials: SDR examples
Suitable for: m9-6 OFDM, m9-7 radar, communications receivers

💡 Getting started: The easiest entry point is PhysioNet ECG — the data is clean, has clear features (R-peaks), and lets you practice the full pipeline from m6-1 Hilbert all the way to m9-9.

1.1 Hilbert Space & $L^2$ Theory

The mathematical framework of Fourier analysis — Why can sinusoids form a "basis"?

⚠ Math Prerequisites Recommended

This module (M1 Mathematical Foundations) requires the following background:

Linear algebra: inner products, orthogonality, eigenvalues, vector spaces
Real analysis basics: limits, continuity, Cauchy sequences, series convergence
Complex arithmetic: Euler's formula, complex exponentials, complex conjugates
Basic topology (1.1, 1.3): completeness, density, modes of convergence

If you're not familiar with these: feel free to skip M1 and start directly from M2A.1 Fourier Series. M1 is the rigorous foundation for "why Fourier analysis works," but even without it, you can still correctly use every tool. M1 is for advanced students who want to understand the mathematical essence.

Why does this matter? Because the answers to questions like "Why does the energy computed by FFT equal the time-domain energy?" and "Why can sinusoids serve as a basis?" are all hidden in Hilbert space theory. It is the foundation of the entire Fourier analysis edifice — you don't need to think about it every day, but understanding it will give you deeper confidence in every subsequent tool.

One-line summary: Fourier analysis works because sinusoids form an orthogonal basis in a space called $L^2$ — just like the $x, y, z$ axes in 3D space.

Learning Objectives

Define inner product spaces, norms, and completeness; understand the axiomatic structure of Hilbert spaces
Recognize $L^2[0,T]$ as the natural function space for Fourier analysis
Prove that the complex exponentials $\{e^{jn\omega_0 t}\}$ form an orthonormal basis in $L^2$
Derive Parseval's identity from the inner product, establishing a rigorous foundation for "energy conservation"

The Problem: Questions Behind FFT You Never Asked

Every engineer uses the FFT for spectrum analysis. But have you ever wondered:

Why can sinusoids serve as a "basis"? Who decided that frequency components must be sinusoids rather than some other waveform?
Parseval's theorem says time-domain energy = frequency-domain energy — where does that come from? Is it an approximation or exact?
Is the FFT just an approximation? A superposition of infinitely many sinusoids, truncated to finite terms — is it still "correct"?

The answers lie in Hilbert space theory. Once you understand it, you'll see that the FFT is not merely an algorithm, but the numerical implementation of a profound mathematical theorem.

Historical context: In the late 19th to early 20th century, David Hilbert (1862–1943) discovered that "infinite-dimensional vector spaces" required a rigorous mathematical foundation while studying integral equations. Frigyes Riesz and Ernst Fischer proved the completeness of $L^2$ spaces in 1907 (the Riesz-Fischer theorem), providing the most elegant explanation for the convergence of Fourier series. John von Neumann later axiomatized Hilbert spaces, making them the common language of quantum mechanics and signal processing.

Principles: From 3D Vectors to Function Spaces

Intuition first: In three-dimensional space, any vector $\vec{v}$ can be decomposed into components along the $x, y, z$ directions:

$$\vec{v} = v_x\hat{x} + v_y\hat{y} + v_z\hat{z}$$

Each component is obtained by projection (inner product): $v_x = \vec{v}\cdot\hat{x}$. Fourier analysis does exactly the same thing, except the three-dimensional space is replaced by a "function space," and $\hat{x}, \hat{y}, \hat{z}$ are replaced by $e^{jn\omega_0 t}$.

Concept	3D Vector Space $\mathbb{R}^3$	Function Space $L^2[0,T]$
Elements	Vector $\vec{v}$	Function (signal) $f(t)$
Inner Product	$\vec{u}\cdot\vec{v} = \sum u_i v_i$	$\langle f,g\rangle = \frac{1}{T}\int_0^T f\bar{g}\,dt$
Magnitude	$\|\vec{v}\| = \sqrt{\vec{v}\cdot\vec{v}}$	$\\|f\\| = \sqrt{\langle f,f\rangle}$ (RMS value)
Orthogonal Basis	$\hat{x}, \hat{y}, \hat{z}$	$e^{jn\omega_0 t}$, $n \in \mathbb{Z}$
Projection (Coordinates)	$v_x = \vec{v}\cdot\hat{x}$	$c_n = \langle f, e^{jn\omega_0 t}\rangle$ (Fourier coefficients)
Energy Conservation	$\|\vec{v}\|^2 = v_x^2 + v_y^2 + v_z^2$	$\\|f\\|^2 = \sum\|c_n\|^2$ (Parseval)

Rigorous Definition

Let $V$ be a complex vector space. An inner product $\langle \cdot,\cdot\rangle : V \times V \to \mathbb{C}$ satisfies:

$$\begin{aligned} &\text{(i) Conjugate symmetry:} & \langle f, g \rangle &= \overline{\langle g, f \rangle} \\ &\text{(ii) Linearity (first argument):} & \langle \alpha f + \beta g, h \rangle &= \alpha\langle f,h\rangle + \beta\langle g,h\rangle \\ &\text{(iii) Positive definiteness:} & \langle f, f \rangle &\geq 0, \;\text{and} = 0 \iff f = 0 \end{aligned}$$

The norm induced by the inner product: $\|f\| = \sqrt{\langle f,f\rangle}$.

If the inner product space is complete under this norm (every Cauchy sequence converges to an element within the space), it is called a Hilbert space.

$L^2[0,T]$: The Space of Square-Integrable Functions

The natural habitat of Fourier analysis is the $L^2$ space:

$$L^2[0,T] = \left\{ f: [0,T] \to \mathbb{C} \;\Big|\; \int_0^T |f(t)|^2\,dt < \infty \right\}$$

Inner product defined as: $\displaystyle\langle f, g \rangle = \frac{1}{T}\int_0^T f(t)\,\overline{g(t)}\,dt$

Physical meaning of $L^2$: signals with finite energy. $\|f\|^2 = \langle f,f\rangle = \frac{1}{T}\int_0^T |f(t)|^2\,dt$ is the average power.

Intuition: $L^2$ is like an infinite-dimensional "vector space." Each signal is a "vector" in this space, the inner product measures the "similarity" between two signals, and the norm measures the "magnitude" (energy) of a signal. Fourier analysis is simply performing orthogonal projection in this space.

Orthonormal Basis: $\{e^{jn\omega_0 t}\}_{n\in\mathbb{Z}}$

Let $\phi_n(t) = e^{jn\omega_0 t}$, $\omega_0 = 2\pi/T$. Key theorem: this set of functions forms an orthonormal basis of $L^2[0,T]$.

Show orthogonality proof

Compute $\langle \phi_n, \phi_m \rangle$:

$$\langle \phi_n, \phi_m \rangle = \frac{1}{T}\int_0^T e^{jn\omega_0 t}\,\overline{e^{jm\omega_0 t}}\,dt = \frac{1}{T}\int_0^T e^{j(n-m)\omega_0 t}\,dt$$

Case 1: $n = m$

$$\frac{1}{T}\int_0^T 1\,dt = 1$$

Case 2: $n \neq m$

$$\frac{1}{T}\left[\frac{e^{j(n-m)\omega_0 t}}{j(n-m)\omega_0}\right]_0^T = \frac{1}{T}\cdot\frac{e^{j(n-m)2\pi} - 1}{j(n-m)\omega_0} = \frac{1-1}{j(n-m)\omega_0} = 0$$

Because $e^{j(n-m)2\pi} = 1$ (integer number of full rotations).

Conclusion: $\langle \phi_n, \phi_m \rangle = \delta_{nm}$ (Kronecker delta), orthonormal. $\;\blacksquare$

Therefore, any $f \in L^2[0,T]$ can be expanded as:

$$f(t) = \sum_{n=-\infty}^{\infty} c_n\,\phi_n(t) = \sum_{n=-\infty}^{\infty} c_n\,e^{jn\omega_0 t}$$ $$c_n = \langle f, \phi_n \rangle = \frac{1}{T}\int_0^T f(t)\,e^{-jn\omega_0 t}\,dt$$

$c_n$ is the orthogonal projection coefficient of $f$ onto the basis $\phi_n$ — a perfect analogy to the coordinates of a finite-dimensional vector.

Parseval's Identity and Bessel's Inequality

Parseval's Identity

$$\|f\|^2 = \frac{1}{T}\int_0^T |f(t)|^2\,dt = \sum_{n=-\infty}^{\infty} |c_n|^2$$

Show Parseval's identity derivation

Expand $\|f\|^2 = \langle f, f \rangle$:

$$\langle f, f \rangle = \left\langle \sum_n c_n \phi_n,\; \sum_m c_m \phi_m \right\rangle = \sum_n \sum_m c_n \overline{c_m} \langle \phi_n, \phi_m \rangle$$

Using orthogonality $\langle \phi_n, \phi_m \rangle = \delta_{nm}$:

$$= \sum_n \sum_m c_n \overline{c_m}\,\delta_{nm} = \sum_n c_n \overline{c_n} = \sum_n |c_n|^2 \quad\blacksquare$$

Physical meaning: Total energy (power) computed in the time domain = sum of energies of all frequency components. Energy is conserved under orthogonal decomposition.

Bessel's inequality: If only $N$ finite terms are taken, then $\sum_{|n|\leq N} |c_n|^2 \leq \|f\|^2$. Equality holds as $N\to\infty$ (provided $\{\phi_n\}$ forms a complete basis).

How to Use: An Engineer's Practical Perspective

You don't need to think about Hilbert spaces every day, but they explain the following engineering facts:

Engineering Question	Hilbert Space Explanation
FFT output energy = time-domain energy?	Parseval's identity: orthogonal decomposition preserves energy
Why does FFT use sinusoids instead of square waves?	$\{e^{jn\omega_0 t}\}$ is an orthonormal basis in $L^2$
Is truncation to $N$ terms the "best approximation"?	Orthogonal projection theorem: partial sums are the best $L^2$ approximation
Why does the minimum MSE filter use projection?	Orthogonal projection in Hilbert space = MSE minimization

Interactive: Orthogonality Verification

Compute the real part of $\langle e^{jm\omega_0 t}, e^{jn\omega_0 t}\rangle$. When $m \neq n$, the integral is zero (orthogonal); when $m = n$, it equals 1.

m = 2

n = 5

Chart will display automatically after adjusting parameters above

Pitfalls: "Equality" in $L^2$ Is Not Pointwise Equality

In $L^2$, two functions $f$ and $g$ being "equal" means $\|f - g\| = 0$, i.e., $\int|f-g|^2\,dt = 0$. This allows them to differ on measure-zero sets. For example, $f(t) = 0$ and $g(t) = \begin{cases}1 & t=0 \\ 0 & \text{else}\end{cases}$ are "the same function" in $L^2$. This is why Fourier series can "fail to converge to the correct value" at discontinuities, yet still converge perfectly in the $L^2$ sense.

References: [1] Kreyszig, Introductory Functional Analysis with Applications, Ch.3. [2] Rudin, Real and Complex Analysis, Ch.4. [3] Oppenheim & Willsky, Signals and Systems, Ch.3.

✅ Quick Check

Q1: Why does the energy of the FFT spectrum equal the time-domain energy? Explain in one sentence.

Show answer

Because {e^{jnω₀t}} is an orthonormal basis of L², and Parseval's identity guarantees energy conservation under orthogonal decomposition.

Q2: If two signals have identical FFT spectra, must their time-domain waveforms be the same?

Show answer

In the L² sense, yes (equal almost everywhere). But they may differ at finitely many points (differences on measure-zero sets are ignored in L²).

← Home 1.2 Distribution Theory →

1.2 Generalized Functions & Distribution Theory

Why does the Fourier transform of $\sin(\omega_0 t)$ "exist"?

Why does this matter? Because textbooks say the FT of cos is two deltas, but delta is not a function — what is it exactly? Distribution theory gives the "intuitive operations" of physicists and engineers a rigorous foundation, and is the mathematical bedrock for understanding sampling, impulse response, and related concepts.

Previously... In 1.1 we established the Hilbert space framework — sinusoids are orthogonal bases, Parseval guarantees energy conservation. But some signals (like sin(ωt)) have infinite energy and are not in L². What do we do?

One-line summary: The energy of $\sin(\omega t)$ is infinite, so the classical Fourier Transform cannot handle it. Distribution Theory solves this problem using the $\delta$ function.

Learning Objectives

Understand why the classical FT fails for signals like $\sin$, constants, etc.
Recognize the Dirac delta as a functional, not a function
Rigorously derive $\mathcal{F}\{\delta\}=1$, $\mathcal{F}\{1\}=2\pi\delta(\omega)$, $\mathcal{F}\{\cos\omega_0 t\}$
Confidently use the various properties of $\delta(t)$ in engineering

The Problem: The "Mysterious Delta" in Textbooks

Every signal processing textbook writes:

$$\mathcal{F}\{\cos(\omega_0 t)\} = \pi[\delta(\omega - \omega_0) + \delta(\omega + \omega_0)]$$

But wait — the $\delta$ function has a value of "infinity" at $t=0$ and zero everywhere else? No classical function looks like this. Why do textbooks write it so confidently? Can engineers use it safely? Could it lead to errors in some calculation?

Distribution theory's answer: yes, you can use it safely, because $\delta$ is not a function — it is a functional. Once you understand this distinction, all the "mysterious operations" have a rigorous foundation.

Historical context: Physicist Paul Dirac extensively used the $\delta$ function in his 1930s quantum mechanics research to represent the state of a particle at a specific position. Mathematicians were uneasy — Dirac's operations were illegitimate in classical analysis. From 1944 to 1950, French mathematician Laurent Schwartz developed the Theory of Distributions, placing Dirac's intuitive operations on a rigorous functional analysis foundation. Schwartz received the Fields Medal in 1950 for this work. He himself said: "All I did was translate into mathematical language what physicists already knew."

Principles: From the Limitations of Classical FT to Distributions

Step 1: Where Does the Classical FT Fail?

The CTFT requires $\int_{-\infty}^{\infty}|x(t)|\,dt < \infty$ (absolutely integrable) or at least $\int|x|^2\,dt < \infty$ ($L^2$). But the most fundamental signals in engineering violate this condition:

Signal	$\int\|x(t)\|\,dt$	$\int\|x(t)\|^2\,dt$	Classical FT?
$x(t) = A$ (DC)	$\infty$	$\infty$	Does not exist
$x(t) = \cos(\omega_0 t)$	$\infty$	$\infty$	Does not exist
$x(t) = u(t)$ (unit step)	$\infty$	$\infty$	Does not exist
$x(t) = e^{j\omega_0 t}$	$\infty$	$\infty$	Does not exist

These signals share a common trait: infinite duration, infinite energy. Yet they are the most common signals in engineering!

Step 2: The Key Idea — Delta Is a Functional, Not a Function

Intuition: Imagine a "probe" $\varphi(t)$ — it is very smooth and decays rapidly to zero at infinity. We don't directly ask what the value of $\delta(t)$ is at a certain point (that question is meaningless), but rather: "What is $\delta$'s response to this probe?"

Rigorous Definition of the Dirac Delta

$$\delta[\varphi] = \varphi(0), \quad \forall\, \varphi \in \mathcal{S}$$

"$\delta$ is a machine: you feed in any test function $\varphi$, and it outputs the value of $\varphi$ at zero."

$\delta$ is not a "function" — no classical function $f(t)$ can simultaneously satisfy $\int f(t)\varphi(t)\,dt = \varphi(0)$ and $f(t) = 0$ for $t \neq 0$. It is a pure functional — a mapping that "eats functions and outputs numbers."

Intuitive analogy: Think of distributions as "generalized functions." An ordinary function $f(t)$ can also be viewed as a functional: $f[\varphi] = \int f(t)\varphi(t)\,dt$. But distributions allow more "singular" objects to exist — $\delta$ is the most famous example.

Step 3: Schwartz Space (Intuition Is Enough)

The "probe" $\varphi$ above comes from Schwartz space $\mathcal{S}$ — infinitely differentiable and rapidly decreasing functions. You don't need to memorize the exact definition, just know:

Functions in $\mathcal{S}$ are very "well-behaved": smooth, rapidly decaying, and remain well-behaved no matter how many times you differentiate
The Gaussian $e^{-t^2}$ is a typical member of $\mathcal{S}$
Tempered Distribution $T \in \mathcal{S}'$: a continuous linear functional on $\mathcal{S}$

Basic Properties of the Delta Function

The following properties can all be rigorously derived from the functional definition; use them directly in engineering:

$$\begin{aligned} &\text{Sifting:} &\quad \int_{-\infty}^{\infty} f(t)\,\delta(t-t_0)\,dt &= f(t_0) \\[4pt] &\text{Area:} &\quad \int_{-\infty}^{\infty}\delta(t)\,dt &= 1 \\[4pt] &\text{Scaling:} &\quad \delta(at) &= \frac{1}{|a|}\delta(t) \\[4pt] &\text{Derivative:} &\quad \int f(t)\,\delta'(t)\,dt &= -f'(0) \\[4pt] &\text{Symmetry:} &\quad \delta(-t) &= \delta(t) \end{aligned}$$

Fourier Transform of Distributions: Three-Step Derivation

For $T \in \mathcal{S}'$, define its FT $\hat{T} \in \mathcal{S}'$ as:

$$\hat{T}[\varphi] = T[\hat{\varphi}], \quad \forall\, \varphi \in \mathcal{S}$$

(Transfer the FT work to smooth test functions — their FT always exists)

Derive $\mathcal{F}\{\delta(t)\} = 1$

Let $\hat{\varphi}(\omega) = \int_{-\infty}^{\infty}\varphi(t)\,e^{-j\omega t}\,dt$ be the classical FT of $\varphi$.

By definition:

$$\hat{\delta}[\varphi] = \delta[\hat{\varphi}] = \hat{\varphi}(0) = \int_{-\infty}^{\infty}\varphi(t)\,e^{0}\,dt = \int_{-\infty}^{\infty}\varphi(t)\,dt$$

The constant function $1$ acting as a distribution: $1[\varphi] = \int 1 \cdot \varphi(t)\,dt$

The two are identical, therefore $\hat{\delta} = 1$.

Physical meaning: An infinitely narrow impulse contains all frequencies, each with equal amplitude. $\;\blacksquare$

Derive $\mathcal{F}\{1\} = 2\pi\delta(\omega)$

Using duality. If $\mathcal{F}\{f(t)\} = F(\omega)$, then $\mathcal{F}\{F(t)\} = 2\pi f(-\omega)$.

Since $\mathcal{F}\{\delta(t)\} = 1$, duality gives:

$$\mathcal{F}\{1\} = 2\pi\delta(-\omega) = 2\pi\delta(\omega)$$

(The last step used the symmetry of $\delta$: $\delta(-\omega) = \delta(\omega)$.)

Physical meaning: An eternally constant DC signal contains only the "zero frequency" component. $\;\blacksquare$

Derive $\mathcal{F}\{\cos(\omega_0 t)\}$ — The Ultimate Goal

Step 1: From $\mathcal{F}\{1\} = 2\pi\delta(\omega)$ plus the frequency shift property $\mathcal{F}\{e^{j\omega_0 t}f(t)\} = F(\omega - \omega_0)$:

$$\mathcal{F}\{e^{j\omega_0 t}\} = 2\pi\delta(\omega - \omega_0)$$

Step 2: Using Euler's formula $\cos(\omega_0 t) = \frac{1}{2}(e^{j\omega_0 t} + e^{-j\omega_0 t})$:

$$\mathcal{F}\{\cos(\omega_0 t)\} = \frac{1}{2}\cdot 2\pi\delta(\omega - \omega_0) + \frac{1}{2}\cdot 2\pi\delta(\omega + \omega_0)$$ $$\boxed{\mathcal{F}\{\cos(\omega_0 t)\} = \pi[\delta(\omega - \omega_0) + \delta(\omega + \omega_0)]}$$

Physical meaning: The spectrum of a pure sinusoid consists of two "needles" located exactly at $\pm\omega_0$. This perfectly matches engineering intuition — a pure tone has only one frequency. $\;\blacksquare$

How to Use: Engineer's Practical Guide

Distribution theory guarantees the legitimacy of the following operations — you can use them with confidence:

Operation	Mathematics	Engineering Meaning
Sampling = multiply by impulse train	$x_s(t) = x(t)\sum_n\delta(t-nT_s)$	Mathematical model of ADC
Impulse response definition	$h(t) = T\{\delta(t)\}$	LTI system fully determined by $h(t)$
Discrete spectrum	$\mathcal{F}\{\cos\omega_0 t\} = \pi[\delta(\omega\pm\omega_0)]$	"Spectral lines" on a spectrum analyzer
FT of a constant	$\mathcal{F}\{A\} = 2\pi A\,\delta(\omega)$	DC offset = zero-frequency component

Engineer's quick note: Seeing $\delta$ in the frequency domain means that frequency has energy concentrated in an "infinitely narrow but finite-area" manner. The area of $\delta(\omega - \omega_0)$ is 1 (integral equals 1), representing a pure frequency component.

Applications

Communications systems: Spectrum analysis of carrier modulation $x(t)\cos(\omega_c t)$. The FT of $\cos$ is two deltas, and multiplication corresponds to frequency-domain convolution — this is the mathematical foundation of frequency shifting. In 5G NR OFDM with 15 kHz subcarrier spacing, each subcarrier's spectrum is a $\delta$.
Control systems: The impulse response $h(t)$ is defined via $\delta(t)$. PID controller tuning starts from $h(t)$.
Digital signal processing: The sampling process is modeled as $x(t)\cdot\sum\delta(t-nT_s)$, directly leading to the sampling theorem. The CD's 44.1 kHz sampling rate was derived from this model.

Pitfalls & Limitations

Do not apply nonlinear operations to delta: $\delta(t)^2$, $\sqrt{\delta(t)}$ have no meaning. Distributions only support linear operations (addition, scalar multiplication, differentiation, convolution, FT).
Do not treat the "value" of $\delta$ as a number: $\delta(0) = \infty$ is just a heuristic statement; strictly speaking, $\delta$ has no "value" at any point.
Products of $\delta$ are sometimes undefined: $\delta(t)\cdot\delta(t)$ is undefined in distribution theory. Only "distribution $\times$ smooth function" is meaningful.

References: [1] Strichartz, A Guide to Distribution Theory and Fourier Transforms, CRC Press. [2] Folland, Real Analysis, Ch.8-9. [3] Schwartz, Theorie des Distributions, 1950.

✅ Quick Check

Q1: Why can't we use the classical FT for sin(ωt)?

Show answer

Because ∫|sin(ωt)|dt = ∞, the absolute integrability condition is not met. Distribution theory is needed, yielding π[δ(ω-ω₀)+δ(ω+ω₀)].

Q2: Is δ(t) a function?

Show answer

No. It is a functional (distribution), defined as δ[φ]=φ(0). No classical function can do this.

Interactive: δ Function as a Limit

δ(t) is not a function, but it can be viewed as the limit of "narrower and taller" Gaussians: $\delta_\sigma(t) = \frac{1}{\sigma\sqrt{2\pi}}e^{-t^2/(2\sigma^2)}$. Observe what happens as $\sigma \to 0$: the height approaches infinity, but the area is always exactly 1.

σ = 0.5

← 1.1 Hilbert Space 1.3 Convergence Theory →

1.3 Convergence Theory

What does it really mean for a Fourier series to "converge"?

Why does this matter? Because Fourier series "convergence" has four different meanings, and confusing them leads to engineering errors. Understanding convergence rate = understanding why square waves need large bandwidth and why smooth pulses save bandwidth. The Gibbs phenomenon directly explains DAC ringing and intersymbol interference in digital communications.

Previously... Section 1.2 used distribution theory to solve the 'FT of infinite-energy signals' problem. But another question arises: do the partial sums of a Fourier series truly converge to the original function? In what sense?

One-line summary: Adding more terms to a Fourier series brings it closer to the original function — but "closer" has four different meanings, and in engineering you usually only need to care about energy-sense convergence ($L^2$ convergence).

Learning Objectives

Distinguish between pointwise, uniform, $L^2$, and almost-everywhere convergence modes
State the Dirichlet conditions and Carleson's theorem
Quantify the relationship between convergence rate and signal smoothness
Rigorously analyze the 8.95% overshoot of the Gibbs phenomenon

The Problem: Gibbs Ringing of the Square Wave

You've certainly seen this phenomenon: when reconstructing a square wave with a Fourier series, there is pronounced ringing near the discontinuities. Even with hundreds of terms, the overshoot never disappears — it's approximately 9% of the square wave amplitude.

DAC output: When a digital-to-analog converter reconstructs a square wave signal, real physical ringing occurs
Image processing: "Mosquito noise" near sharp edges in JPEG is the Gibbs effect caused by frequency-domain truncation
Filter design: The impulse response of an ideal lowpass filter = sinc function (infinite length); truncation causes passband ripple

Why doesn't the ringing disappear? What type of convergence issue is this? The answer lies in the details of convergence theory.

Historical context: The Gibbs phenomenon is named after J. Willard Gibbs (discovered in 1899), but English mathematician Henry Wilbraham had already described it as early as 1848. Dirichlet (1829) first gave sufficient conditions for pointwise convergence of Fourier series. The most profound result came from Lennart Carleson (1966): he proved that the Fourier series of $L^2$ functions converge "almost everywhere" — a result so difficult that Carleson received the 2006 Abel Prize for it.

Principles: Four Meanings of "Closeness"

Intuition first: Imagine you took a photo (original function) and approximate it with more and more pixels. "Good approximation" can have different criteria:

Pointwise: Every pixel is correct
Uniform: Even the pixel with the largest error tends to zero
$L^2$: Total error energy tends to zero (individual pixel deviations allowed)
Almost everywhere: Every pixel is correct except for finitely many bad ones

Let $S_N(t) = \sum_{|n|\leq N} c_n e^{jn\omega_0 t}$ be the partial sum. The meaning of convergence to $f(t)$ depends on the convergence mode:

Mode	Definition	Condition	Engineering Meaning
Pointwise	$\forall t$: $S_N(t) \to f(t)$	Dirichlet conditions	Converges at every time point
Uniform	$\sup_t \|S_N(t)-f(t)\|\to 0$	$f$ continuous + absolutely convergent	Strongest; impossible for square waves (Gibbs)
$L^2$ (Mean-Square)	$\int\|S_N-f\|^2 dt \to 0$	$f \in L^2$ (always holds)	Energy-sense convergence; most commonly used
Almost Everywhere (a.e.)	$S_N(t)\to f(t)$ except on measure-zero set	$f\in L^2$ (Carleson 1966)	Converges except at negligible points

Key insight: The Fourier series of a square wave converges perfectly in the $L^2$ sense (total energy error → 0), but does not converge uniformly near discontinuities (Gibbs overshoot persists forever). This is not a contradiction — these are two different measurement criteria.

Convergence Rate and Smoothness

Key theorem: The smoother the signal, the faster the Fourier coefficients decay.

$$\text{If } f \in C^k \text{ ($k$-times continuously differentiable)} \implies |c_n| = O(|n|^{-(k+1)})$$

Signal	Continuity	$c_n$ Decay	10-Term Approx. Error	Convergence Speed
Square wave	Discontinuous	$O(1/n)$	~10%	Slow (Gibbs)
Triangle wave	$C^0$ (continuous but not differentiable)	$O(1/n^2)$	~1%	Moderate
Parabolic wave	$C^1$	$O(1/n^3)$	~0.1%	Fast
$C^\infty$ function	Infinitely differentiable	Super-algebraic decay	$<10^{-10}$	Extremely fast

Engineering insight: This explains why: (1) Square waves in digital communications need very large bandwidth (slow harmonic decay), while smooth raised-cosine pulse shaping requires only finite bandwidth. (2) Adding a smoothing filter at the DAC output can dramatically reduce required bandwidth. (3) Sigma-delta modulator noise shaping exploits this principle.

Rigorous Analysis of the Gibbs Phenomenon

Near the discontinuity of a square wave, the maximum overshoot of $S_N$ approaches:

$$\text{overshoot} = \frac{1}{\pi}\int_0^\pi \frac{\sin u}{u}\,du - \frac{1}{2} \approx 0.0895 \approx 8.95\%$$

Derive the Gibbs overshoot ratio

The partial sum of the square wave $f(t) = \text{sgn}(\sin t)$ can be written as:

$$S_N(t) = \frac{4}{\pi}\sum_{k=0}^{N-1}\frac{\sin((2k+1)t)}{2k+1}$$

Near $t = \pi/(2N+1)$ (the first overshoot point), $S_N$ achieves its maximum. It can be shown that:

$$S_N\!\left(\frac{\pi}{2N+1}\right) = \frac{2}{\pi}\sum_{k=0}^{N-1}\frac{\sin\left(\frac{(2k+1)\pi}{2(2N+1)}\right)}{2k+1} \cdot \frac{2}{2N+1}\cdot(2N+1)$$

As $N\to\infty$, this Riemann sum approaches:

$$\frac{2}{\pi}\int_0^{\pi}\frac{\sin u}{u}\,du = \frac{2}{\pi}\cdot\text{Si}(\pi) \approx \frac{2}{\pi}(1.8519) \approx 1.1790$$

The ideal square wave value is 1, so the overshoot $\approx 17.90\%$ of half-amplitude = $8.95\%$ of full amplitude.

Key point: This $8.95\%$ is independent of $N$ — adding more terms will never make it disappear. Increasing $N$ only moves the overshoot closer to the discontinuity, but the height remains unchanged. $\;\blacksquare$

Practical impact of the Gibbs phenomenon:

DAC output: ~9% physical overshoot when reconstructing square waves, potentially triggering downstream circuit thresholds
FIR filters: Truncating the ideal impulse response → passband ripple; this is why window functions (Hann, Kaiser, etc.) are needed
Solutions: Use Lanczos sigma factors, Fejer summation (Cesàro averaging), or directly apply window functions for smooth truncation

How to Use: Convergence Rate Guides Engineering Design

Smoother signal → faster coefficient decay → less bandwidth needed. Practical applications of this principle:

Scenario	Application	Specific Parameters
Pulse shaping	Raised-cosine roll-off factor $\alpha$ controls smoothness	$\alpha=0.25$: bandwidth = $1.25/T_s$
DAC reconstruction	Higher-order interpolation reduces aliasing energy	Linear interp.: $-12$ dB/oct; cubic: $-24$ dB/oct
FIR design	Window functions are equivalent to smooth truncation	Kaiser $\beta=6$: sidelobes $-46$ dB

Interactive: Convergence Rate Comparison

Compare the Fourier series convergence speeds of square wave ($1/n$), triangle wave ($1/n^2$), and parabolic wave ($1/n^3$). Drag the slider to observe the Gibbs phenomenon.

N = 10

Chart will display automatically after adjusting parameters above

Pitfalls & Common Misconceptions

"Adding more terms will eliminate Gibbs ringing" — Wrong. The overshoot percentage is fixed at 8.95%, regardless of $N$.
"$L^2$ convergence is sufficient" — Usually yes, but if you care about signal peak values (e.g., power amplifier headroom design), $L^2$ convergence does not guarantee peak control.
"Carleson's theorem says almost-everywhere convergence" — But at discontinuities, the series converges to the average of the left and right limits $\frac{f(t^+)+f(t^-)}{2}$, not either side's value.

References: [1] Carleson, On convergence and growth of partial sums of Fourier series, Acta Math., 1966. [2] Korner, Fourier Analysis, Cambridge. [3] Gibbs, Fourier's Series, Nature, 1899.

✅ Quick Check

Q1: Why does the Fourier series of a triangle wave converge faster than that of a square wave?

Show answer

The triangle wave is continuous (C⁰), so coefficients decay as 1/n²; the square wave is discontinuous, with coefficients decaying only as 1/n. Smoother → faster convergence.

Q2: Does the Gibbs overshoot disappear as N→∞?

Show answer

The 'width' of the overshoot approaches zero, but the 'height percentage' is always ~9% and never disappears.

← 1.2 Distribution Theory 1.4 Uncertainty Principle →

1.4 Uncertainty Principle

Complete proof and engineering applications of the Heisenberg-Gabor inequality

Why does this matter? Because the uncertainty principle determines the limits of everything you can achieve — how to choose STFT window length, why radar range and velocity resolution cannot both be optimal simultaneously, and optimization of communications pulse shaping. It is not a "theoretical limitation" but an engineering constraint you face in your daily work.

Previously... Section 1.3 showed that Fourier series indeed converge (in the L² sense), with convergence rate depending on signal smoothness. But even with convergence, the 'concentration' in time and frequency domains has an unbreakable lower bound —

One-line summary: You cannot simultaneously know precisely "when a signal occurs" and "what frequencies it contains" — this is a mathematical theorem, not an instrument limitation.

Learning Objectives

Define the rigorous mathematical meaning of time-domain spread $\Delta t$ and frequency-domain spread $\Delta\omega$
Derive $\Delta t\cdot\Delta\omega \geq \frac{1}{2}$ completely from the Cauchy-Schwarz inequality
Prove that the Gaussian achieves equality (minimum uncertainty)
Connect to STFT resolution limits and the radar ambiguity function

The Problem: Why Is STFT Window Length So Hard to Choose?

Nearly every engineer doing time-frequency analysis has encountered this dilemma:

STFT window length: Too short → poor frequency resolution (can't separate two close frequencies); too long → poor time resolution (can't separate two close events). No choice seems right.
Radar waveform design: Range resolution requires short pulses (large bandwidth), velocity resolution requires long pulses (narrow bandwidth). Why can't both be optimal?
Communications systems: Narrower OFDM subcarrier spacing → higher spectral efficiency, but longer symbol duration → more sensitive to time-varying channels.

The answer is not that your design isn't good enough — mathematics itself forbids simultaneous precision. This is the uncertainty principle.

Historical context: Dennis Gabor (1900–1979, 1971 Nobel Prize in Physics laureate, inventor of holography) introduced Werner Heisenberg's quantum mechanical uncertainty principle into signal theory in his seminal 1946 paper Theory of Communication. Gabor pointed out: if signals are viewed as "information quanta (logons)" on the time-frequency plane, each quantum occupies at least $\Delta t \cdot \Delta f \geq \frac{1}{4\pi}$ of area. This result has exactly the same mathematical structure as quantum mechanics' $\Delta x \cdot \Delta p \geq \hbar/2$ — the difference lies only in the physical interpretation.

Principles: Rigorous Statement

Intuition first: Imagine a guitar string being plucked. If you play only an extremely short note (click), you know the precise time, but "what note it is" is vague. If you sustain a steady note (drone), the frequency is clear, but "when it started" is ambiguous. You cannot make both infinitely precise simultaneously.

Define the "spread" of the time and frequency domains as root-mean-square widths (second moments):

$$\Delta t^2 = \frac{\int_{-\infty}^{\infty} t^2|f(t)|^2\,dt}{\int_{-\infty}^{\infty}|f(t)|^2\,dt}, \quad \Delta\omega^2 = \frac{\int_{-\infty}^{\infty} \omega^2|F(\omega)|^2\,d\omega}{\int_{-\infty}^{\infty}|F(\omega)|^2\,d\omega}$$

Heisenberg-Gabor Inequality

$$\boxed{\Delta t \cdot \Delta\omega \geq \frac{1}{2}}$$

Equality holds if and only if $f(t) = Ce^{-\alpha t^2}$ (Gaussian)

Complete Proof

Show full derivation (using Cauchy-Schwarz)

Step 1: Without loss of generality, assume $\|f\| = 1$ (normalized), and the centroid of $f$ is at the origin (otherwise translate).

Step 2: Using the differentiation property $\mathcal{F}\{tf(t)\} = j\frac{d}{d\omega}F(\omega)$, and $\mathcal{F}\{f'(t)\} = j\omega F(\omega)$. By Parseval:

$$\Delta t^2 = \int t^2|f(t)|^2\,dt, \quad \Delta\omega^2 = \int \omega^2|F(\omega)|^2\,d\omega = \frac{1}{2\pi}\int|f'(t)|^2\,dt$$

Step 3: Apply the Cauchy-Schwarz Inequality to $tf(t)$ and $f'(t)$:

$$\left|\int tf(t)\overline{f'(t)}\,dt\right|^2 \leq \int t^2|f|^2\,dt \cdot \int |f'|^2\,dt = \Delta t^2 \cdot 2\pi\Delta\omega^2$$

Step 4: Compute the left side. Note that $f\overline{f'} = \frac{1}{2}\frac{d}{dt}|f|^2$ (when $f$ is real-valued or taking the real part), using integration by parts:

$$\text{Re}\int tf(t)\overline{f'(t)}\,dt = \int t \cdot \frac{1}{2}\frac{d}{dt}|f(t)|^2\,dt$$ $$= \left[\frac{t}{2}|f|^2\right]_{-\infty}^{\infty} - \frac{1}{2}\int|f(t)|^2\,dt = 0 - \frac{1}{2}\|f\|^2 = -\frac{1}{2}$$

(The boundary terms vanish because $f \in L^2$ implies $t|f(t)|^2 \to 0$ as $|t|\to\infty$.)

Step 5: Therefore $\left|\int tf\overline{f'}\,dt\right| \geq \frac{1}{2}$. Substituting into Cauchy-Schwarz:

$$\frac{1}{4} \leq \Delta t^2 \cdot 2\pi\Delta\omega^2$$

Using $\Delta\omega$ (angular frequency): $\Delta t \cdot \Delta\omega \geq \frac{1}{2}$.

Using $\Delta f$ (Hz): $\Delta t \cdot \Delta f \geq \frac{1}{4\pi}$. $\;\blacksquare$

Equality condition: Cauchy-Schwarz equality $\iff$ $f'(t) = -2\alpha t f(t)$ $\iff$ $f(t) = Ce^{-\alpha t^2}$. The Gaussian is the only waveform that achieves minimum uncertainty.

How to Use: Three Major Engineering Constraints

1. STFT Window Length Selection

Window length $T_w$ determines the time-frequency resolution:

$$\Delta t \approx T_w, \quad \Delta f \approx \frac{1}{T_w}$$ $$\Delta t \cdot \Delta f \approx 1 \quad (\text{constrained by the uncertainty principle})$$

Analysis Goal	Window Length Recommendation	Typical Values
Speech formant tracking	Short window (time resolution priority)	$T_w = 20{-}30$ ms
Music pitch detection	Long window (frequency resolution priority)	$T_w = 50{-}100$ ms
Vibration monitoring	Adjust according to frequency range	$\Delta f < 1$ Hz → $T_w > 1$ s

2. Radar Waveform Design

Range resolution $\Delta R = c/(2B)$ ($B$ = bandwidth), velocity resolution $\Delta v = \lambda/(2T)$ ($T$ = pulse duration). The uncertainty principle requires $BT \geq 1$, therefore:

$$\Delta R \cdot \Delta v \geq \frac{c\lambda}{4BT} \geq \frac{c\lambda}{4} \cdot \frac{1}{BT}$$

Solution: Use chirp pulses, maintaining both large $B$ and large $T$ simultaneously, so that $BT \gg 1$.

3. Communications Pulse Shaping

OFDM symbol length $T_{sym}$ and subcarrier spacing $\Delta f_{sc}$ satisfy $T_{sym} \cdot \Delta f_{sc} = 1$ (exactly at equality). 5G NR numerology (15/30/60/120 kHz) switches between different time-frequency tradeoffs.

Applications

5G NR OFDM numerology: Subcarrier spacing 15 kHz → symbol length 66.7 us (low-speed mobility); 120 kHz → 8.33 us (mmWave high-speed scenarios). The tradeoff is determined by the uncertainty principle.
X-band radar (FMCW): Bandwidth $B = 150$ MHz → $\Delta R = 1$ m. Pulse repetition interval $T_{PRI} = 100$ us → $\Delta v \approx 8$ m/s. $BT = 15000 \gg 1$ (chirp compression).
Electroencephalography (EEG) time-frequency analysis: Analyzing alpha waves (8-13 Hz) requires $\Delta f < 5$ Hz → window length $> 200$ ms, limiting the ability to detect transient events.

Interactive: $\Delta t \cdot \Delta f$ Product of the Gaussian

Adjust the $\alpha$ parameter of the Gaussian $e^{-\alpha t^2}$. Observe: narrower in time → wider in frequency, but the product remains constant at $\frac{1}{4\pi}$ (equality).

$\alpha$ = 2.0

Chart will display automatically after adjusting parameters above

Pitfalls & Limitations

The uncertainty principle is a lower bound, not an equality: Only the Gaussian achieves equality. The $\Delta t \cdot \Delta f$ of a rectangular window is much larger than $1/(4\pi)$.
Do not confuse with SNR: The uncertainty principle limits spread, not detection capability. At high SNR, you can still estimate frequency precisely (e.g., phase-locked loops).
"Super-resolution" does not violate the uncertainty principle: Methods like MUSIC and ESPRIT exploit structural assumptions about the signal (e.g., sinusoidal models), bypassing the uncertainty principle's limitations. But the price is: if the assumption is wrong, the method collapses.

References: [1] Folland & Sitaram, The uncertainty principle: a mathematical survey, J. Fourier Anal. Appl., 1997. [2] Gabor, Theory of Communication, J. IEE, 1946. [3] Grochenig, Foundations of Time-Frequency Analysis, Birkhauser.

✅ Quick Check

Q1: If you double the STFT window length, how do the frequency and time resolutions change?

Show answer

Frequency resolution Δf is halved (improves), time resolution Δt is doubled (worsens). The product Δt·Δf remains unchanged.

Q2: What signal achieves the lower bound of the uncertainty principle?

Show answer

The Gaussian pulse e^{-αt²}; its FT is also a Gaussian. It is the only signal that achieves equality at Δt·Δf = 1/(4π).

← 1.3 Convergence Theory 2.1 Fourier Series →

2.1 Fourier Series

Frequency-domain representation of periodic signals — Complete theory

Why does this matter? Because the Fourier series is where everything begins. Power engineers analyzing THD, audio engineers analyzing timbre, communications engineers analyzing modulation — all start from "decomposing signals into harmonics." Without understanding Fourier series, every subsequent tool is a black box.

Previously... Part I established the mathematical foundation. Now we start building — beginning with the most fundamental Fourier series, analyzing what frequency components a periodic signal contains.

One-line summary: Any periodic signal can be decomposed into a superposition of sinusoids — these sinusoids have frequencies that are integer multiples of the fundamental frequency (harmonics).

Learning Objectives

Derive coefficient formulas for the trigonometric and complex exponential forms
Use symmetry (odd/even/half-wave) to simplify coefficient calculations
Compute Fourier coefficients for square, triangle, and sawtooth waves
Understand THD (Total Harmonic Distortion) analysis

The Problem: Harmonic Issues in Power Systems

The 60 Hz AC power from the utility grid is ideally a pure sinusoid. But in real systems:

Nonlinear loads (rectifiers, variable-frequency drives, LED drivers) inject harmonic currents
The 3rd harmonic (180 Hz) does not cancel in three-phase systems, accumulating in the neutral wire → neutral wire overheating
IEEE 519-2022 standard specifies THD must not exceed 5% (voltage) or 8% (current)

To analyze these harmonics, you need the Fourier series. This is not abstract math — it's a practical problem power engineers face every day.

Historical context: Joseph Fourier (1768–1830) submitted a paper on heat conduction to the French Academy of Sciences in 1807, claiming that "any function can be expanded as a series of sinusoidal functions." Reviewer Lagrange strongly objected, believing discontinuous functions could not have such an expansion. The paper was rejected. But Fourier was right (at least in the $L^2$ sense). This controversy spawned the most profound advances in 19th-century analysis — from the Riemann integral to the Lebesgue integral, from pointwise convergence to $L^2$ convergence. Fourier eventually published Theorie analytique de la chaleur in 1822.

Principles: Two Forms

Intuition: Think of a periodic signal as a musical chord. A chord consists of a fundamental (fundamental frequency $f_0$) plus overtones ($2f_0, 3f_0, \ldots$). The Fourier series tells you how strong each overtone is and what its phase is.

Trigonometric Form:

$$f(t) = \frac{a_0}{2} + \sum_{n=1}^{\infty}\left[a_n\cos(n\omega_0 t) + b_n\sin(n\omega_0 t)\right]$$ $$a_n = \frac{2}{T}\int_0^T f(t)\cos(n\omega_0 t)\,dt, \quad b_n = \frac{2}{T}\int_0^T f(t)\sin(n\omega_0 t)\,dt$$

Complex Exponential Form:

$$f(t) = \sum_{n=-\infty}^{\infty}c_n\,e^{jn\omega_0 t}, \quad c_n = \frac{1}{T}\int_0^T f(t)\,e^{-jn\omega_0 t}\,dt$$

Relationships: $c_0 = a_0/2$, $c_n = (a_n - jb_n)/2$, $c_{-n} = \overline{c_n}$ (real signals)

Symmetry Simplification

Using signal symmetry can greatly simplify coefficient calculations — you can even know certain coefficients are zero without integrating:

Signal Symmetry	Result	Series Contains Only	Examples
Even function $f(-t)=f(t)$	$b_n=0$	Cosine only	Triangle wave, rectified sine
Odd function $f(-t)=-f(t)$	$a_n=0$	Sine only	Square wave (odd-symmetric), sawtooth wave
Half-wave symmetry $f(t+T/2)=-f(t)$	Even harmonics are zero	Odd harmonics only	Square wave ($1,3,5,\ldots$ harmonics)

Engineering quick reference: A square wave is both an odd function + half-wave symmetric → contains only odd-order sine harmonics. In power systems, a full-wave rectified waveform is an even function + no half-wave symmetry → contains all even-order cosine harmonics.

Classic Waveform Coefficient Derivations

Square Wave Coefficient Derivation

Square wave with period $T$ and amplitude $\pm 1$: $f(t) = 1$ for $0 < t < T/2$, $f(t) = -1$ for $T/2 < t < T$.

$$c_n = \frac{1}{T}\left[\int_0^{T/2}e^{-jn\omega_0 t}\,dt - \int_{T/2}^{T}e^{-jn\omega_0 t}\,dt\right]$$ $$= \frac{1}{T}\cdot\frac{1}{-jn\omega_0}\left[(e^{-jn\pi}-1) - (e^{-jn2\pi}-e^{-jn\pi})\right]$$ $$= \frac{1}{-jn\omega_0 T}\left[2e^{-jn\pi}-1-1\right] = \frac{2((-1)^n-1)}{-jn\cdot 2\pi}$$

$n$ even → $c_n = 0$. $n$ odd → $c_n = \frac{2}{jn\pi}$.

Converting back to sine form: $b_n = 4/(n\pi)$ (odd $n$), $a_n = 0$.

$$\boxed{f(t) = \frac{4}{\pi}\sum_{n=1,3,5,\ldots}\frac{1}{n}\sin(n\omega_0 t)} \quad\blacksquare$$

Triangle Wave Coefficient Derivation

The triangle wave is the integral of the square wave. Using the differentiation property: if $f(t) = \text{square wave}$, $g(t) = \text{triangle wave} = \int f$, then the Fourier coefficients of $g$ are $d_n = c_n/(jn\omega_0)$.

$$d_n = \frac{4/(jn\pi)}{jn\omega_0} = \frac{-4}{n^2\pi\omega_0} \quad (\text{odd } n)$$

Converting to standard form (triangle wave with amplitude 1):

$$\boxed{f(t) = \frac{8}{\pi^2}\sum_{n=1,3,5,\ldots}\frac{(-1)^{(n-1)/2}}{n^2}\sin(n\omega_0 t)} \quad\blacksquare$$

Note that coefficients decay as $1/n^2$ (faster than the square wave's $1/n$), because the triangle wave is continuous.

How to Use: THD Analysis Steps

Total Harmonic Distortion (THD) is the core metric for measuring waveform distortion:

$$\text{THD} = \frac{\sqrt{\sum_{n=2}^{\infty}|c_n|^2}}{|c_1|} \times 100\%$$

Practical steps:

Use an ADC to sample one complete period (or average over multiple periods)
Perform FFT to obtain the amplitude of each harmonic $|c_n|$
Fundamental = $|c_1|$, harmonics = $|c_2|, |c_3|, \ldots$ (typically up to the 40th is sufficient)
Substitute into the THD formula. IEEE 519 requires voltage THD $\leq 5\%$, individual harmonics $\leq 3\%$

Applications

Power quality monitoring: In the 60 Hz system, a typical 6-pulse rectifier generates 5th, 7th, 11th, and 13th harmonics ($6k \pm 1$ pattern). The 3rd harmonic (180 Hz) accumulates in the three-phase neutral wire, potentially causing neutral current to reach $\sqrt{3}$ times the phase current.
Timbre analysis: The spectrum of a piano A4 (440 Hz) contains a strong fundamental and gradually decaying harmonics; a violin at the same pitch has an entirely different harmonic structure — this is why the two instruments sound different.
RF amplifier linearity: Nonlinearity in power amplifiers generates harmonics and intermodulation distortion (IMD). Second-order harmonics ($2f$) and third-order intermodulation ($2f_1 - f_2$) are the most critical design metrics.

Interactive: Harmonic Synthesis

Select a waveform and number of terms, and observe how the Fourier series progressively approximates the original waveform. Note the Gibbs ringing of the square wave.

N = 5

Chart will display automatically after adjusting parameters above

Pitfalls & Common Misconceptions

"Low THD means the waveform is clean" — Not necessarily. THD is an RMS metric and may mask a particularly strong high-order harmonic. You need to also examine individual harmonic amplitudes.
"Fourier series only applies to periodic signals" — Strictly speaking, yes. But in practice, as long as your time window captures an integer number of periods, the DFT result is equivalent to the Fourier series. Non-integer periods → leakage problems.
"The trigonometric form is more intuitive" — For beginners, yes, but the complex form is more convenient for derivations and computations. It's advisable to be familiar with converting between both forms.

References: [1] Oppenheim & Willsky, Signals and Systems, Ch.3. [2] Stein & Shakarchi, Fourier Analysis, Princeton. [3] IEEE 519-2022, Standard for Harmonic Control.

✅ Quick Check

Q1: Why does a square wave have only odd-order harmonics, and what symmetry is this related to?

Show answer

Half-wave symmetry f(t+T/2)=-f(t). Signals satisfying this symmetry contain only odd-order harmonics.

Q2: What is the frequency of the 3rd harmonic in a 60 Hz power system?

Show answer

60×3 = 180 Hz.

← 1.4 Uncertainty Principle 2.2 CTFT →

2.2 Continuous-Time Fourier Transform (CTFT)

From Fourier series to aperiodic signals — letting $T\to\infty$

Why does this matter? Because the CTFT extends Fourier analysis from periodic signals to all signals. "Filtering = frequency-domain multiplication" — this core theorem (convolution theorem) that makes the entire DSP field possible — is a property of the CTFT.

Previously... The Fourier series from 2.1 can only handle periodic signals. But in reality, most signals are not periodic (speech, transients, random signals). How do we generalize?

One-line summary: Generalizing the Fourier series to aperiodic signals — any signal with finite energy can be decomposed into continuous frequency components.

Learning Objectives

Derive the CTFT from FS ($T\to\infty$ limiting process)
Prove core properties: time shift, convolution, Parseval, etc.
Derive rect $\leftrightarrow$ sinc and Gaussian $\leftrightarrow$ Gaussian transform pairs
Understand the engineering significance of each property

The Problem: Real Signals Are Not Periodic

The Fourier series can only handle periodic signals, but in reality almost no signals are strictly periodic:

Speech, music: transient signals with a beginning and an end
Radar echoes: a pulse that comes and goes
Seismic waves, brain waves: aperiodic, non-stationary

We need a more general tool that can handle any finite-energy signal. The CTFT is that tool.

Historical context: The concept of CTFT originates from an elegant limiting process: letting the period of the Fourier series $T \to \infty$. As $T$ increases, the spacing between discrete harmonic frequencies $n\omega_0 = 2\pi n/T$ shrinks ($\omega_0 \to 0$), and the discrete summation becomes a continuous integral. Fourier himself proposed this idea in 1822. Rigorous mathematical foundations were established by Plancherel (1910) and Wiener (1933).

Principles: From FS to CTFT

Intuition first: The Fourier series tells you "how strong the $n$-th harmonic is" (discrete frequencies). As the period → infinity, the spacing between harmonics → zero, and you no longer have discrete "frequency indices" but rather a continuous spectral density function $F(\omega)$.

Show full derivation: The $T \to \infty$ Limit

Periodic signal $f_T(t) = \sum_n c_n e^{jn\omega_0 t}$, $c_n = \frac{1}{T}\int_{-T/2}^{T/2}f(t)e^{-jn\omega_0 t}dt$.

Define $F_T(\omega) = Tc_n\big|_{\omega=n\omega_0} = \int_{-T/2}^{T/2}f(t)e^{-j\omega t}dt$ (moving the $1/T$ factor inside $F_T$).

Then the original expansion becomes:

$$f_T(t) = \frac{1}{T}\sum_n F_T(n\omega_0)e^{jn\omega_0 t} = \frac{1}{2\pi}\sum_n F_T(n\omega_0)\,\underbrace{\omega_0}_{\Delta\omega}\, e^{jn\omega_0 t}$$

Let $T\to\infty$: $\omega_0 = 2\pi/T \to d\omega$, $n\omega_0 \to \omega$ (continuous), discrete sum → Riemann integral:

$$\boxed{f(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty}F(\omega)\,e^{j\omega t}\,d\omega, \quad F(\omega) = \int_{-\infty}^{\infty}f(t)\,e^{-j\omega t}\,dt} \quad\blacksquare$$

CTFT Pair (Fourier Transform Pair)

$$F(\omega) = \int_{-\infty}^{\infty}f(t)\,e^{-j\omega t}\,dt \quad \text{(Analysis / Forward Transform)}$$ $$f(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty}F(\omega)\,e^{j\omega t}\,d\omega \quad \text{(Synthesis / Inverse Transform)}$$

Core Properties and Engineering Significance

Property	Time Domain	Frequency Domain	Engineering Significance
Time Shift	$f(t-t_0)$	$e^{-j\omega t_0}F(\omega)$	Delay = phase rotation, amplitude unchanged
Frequency Shift	$e^{j\omega_0 t}f(t)$	$F(\omega-\omega_0)$	Modulation = shifting spectrum to carrier frequency
Scaling	$f(at)$	$\frac{1}{\|a\|}F(\omega/a)$	Time compression ↔ frequency expansion (uncertainty principle)
Convolution	$f*g$	$F\cdot G$	LTI filtering = frequency-domain multiplication (core of filter design)
Multiplication	$f\cdot g$	$\frac{1}{2\pi}F*G$	Window truncation = frequency-domain convolution (source of leakage)
Differentiation	$f'(t)$	$j\omega F(\omega)$	High-pass effect: high frequencies are amplified
Parseval	$\int\|f\|^2 dt = \frac{1}{2\pi}\int\|F\|^2 d\omega$		Time-domain energy = frequency-domain energy (energy conservation)

Convolution Theorem Proof

$$\mathcal{F}\{f*g\} = \int\left[\int f(\tau)g(t-\tau)d\tau\right]e^{-j\omega t}dt$$

Swapping the order of integration, let $u = t - \tau$:

$$= \int f(\tau)e^{-j\omega\tau}\left[\int g(u)e^{-j\omega u}du\right]d\tau = F(\omega)\cdot G(\omega) \quad\blacksquare$$

Engineering significance: The output of an LTI system $y = x * h$ becomes $Y = X \cdot H$ in the frequency domain — this is why we can describe filters using the frequency response $H(\omega)$. Multiplication is much simpler than convolution.

How to Use: Common Transform Pairs Quick Reference

$f(t)$	$F(\omega)$	Memory Aid
$\text{rect}(t/\tau)$	$\tau\,\text{sinc}(\omega\tau/2\pi)$	Rectangular window ↔ sinc leakage
$e^{-\alpha\|t\|}$	$\frac{2\alpha}{\alpha^2+\omega^2}$	Exponential decay ↔ Lorentzian
$e^{-\alpha t^2}$	$\sqrt{\pi/\alpha}\,e^{-\omega^2/(4\alpha)}$	Gaussian ↔ Gaussian
$\delta(t)$	$1$	Impulse contains all frequencies
$1$	$2\pi\delta(\omega)$	DC ↔ zero-frequency delta
$e^{j\omega_0 t}$	$2\pi\delta(\omega-\omega_0)$	Pure tone ↔ spectral line

Applications

Filter design (convolution theorem): Want to design a 1 kHz lowpass filter? Draw an ideal rectangular passband $H(\omega)$ in the frequency domain, inverse FT to get impulse response $h(t) = \text{sinc}$ — then truncate with a window. FM radio IF filters (200 kHz bandwidth) are designed this way.
AM modulation (frequency shift property): $x(t)\cos(\omega_c t) = \frac{1}{2}x(t)e^{j\omega_c t} + \frac{1}{2}x(t)e^{-j\omega_c t}$. Frequency domain: baseband spectrum is shifted to $\pm\omega_c$. AM broadcast uses carrier frequencies of 540-1600 kHz.
Energy calculation (Parseval's theorem): Compute the energy of a UWB (ultra-wideband) pulse within the 3.1-10.6 GHz band: $E = \frac{1}{2\pi}\int_{2\pi\cdot3.1G}^{2\pi\cdot10.6G}|F(\omega)|^2\,d\omega$.

Interactive: rect $\leftrightarrow$ sinc

Adjust the width $\tau$ of the rectangular pulse. Observe: narrower pulse → wider sinc main lobe (a direct manifestation of the uncertainty principle).

$\tau$ = 1.0

Chart will display automatically after adjusting parameters above

Pitfalls & Limitations

Where does $1/(2\pi)$ go? Different textbooks use different conventions. This platform uses the $\omega$-convention: forward transform has no $1/(2\pi)$, inverse transform has it. The $f$-convention has neither, but the exponent is $e^{-j2\pi ft}$. Don't mix conventions!
CTFT only applies to finite-energy ($L^1$ or $L^2$) signals: The CTFT of $\cos(\omega_0 t)$ requires distribution theory (see Section 1.2).
The convolution theorem requires both functions to be integrable: For distributions or periodic signals, convolution must be defined more carefully.

References: [1] Oppenheim & Willsky, Signals and Systems, Ch.4. [2] Bracewell, The Fourier Transform and Its Applications, McGraw-Hill. [3] Papoulis, The Fourier Integral and Its Applications.

✅ Quick Check

Q1: What is the greatest engineering significance of the convolution theorem? State it in one sentence.

Show answer

Filtering = frequency-domain multiplication. By designing the shape of H(ω), you can selectively preserve or remove any frequency component.

Q2: What does a time-domain delay t₀ correspond to in the frequency domain?

Show answer

Multiplication by e^{-jωt₀} — magnitude unchanged, only phase rotation. This is why a linear-phase filter is equivalent to pure delay.

Interactive: Common CTFT Pairs

Choose a signal and observe its time- and frequency-domain representations. Try different widths to understand the time–frequency reciprocity.

Signal:

Width parameter = 1.0

← 2.1 FS 2.3 DTFT →

2.3 Discrete-Time Fourier Transform (DTFT)

The bridge connecting the continuous and discrete worlds

Why does this matter? Because the DTFT is the bridge connecting the analog world (CTFT) and the digital world (DFT). Without understanding that "sampling causes spectral periodization," you cannot truly understand why aliasing occurs or what DFT results represent.

Previously... The CTFT from 2.2 is a tool for the continuous world. But computers can only process discrete number sequences. After sampling an analog signal into a digital signal, how does the spectrum change?

One-line summary: After sampling an analog signal into a digital signal, its spectrum becomes periodically repeated — the DTFT is the tool that describes this post-sampling world.

Learning Objectives

Define the DTFT and understand the $2\pi$ periodicity of its frequency domain
Derive the DTFT from CTFT + sampling (spectral periodization)
Distinguish between DTFT (continuous frequency) and DFT (discrete frequency sampling)
Understand the true role of zero-padding

The Problem: How Does the Spectrum Change After Sampling?

You used an ADC to digitize an analog signal at sampling rate $f_s = 48$ kHz. Now you have a sequence of numbers $x[0], x[1], x[2], \ldots$.

What is the "spectrum" of this sequence? How does it relate to the original analog signal's spectrum?
Why is the FFT output periodic with period $f_s$?
Are the DTFT and DFT the same thing? If not, what's the difference?

The DTFT is the theoretical foundation for understanding digital signal processing. DFT/FFT is its practical computational version.

Historical context: The concept of DTFT was formalized alongside the rise of digital computation. In the 1960s, with the proliferation of A/D converters and digital computers, engineers needed a complete "discrete world" Fourier theory. The DTFT filled the theoretical gap between CTFT (purely continuous) and DFT (purely discrete, finite-length), and was systematically organized by Oppenheim, Schafer, and others in 1970s textbooks.

Principles: Definition and Periodicity

Intuition first: Sampling is like viewing a spinning wheel with a strobe light. If the strobe frequency is not high enough, the wheel appears to "spin backward" — this is because the spectrum undergoes periodic repetition (aliasing). The DTFT precisely describes this phenomenon.

DTFT Definition

$$X(e^{j\omega}) = \sum_{n=-\infty}^{\infty}x[n]\,e^{-j\omega n}$$

$X(e^{j\omega})$ is a continuous function of $\omega$, periodic with period $2\pi$.

Inverse transform:

$$x[n] = \frac{1}{2\pi}\int_{-\pi}^{\pi}X(e^{j\omega})\,e^{j\omega n}\,d\omega$$

Why Is It $2\pi$-Periodic?

Because $e^{-j(\omega+2\pi)n} = e^{-j\omega n}\cdot e^{-j2\pi n} = e^{-j\omega n}\cdot 1 = e^{-j\omega n}$. Discrete sampling inherently cannot distinguish frequency $\omega$ from $\omega + 2\pi$ — this is the mathematical root of aliasing.

💡 Intuition: Why must discrete-time frequency be periodic?

Consider two discrete complex exponentials $e^{j\omega n}$ and $e^{j(\omega+2\pi)n}$:

$$e^{j(\omega+2\pi)n} = e^{j\omega n}\cdot e^{j2\pi n} = e^{j\omega n}\cdot 1 = e^{j\omega n}$$

Because $e^{j2\pi n} = 1$ for every integer $n$.

Conclusion: Frequencies $\omega$ and $\omega+2\pi$ are indistinguishable in discrete time — they produce exactly the same sample values. So the DTFT's frequency axis is naturally $2\pi$-periodic.

This also explains why the Nyquist limit exists: when an analog signal's frequency exceeds $f_s/2$, this $2\pi$ wrap-around causes it to be "aliased" back into the low-frequency region.

Relationship with CTFT: Sampling → Periodization

If $x[n] = x_c(nT_s)$ (sampling), then:

$$X(e^{j\omega}) = \frac{1}{T_s}\sum_{k=-\infty}^{\infty}X_c\!\left(\frac{\omega - 2\pi k}{T_s}\right)$$

Sampling causes spectral periodization. If the bandwidth of $X_c$ exceeds $\pi/T_s$ (the Nyquist frequency), adjacent copies overlap → aliasing, which is irreversible.

DTFT vs DFT: Key Differences

The DFT is a uniform sampling of the DTFT on the frequency axis:

$$X[k] = X(e^{j\omega})\big|_{\omega = 2\pi k/N} = \sum_{n=0}^{N-1}x[n]\,e^{-j2\pi kn/N}$$

Property	DTFT	DFT
Input	Infinite-length sequence $x[n]$	Finite-length $N$-point sequence
Output	Continuous function $X(e^{j\omega})$	$N$ discrete values $X[k]$
Frequency resolution	Continuous (infinite resolution)	$\Delta f = f_s/N$
Computability	Theoretical tool	FFT enables fast computation

Key insight: The DTFT gives the complete continuous spectrum; the DFT merely takes $N$ equally spaced samples from it. Zero-padding increases the DFT's sampling density (revealing more detail of the DTFT), but does not change the DTFT itself — zero-padding does not improve frequency resolution, it only improves the "display resolution" of the spectrum.

How to Use: Understanding DFT Results Through the DTFT

Understand the meaning of DFT bins: $X[k]$ is a sample of the DTFT $X(e^{j\omega})$ at $\omega = 2\pi k/N$. The corresponding physical frequency is $f_k = k \cdot f_s / N$.
Decide whether zero-padding is needed: If fine details of the DTFT between two peaks are missed by the DFT sampling, increase $N$ (via zero-padding or by collecting more data).
Distinguish "true resolution" from "display resolution": True resolution is determined by the data length ($\Delta f = f_s / N_{data}$). Zero-padding only improves interpolation, not resolution.

Applications

FIR filter frequency response: The frequency response of an FIR filter $h[n]$ (finite length $M$) is simply its DTFT: $H(e^{j\omega}) = \sum_{n=0}^{M-1}h[n]e^{-j\omega n}$. Using the DFT ($N \gg M$, with zero-padding) lets you plot the frequency response curve at high density.
Frequency resolution in spectrum analysis: Analyzing audio sampled at $f_s = 48$ kHz and wanting $\Delta f = 1$ Hz resolution requires $N = f_s/\Delta f = 48000$ points, i.e., at least 1 second of data.
CIC filter droop analysis: The DTFT of a CIC (Cascaded Integrator-Comb) filter is $|H(e^{j\omega})| = |\frac{\sin(M\omega/2)}{M\sin(\omega/2)}|^K$. The DTFT enables precise analysis of passband droop.

Interactive: DTFT vs DFT

Blue solid line = DTFT (approximated by a high-density DFT), red dots = $N$-point DFT samples. Increasing $N$ reveals more detail of the DTFT — but the DTFT itself does not change.

DFT length N = 16

Chart will display automatically after adjusting parameters above

Pitfalls & Common Misconceptions

"Zero-padding improves resolution" — This is the most common misconception. Zero-padding lets the DFT take more samples of the DTFT (a smoother spectral curve), but the DTFT itself is entirely determined by the original data. You cannot create new information from zeros.
Units of $\omega$: In the DTFT, $\omega$ is normalized angular frequency (radians/sample), ranging over $[-\pi, \pi]$. The corresponding physical frequency is $f = \omega f_s/(2\pi)$. $\omega = \pi$ corresponds to the Nyquist frequency $f_s/2$.
DTFT existence: $x[n]$ must be absolutely summable ($\sum|x[n]| < \infty$) or at least square-summable. The DTFT of an infinite-length periodic sequence (e.g., $\cos(\omega_0 n)$) requires distribution theory ($\delta$ functions appear in the frequency domain).

References: [1] Oppenheim & Schafer, Discrete-Time Signal Processing, Ch.2-5. [2] Proakis & Manolakis, Digital Signal Processing, Ch.4.

✅ Quick Check

Q1: What is the key difference between the DTFT and the DFT?

Show answer

The DTFT has a continuous frequency axis (ω takes all values), while the DFT takes N equally spaced samples on the frequency axis. DFT = sampled DTFT.

Q2: Why does sampling cause spectral periodization?

Show answer

Sampling = multiplication by an impulse train; in the frequency domain this becomes convolution with an impulse train = periodic repetition of the spectrum.

← 2.2 CTFT 2.4 DFT/FFT →

2.4 DFT & FFT

Discrete Fourier Transform & Fast Algorithm

Why does this matter? Because DFT/FFT is the computation you actually run on a computer — all the preceding theory ultimately lands through the FFT. Understanding circular vs. linear convolution and the true role of zero-padding is key to avoiding FFT misuse.

Previously... The DTFT from 2.3 gives a continuous spectrum — but computers cannot store continuous functions. We need to discretize the frequency axis as well, and that is the DFT. The FFT then makes the DFT fast enough for real-time computation.

One-line summary: The DFT is the version a computer can actually compute; the FFT is the algorithm that makes it fast — reducing complexity from $O(N^2)$ to $O(N\log N)$.

Learning Objectives

DFT matrix perspective: $\mathbf{X} = \mathbf{W}_N\mathbf{x}$, unitary property
Understand the difference between circular convolution and linear convolution
Cooley-Tukey radix-2 divide-and-conquer derivation
Choose $N$ correctly; understand the role and misconceptions of zero-padding

The Problem: The World Before 1965

Imagine the era before the FFT:

$N$	Direct DFT (multiplications)	FFT (multiplications)	Speedup
1,024	1,048,576	5,120	205x
4,096	16,777,216	24,576	683x
1,048,576	$1.1 \times 10^{12}$	$10,485,760$	104,858x

Before 1965, a single 1024-point spectrum analysis took several minutes on the computers of the day. The invention of the FFT reduced that same computation to milliseconds, directly enabling all modern digital signal processing applications.

Historical context: James Cooley and John Tukey published their landmark paper An algorithm for the machine calculation of complex Fourier series in 1965. The backdrop was Cold War nuclear test monitoring: the US needed to analyze seismic station data to detect Soviet underground nuclear tests. The massive demand for Fourier analysis gave birth to the FFT. Interestingly, Carl Friedrich Gauss had invented a similar algorithm as early as 1805 (for computing asteroid orbits), but his manuscript was not discovered until 1866 and, being written in Latin, was long overlooked.

Principles: DFT Matrix Perspective

Intuition first: The DFT multiplies an $N$-dimensional vector (time-domain signal) by a special $N \times N$ matrix (the twiddle-factor matrix) to produce another $N$-dimensional vector (frequency domain).

$$X[k] = \sum_{n=0}^{N-1}x[n]\,W_N^{kn}, \quad W_N = e^{-j2\pi/N}$$ $$\mathbf{X} = \mathbf{W}_N\mathbf{x}, \quad [\mathbf{W}_N]_{kn} = W_N^{kn}$$

$\frac{1}{\sqrt{N}}\mathbf{W}_N$ is a unitary matrix: $\mathbf{W}_N^H\mathbf{W}_N = N\mathbf{I}$

Inverse DFT: $x[n] = \frac{1}{N}\sum_{k=0}^{N-1}X[k]\,W_N^{-kn}$, i.e., $\mathbf{x} = \frac{1}{N}\mathbf{W}_N^H\mathbf{X}$.

Expand derivation: Conjugate symmetry of the DFT for real signals

Theorem: If $x[n]$ is a real-valued sequence, its DFT satisfies $X[N-k] = X^*[k]$ (conjugate symmetry).

Derivation:

$$X[N-k] = \sum_{n=0}^{N-1} x[n]\, e^{-j2\pi(N-k)n/N}$$ $$= \sum_{n=0}^{N-1} x[n]\, e^{-j2\pi n}\, e^{j2\pi kn/N}$$

Since $e^{-j2\pi n} = 1$ (for any integer $n$):

$$= \sum_{n=0}^{N-1} x[n]\, e^{j2\pi kn/N}$$

Because $x[n]$ is real, $x[n] = x^*[n]$:

$$= \sum_{n=0}^{N-1} x^*[n]\, e^{j2\pi kn/N} = \left(\sum_{n=0}^{N-1} x[n]\, e^{-j2\pi kn/N}\right)^* = X^*[k] \quad\blacksquare$$

Practical implications:

The DFT of a real signal is fully determined by the first half $X[0], X[1], \ldots, X[N/2]$
Computation and storage can be cut in half (real-FFT algorithms)
$X[0]$ is real (DC), and $X[N/2]$ is also real (Nyquist bin)
$|X[k]| = |X[N-k]|$ (magnitude spectrum is symmetric); $\angle X[k] = -\angle X[N-k]$ (phase spectrum is antisymmetric)

Circular Convolution vs Linear Convolution

$$\text{DFT}\{x \circledast y\} = X[k]\cdot Y[k]$$

The DFT corresponds to circular convolution (the tail wraps around to the head), not linear convolution.

Type	$x$ length $M$, $y$ length $L$	Result length	Required DFT size $N$
Linear convolution	$M + L - 1$	$M + L - 1$	$N \geq M + L - 1$
Circular convolution	$\max(M, L)$	$N$	$N$ (but may alias)

Key point: To compute linear convolution via the DFT, you must zero-pad to $N \geq M + L - 1$. Otherwise the circular convolution's "wrap-around" will corrupt the result. This is the theoretical foundation of the overlap-add (OLA) and overlap-save (OLS) methods.

Radix-2 FFT: Divide and Conquer

Cooley-Tukey Radix-2 Divide-and-Conquer Derivation

Assume $N = 2^m$. Split the DFT into even-indexed and odd-indexed terms:

$$X[k] = \sum_{n=0}^{N-1}x[n]W_N^{kn} = \underbrace{\sum_{r=0}^{N/2-1}x[2r]W_N^{2rk}}_{A[k]} + W_N^k\underbrace{\sum_{r=0}^{N/2-1}x[2r+1]W_N^{2rk}}_{B[k]}$$

Note that $W_N^{2rk} = W_{N/2}^{rk}$ (since $e^{-j2\pi\cdot 2r k/N} = e^{-j2\pi rk/(N/2)}$).

So $A[k]$ and $B[k]$ are each $N/2$-point DFTs!

$$X[k] = A[k] + W_N^k\,B[k], \quad k = 0, 1, \ldots, N/2-1$$ $$X[k+N/2] = A[k] - W_N^k\,B[k] \quad (\text{since } W_N^{k+N/2} = -W_N^k)$$

This is the butterfly operation.

Complexity: $N/2$ butterflies per stage, $\log_2 N$ stages total $\to$ $O(N\log N)$. $\;\blacksquare$

// Radix-2 Cooley-Tukey FFT (core engine of this platform) function fft(re_in, im_in) { const N = re_in.length; if (N <= 1) return { re: [...re_in], im: [...im_in] }; const re = new Float64Array(N), im = new Float64Array(N); const bits = Math.round(Math.log2(N)); for (let i = 0; i < N; i++) { // Bit-Reversal let r = 0; for (let b = 0; b < bits; b++) r = (r << 1) | ((i >> b) & 1); re[r] = re_in[i]; im[r] = im_in[i]; } for (let s = 2; s <= N; s *= 2) { // Butterfly const h = s/2, a = -2*Math.PI/s; for (let i = 0; i < N; i += s) for (let j = 0; j < h; j++) { const wr=Math.cos(a*j), wi=Math.sin(a*j); const tr=wr*re[i+j+h]-wi*im[i+j+h]; const ti=wr*im[i+j+h]+wi*re[i+j+h]; re[i+j+h]=re[i+j]-tr; im[i+j+h]=im[i+j]-ti; re[i+j]+=tr; im[i+j]+=ti; } } return { re: Array.from(re), im: Array.from(im) }; }

How to Use: Choosing $N$ and Frequency Mapping

Step 1: Choose $N$

$$\Delta f = \frac{f_s}{N} \quad \text{(Frequency resolution)}$$

Need $\Delta f = 1$ Hz with $f_s = 48000$ Hz $\to$ $N = 48000$. FFT is most efficient when $N = 2^m$, so choose $N = 2^{16} = 65536$ ($\Delta f \approx 0.73$ Hz).

Step 2: Map FFT bins to physical frequencies

$$f_k = \frac{k \cdot f_s}{N}, \quad k = 0, 1, \ldots, N-1$$

$k = 0$: DC, $k = N/2$: Nyquist, $k > N/2$: negative frequencies ($f_k - f_s$)

Step 3: Correctly understanding zero-padding

What zero-padding does	What it does NOT do
Increases DFT sampling density (smoother spectral curve)	Improve true frequency resolution
Makes $N$ a power of 2 (most efficient FFT)	Increase the information content of the signal
Prevents circular convolution aliasing ($N \geq M+L-1$)	Reduce noise

Python Example: Compute FFT and Plot Spectrum

# Python: compute FFT and plot spectrum import numpy as np import matplotlib.pyplot as plt # Generate test signal (50 Hz + 120 Hz) fs = 1000 # sampling rate 1 kHz T = 1.0 # duration 1 second N = int(fs * T) # 1000 samples t = np.arange(N) / fs x = np.sin(2*np.pi*50*t) + 0.5*np.sin(2*np.pi*120*t) # Compute FFT X = np.fft.fft(x) freqs = np.fft.fftfreq(N, d=1/fs) # Take single-sided spectrum (for real signals) half = N // 2 mag = np.abs(X[:half]) * 2/N # multiply by 2 to compensate for symmetric energy mag[0] /= 2 # DC does not need the factor of 2 phase = np.angle(X[:half]) # Plot fig, axes = plt.subplots(3, 1, figsize=(10, 8)) axes[0].plot(t[:200], x[:200]); axes[0].set_title('Time domain') axes[1].plot(freqs[:half], mag); axes[1].set_xlim(0, 200) axes[1].set_title('Magnitude spectrum |X[k]|'); axes[1].grid() axes[2].plot(freqs[:half], phase); axes[2].set_xlim(0, 200) axes[2].set_title('Phase spectrum angle(X[k])') plt.tight_layout(); plt.show()

Applications

Audio spectrum analyzer: $f_s = 44100$ Hz, $N = 4096$ $\to$ $\Delta f = 10.8$ Hz. Sufficient to resolve adjacent piano keys (e.g., A4 = 440 Hz vs. A#4 = 466 Hz, a difference of 26 Hz).
OFDM communications: 802.11ax (Wi-Fi 6) uses a 1024-point FFT with subcarrier spacing of 78.125 kHz. Each FFT bin corresponds to one subcarrier.
Real-time spectrum analyzer: The Keysight RSA uses an $N = 2^{22}$ FFT, achieving $\Delta f \approx 26$ Hz resolution over a 110 MHz bandwidth, computed hundreds of times per second.

Pitfalls & Common Errors

Getting FFT bin frequencies wrong: The most common mistake is treating $k = N-1$ as the highest frequency. In fact, $k > N/2$ corresponds to negative frequencies. For real-valued input, you only need $k = 0, \ldots, N/2$.
Zero-padding $\neq$ improving resolution: Zero-padding 100 data points to 1024 points still gives a resolution of $f_s/100$, not $f_s/1024$.
Circular convolution $\neq$ linear convolution: Forgetting to zero-pad and directly using FFT for convolution produces wrap-around artifacts.
Endianness and normalization: Different FFT libraries (FFTW, NumPy, MATLAB) use different normalization conventions. Some divide by $N$ in the forward transform, others in the inverse. Verify by checking whether Parseval's identity holds.

References: [1] Cooley & Tukey, An algorithm for the machine calculation of complex Fourier series, Math. Comp., 1965. [2] Oppenheim & Schafer, Discrete-Time Signal Processing, Ch.8-9. [3] Van Loan, Computational Frameworks for the FFT, SIAM.

📝 Worked Example

You have a 0.5-second audio clip sampled at 8000 Hz. (a) How many sample points total? (b) What is the FFT frequency resolution Δf? (c) If you need to see frequency details down to 0.5 Hz, how long must the observation time be?

Show solution

(a) N = 0.5 × 8000 = 4000 points

(b) Δf = f_s/N = 8000/4000 = 2 Hz

(c) Δf = 1/T → T = 1/0.5 = 2 seconds (zero-padding cannot substitute for this)

✅ Quick Check

Q1: How much faster is a 1024-point FFT compared to a direct DFT?

Show answer

DFT: N² = 1,048,576 multiplications. FFT: (N/2)log₂N = 5,120. Speedup ≈ 205x.

Q2: Does zero-padding to 4096 points improve the true frequency resolution?

Show answer

No. True resolution depends only on the observation time: Δf = 1/T. Zero-padding only makes the frequency axis denser (interpolation) and does not add new information.

Interactive: The True Effect of Zero-Padding

Signal: 50 Hz + 55 Hz, sampling rate 500 Hz, observation time 0.1 s (50 points). Zero-padding will not separate the two peaks — it only makes the spectrum smoother.

Zero-padding factor:

Chart will display automatically after adjusting parameters above

Decimation-in-Time vs Decimation-in-Frequency

The Cooley-Tukey FFT has two equivalent but distinct decompositions. This platform demonstrates DIT (Decimation-in-Time).

DIT — Decimation in Time

Split the input sequence into even and odd halves:

$$X[k] = \sum_{r=0}^{N/2-1}x[2r]W_{N/2}^{rk} + W_N^k\sum_{r=0}^{N/2-1}x[2r+1]W_{N/2}^{rk}$$

Input requires bit-reversal ordering
Output is in natural order
Butterfly structure: multiply then add

DIF — Decimation in Frequency

Split the output into even and odd groups:

$$X[2k] = \sum_{n=0}^{N/2-1}\left[x[n]+x[n+N/2]\right]W_{N/2}^{nk}$$ $$X[2k+1] = \sum_{n=0}^{N/2-1}\left[x[n]-x[n+N/2]\right]W_N^n W_{N/2}^{nk}$$

Input is in natural order
Output requires bit-reversal ordering
Butterfly structure: add then multiply

Practical choice:

Both have exactly the same operation count: $\frac{N}{2}\log_2 N$ complex multiplications
Most modern FFT libraries (FFTW, MKL) support both
The choice is usually dictated by "how upstream/downstream stages order their data," to avoid extra bit-reversal
DIT is more intuitive (from the butterfly-diagram perspective); DIF is its transpose

⚠ Common FFT Implementation Pitfalls

These are the traps engineers most frequently fall into when actually using numpy.fft.fft or scipy.fft.

Pitfall 1: Forgetting Normalization

numpy's fft performs no normalization by default. For an N-point amplitude spectrum, multiply by 2/N (single-sided):

X = np.fft.fft(x) half = N // 2 mag = np.abs(X[:half]) * 2 / N # correct mag[0] /= 2 # DC should not be doubled mag[-1] /= 2 # Nyquist should not be doubled (if N is even)

Pitfall 2: Misusing fftfreq

The physical frequency corresponding to index k is $f_k = k \cdot f_s / N$, not $k$ itself:

freqs = np.fft.fftfreq(N, d=1/fs) # correct: handles negative frequencies automatically # Do NOT write freqs = np.arange(N) * fs / N (wrong — the positive/negative symmetry breaks)

Pitfall 3: Spectral Leakage

If the signal frequency is not an integer multiple of $f_s/N$ (i.e., not centered in a bin), energy leaks into neighboring bins. Always apply a window:

from scipy.signal import windows win = windows.hann(N) X = np.fft.fft(x * win) # apply Hann window mag = np.abs(X[:N//2]) * 2 / np.sum(win) # normalize by the window sum

Pitfall 4: Zero-Padding ≠ Higher Resolution

Zero-padding only makes DFT bins denser (an interpolation effect) — it does not increase the true frequency resolution. True resolution is determined by the observation time $T$: $\Delta f = 1/T$.

Pitfall 5: DC Drift

If the signal has a DC offset, the FFT will show a huge peak at k=0 that may mask low-frequency components you care about. Remove the mean first:

x = x - np.mean(x) # remove DC

Pitfall 6: Complex vs Real FFT

For real-valued signals, use np.fft.rfft() — it's roughly 2x faster than fft() (only computes half) thanks to conjugate symmetry.

← 2.3 DTFT 2.5 Z-Transform →

2.5 Z-Transform (Z-Transform)

A unified analysis framework for discrete-time systems

Why does this matter? Because the Z-transform is the universal tool for discrete-system analysis. Determining whether a filter is stable, reading the frequency response from a pole-zero plot, designing IIR filters — all rely on the Z-transform. Its role in discrete systems is equivalent to that of the Laplace transform in continuous systems.

Previously... The DFT/FFT from 2.4 is a computational tool. But to analyze discrete-system stability and design digital filters, we need a more powerful framework — the Z-transform.

One-line summary: The Z-transform is the "universal tool" for digital systems — stability, frequency response, and filter design all depend on it.

Learning Objectives

Define the Z-transform and its ROC (Region of Convergence)
Understand that DFT = Z-transform sampled on the unit circle
Use pole-zero analysis to determine stability and frequency response
Design simple IIR filters

The Problem: Core Questions About Digital Filters

You have designed a digital filter whose difference equation is:

$$y[n] = 0.5\,y[n-1] + x[n] - 0.3\,x[n-1]$$

Is it stable? (Will the output blow up?)
What does its frequency response look like? (Which frequencies are amplified, which are attenuated?)
If I want to completely eliminate 1 kHz (notch), how should I modify it?

These questions are hard to answer using the time-domain difference equation. The Z-transform turns the difference equation into an algebraic equation, making everything clear.

Historical context: The Z-transform is the discrete counterpart of the Laplace transform. In continuous systems, the Laplace transform converts differential equations into algebraic equations, with the variable $s$ living in the complex plane. The Z-transform does the same thing for discrete systems: the variable $z$ also lives in the complex plane, and $z = e^{sT_s}$ maps the $s$-plane to the $z$-plane. Lotfi Zadeh (later famous for fuzzy logic) and John Ragazzini introduced the modern form of the Z-transform in 1952.

Principles: Definition and Region of Convergence

Intuition first: The DTFT evaluates a sequence on the frequency axis ($e^{j\omega}$, the unit circle). The Z-transform extends this evaluation from the unit circle to the entire complex plane. This extra degree of freedom lets us analyze stability, causality, and other properties that the DTFT alone cannot directly reveal.

Z-transform Definition

$$X(z) = \sum_{n=-\infty}^{\infty}x[n]\,z^{-n}, \quad z \in \mathbb{C}$$

When $z = e^{j\omega}$ (unit circle): $X(e^{j\omega}) = \text{DTFT}$

When $z = e^{j2\pi k/N}$ ($N$ equally spaced points on the unit circle): $X[k] = \text{DFT}$

Region of Convergence (ROC)

The set of $z$ values for which $\sum|x[n]||z|^{-n}$ converges. The shape of the ROC determines the nature of the sequence:

Sequence Type	ROC Shape	Example
Finite-length (FIR)	Entire $z$-plane (possibly excluding $z=0$ or $z=\infty$)	$x[n] = \delta[n] - 0.5\delta[n-1]$
Causal right-sided sequence	Exterior of a circle: $\|z\| > r_{\max}$	$x[n] = a^n u[n]$, ROC: $\|z\|>\|a\|$
Anti-causal left-sided sequence	Interior of a circle: $\|z\| < r_{\min}$	$x[n] = -a^n u[-n-1]$, ROC: $\|z\|<\|a\|$

Stability criterion: A causal LTI system is stable $\iff$ its ROC contains the unit circle $\iff$ all poles lie inside the unit circle ($|p_i| < 1$). This is the sole criterion for digital filter stability.

Pole-Zero Analysis

For a rational transfer function (IIR filter):

$$H(z) = \frac{B(z)}{A(z)} = \frac{b_0 + b_1 z^{-1} + \cdots + b_M z^{-M}}{1 + a_1 z^{-1} + \cdots + a_N z^{-N}} = b_0\frac{\prod_{k=1}^{M}(1 - q_k z^{-1})}{\prod_{k=1}^{N}(1 - p_k z^{-1})}$$

$q_k$: zeros, $p_k$: poles

Intuition for the frequency response: $H(e^{j\omega})$ is obtained by walking once around the unit circle and looking at how far each position is from the poles and zeros.

Pole close to the unit circle → that frequency is amplified (resonance). The closer the pole, the sharper the peak.
Zero close to the unit circle → that frequency is attenuated (notch). A zero on the unit circle = complete elimination.
Pole outside the unit circle → the system is unstable.

Why do poles cause resonance?

Near $\omega = \theta_p$ (the pole angle):

$$|H(e^{j\omega})| \approx \frac{|b_0|\prod|e^{j\omega}-q_k|}{\prod_{k\neq i}|e^{j\omega}-p_k| \cdot |e^{j\omega}-p_i|}$$

When $\omega \approx \theta_p$, $|e^{j\omega} - p_i| = |e^{j\omega} - |p_i|e^{j\theta_p}| \approx 1 - |p_i|$ (small).

Therefore $|H| \approx \frac{C}{1-|p_i|}$. As $|p_i| \to 1$, the gain tends to infinity.

The 3-dB bandwidth of the peak is $\approx 2(1-|p_i|)$ radians. $\;\blacksquare$

How to Use: Designing Filters from Pole-Zero Plots

Notch filter (eliminating a specific frequency): Place zeros at $z = e^{j\omega_0}$ and $z = e^{-j\omega_0}$, paired with nearby poles at $z = r\,e^{\pm j\omega_0}$ ($r < 1$, e.g., $r = 0.95$) to control the notch width.
Resonator (enhancing a specific frequency): Place poles at $z = r\,e^{\pm j\omega_0}$; the closer $r$ is to 1, the sharper the resonance. $Q$ factor $\approx \frac{\omega_0}{2(1-r)}$.
Stability check: Are all poles $|p_k| < 1$? If any pole lies on or outside the unit circle, the system is unstable. Use MATLAB's zplane(b,a) or Python's scipy.signal.tf2zpk.

Applications

IIR filter design: Butterworth, Chebyshev, and elliptic filters are designed with poles and zeros placed in the $s$-plane, then mapped to the $z$-plane via the bilinear transform $s = \frac{2}{T_s}\frac{1-z^{-1}}{1+z^{-1}}$. For example, a 5th-order Butterworth lowpass has 5 poles uniformly distributed on the left half of the $s$-plane circle.
Control-system stability: The closed-loop transfer function $H_{cl}(z)$ of a digital PID controller must have all poles inside the unit circle. If some pole has $|p| = 0.98$ (very close to but still inside), the system is stable but will exhibit slowly decaying oscillations.
Audio equalizer (EQ): Each band of a parametric EQ is a pair of conjugate poles plus a pair of conjugate zeros. The center frequency is set by the pole-zero angle, the $Q$ factor by the radius, and the gain by the pole-to-zero distance ratio.

Interactive: Pole-Zero Plot & Frequency Response

Adjust the pole position (conjugate pair $p = re^{\pm j\theta}$) and observe how the frequency response (= Z-transform evaluated on the unit circle) changes. The closer the radius is to 1, the sharper the resonance peak.

Pole radius r = 0.9

Pole angle $\theta$ = 1.0 rad

Chart will display automatically after adjusting parameters above

Pitfalls & Limitations

The ROC must be specified: The same algebraic expression $X(z)$ can correspond to different time-domain sequences depending on the ROC. For example, $X(z) = 1/(1-az^{-1})$: ROC $|z|>|a|$ $\to$ causal exponential $a^n u[n]$; ROC $|z|<|a|$ $\to$ anti-causal $-a^n u[-n-1]$.
FIR is always stable: FIR filters have no feedback poles (all poles are at $z=0$), so they are unconditionally stable. This is the biggest advantage of FIR over IIR.
Numerical precision: The closer the poles are to the unit circle, the more sensitive an IIR filter is to coefficient quantization. In 16-bit fixed-point implementations, poles with $|p| > 0.99$ may cause limit-cycle oscillations.

References: [1] Oppenheim & Schafer, Discrete-Time Signal Processing, Ch.3-6. [2] Proakis & Manolakis, Digital Signal Processing, Ch.3. [3] Mitra, Digital Signal Processing: A Computer-Based Approach.

✅ Quick Check

Q1: How do you tell from a pole-zero plot whether a frequency is amplified or attenuated?

Show answer

Pole close to some frequency on the unit circle → that frequency is amplified (resonance); zero close → that frequency is attenuated (notch).

Q2: What is the stability condition for an IIR filter?

Show answer

All poles lie inside the unit circle (|z|<1), equivalently the ROC contains the unit circle.

← 2.4 DFT/FFT 2.6 Sampling Theorem →

2.6 Sampling Theorem (Sampling Theorem)

Rigorously deriving Shannon's theorem from the Poisson summation formula

Why does this matter? Because sampling is the first step from analog to digital, and aliasing caused by a wrong sampling rate is an irreversible disaster. Why does CD audio use 44.1 kHz? Why does vibration analysis use 2.56x? Why does 5G oversample? All are direct applications of the Nyquist theorem.

Previously... So far we have assumed we already have a discrete sequence x[n]. But x[n] is obtained by sampling an analog signal x(t) — how high must the sampling rate be to avoid losing information?

One-line summary: If the sampling rate is not high enough, high frequencies disguise themselves as low frequencies — this is called aliasing, and it is irreversible.

Learning Objectives

Derive sampling = frequency-domain periodization (Poisson summation formula)
Derive the Nyquist condition $f_s \geq 2f_{\max}$ from periodization
Derive the sinc reconstruction (Whittaker-Shannon interpolation) formula
Understand the design of anti-aliasing filters in practical systems

The Problem: A Recording Engineer's Nightmare

Suppose you record a sound containing a 5000 Hz tone at $f_s = 8000$ Hz. On playback, what you hear is not 5000 Hz but 3000 Hz (= $8000 - 5000$).

5000 Hz exceeds the Nyquist frequency $f_s/2 = 4000$ Hz
It is "folded" (aliased) to $f_s - 5000 = 3000$ Hz
Irreversible: from the recorded data alone, you cannot distinguish between 3000 Hz and 5000 Hz

This is why every ADC must be preceded by an anti-aliasing filter. The sampling theorem tells you where to set the cutoff frequency of that filter.

Historical context: Harry Nyquist first proposed in his 1928 paper Certain Topics in Telegraph Transmission Theory that sampling at rate $f_s$ can represent signals with bandwidth at most $f_s/2$. Claude Shannon gave the full mathematical proof and the sinc reconstruction formula in his 1949 foundational information-theory paper. In Russia, V. A. Kotelnikov independently obtained the same result in 1933. This theorem is therefore sometimes called the Nyquist-Shannon-Kotelnikov sampling theorem.

Principles: Rigorous Derivation

Intuition first: Sampling is like stamping with a comb along the frequency axis — every $f_s$ Hz it prints a copy of the original spectrum. If the original spectrum is too wide, adjacent copies overlap (aliasing), just as stamps printed too densely blur the image.

From Poisson summation to Shannon's theorem

Step 1: Sampling = multiplication by an impulse train

$$x_s(t) = x(t)\cdot\sum_{n=-\infty}^{\infty}\delta(t - nT_s) = \sum_n x(nT_s)\,\delta(t-nT_s)$$

Step 2: Frequency domain

The FT of an impulse train is another impulse train: $\mathcal{F}\{\sum_n\delta(t-nT_s)\} = \frac{2\pi}{T_s}\sum_k\delta(\omega-k\omega_s)$, where $\omega_s = 2\pi f_s$.

Time-domain multiplication = frequency-domain convolution:

$$X_s(\omega) = \frac{1}{2\pi}X(\omega) * \frac{2\pi}{T_s}\sum_k\delta(\omega-k\omega_s) = \frac{1}{T_s}\sum_{k=-\infty}^{\infty}X(\omega - k\omega_s)$$

Sampling causes periodic repetition of the spectrum with spacing $\omega_s = 2\pi f_s$.

Step 3: Nyquist condition

If $X(\omega) = 0$ for $|\omega| > \omega_{\max}$ (band-limited signal) and $\omega_s > 2\omega_{\max}$, adjacent copies do not overlap. The original spectrum can be recovered using an ideal lowpass filter with gain $T_s$ and cutoff frequency $\omega_s/2$:

$$\boxed{f_s \geq 2f_{\max} \quad \text{(Nyquist Rate)}}$$

Step 4: Sinc reconstruction (Whittaker-Shannon interpolation)

The impulse response of an ideal lowpass filter is the sinc function. Therefore:

$$x(t) = \sum_{n=-\infty}^{\infty}x(nT_s)\,\text{sinc}\!\left(\frac{t-nT_s}{T_s}\right)$$

where $\text{sinc}(u) = \frac{\sin(\pi u)}{\pi u}$. Each sample "grows" a sinc waveform, and the superposition of all sincs exactly reconstructs the original continuous signal. $\;\blacksquare$

How to Use: Sampling Rate Design for Practical Systems

In theory $f_s \geq 2f_{\max}$ is enough. But in practice you need more, because:

Anti-aliasing filters are not ideal: An ideal brick-wall lowpass filter is unrealizable. Real analog LPFs have a transition band and you need margin.
Rules of thumb: $f_s \geq 2.56 \times f_{\max}$ (industry standard for vibration analysis), or the more conservative $f_s \geq 3{-}4 \times f_{\max}$.
Place the anti-aliasing filter before the ADC (in the analog domain), with cutoff set at $f_s/2$ or slightly below.

Application	$f_{\max}$	$f_s$ (practical)	Ratio	Notes
CD audio	20 kHz	44.1 kHz	2.2x	Human hearing limit 20 kHz
Professional audio	20 kHz	96 kHz	4.8x	Simplifies AAF design
Vibration analysis	$f_{max}$	$2.56 \times f_{max}$	2.56x	ISO/IEC standard
5G baseband	100 MHz	245.76 MHz	2.46x	3GPP standard sampling rate
Sigma-Delta ADC	$f_b$	$64{-}256 \times f_b$	64-256x	Oversampling trades for bit depth

Applications

Audio CD (44.1 kHz): The maximum perceivable frequency for the human ear is about 20 kHz. $44100/20000 = 2.205$. Why 44.1 instead of 40? To leave transition-band margin for the anti-aliasing filter. The historical reason for 44100 is related to NTSC video format.
Mechanical vibration analysis (2.56x): Monitoring turbine bearing fault frequencies. If the highest frequency of interest is 10 kHz, the sampling rate is 25.6 kHz, using an 8th-order Butterworth AAF with cutoff at 10 kHz. $2.56 = 2 \times 1.28$; the 1.28 margin lets the AAF attenuate by more than 60 dB at Nyquist.
Oversampling ADCs: Sigma-Delta ADCs sample at an extremely high rate (e.g., $256 \times f_b$), then use a digital decimation filter to bring the rate down to the target. The benefit: the anti-aliasing filter can be very simple (a single-pole RC is enough), because the transition band width is approximately $255 \times f_b$.

Interactive: Aliasing Demo

An 80 Hz sinusoid. Adjust the sampling rate $f_s$. When $f_s < 160$ Hz (below Nyquist), observe aliasing — the sampled signal looks like a different frequency.

$f_s$ = 200 Hz

Chart will display automatically after adjusting parameters above

Pitfalls & Limitations

"If the sampling rate is high enough, no anti-aliasing filter is needed" — Wrong. Any signal plus environmental noise has theoretically infinite bandwidth. Without an AAF, high-frequency noise aliases in.
"Sinc reconstruction gives exact recovery" — In theory yes, but the sinc function is infinitely long and unrealizable. Real systems use finite-length approximations (e.g., Lanczos kernel, polynomial interpolation).
Bandpass sampling: If the signal is bandpass (e.g., an RF signal in $f_c \pm B/2$), the sampling rate only needs $f_s \geq 2B$ (not $2f_c$). But the choice of $f_s$ must avoid overlap between spectral copies, requiring more careful calculation.
Aliasing is irreversible: Once aliasing has occurred, no post-processing (digital filtering, AI, etc.) can recover the original signal. Prevention is the only option.

References: [1] Shannon, Communication in the Presence of Noise, Proc. IRE, 1949. [2] Nyquist, Certain Topics in Telegraph Transmission Theory, Trans. AIEE, 1928. [3] Oppenheim & Schafer, Discrete-Time Signal Processing, Ch.4.

📝 Worked Example

A signal contains 100 Hz, 250 Hz, and 500 Hz components. (a) What is the minimum sampling rate? (b) If f_s=800 Hz, is there aliasing? (c) If f_s=900 Hz, where does the 500 Hz component alias to?

Show solution

(a) f_s ≥ 2×500 = 1000 Hz

(b) f_s=800 < 1000, so 500 Hz will alias. 250 Hz is unaffected (800 > 500).

(c) 500 Hz aliases to |500−900| = 400 Hz

✅ Quick Check

Q1: Why is the CD sampling rate 44.1 kHz?

Show answer

The human ear hears up to ~20 kHz, so Nyquist requires ≥40 kHz. 44.1 kHz provides ~10% margin for the anti-aliasing filter's transition band.

Q2: If you forgot the anti-aliasing filter before sampling, can you fix it afterwards?

Show answer

No. Aliasing is irreversible — high frequencies have already disguised themselves as low frequencies, and there is no way to distinguish genuine low frequencies from the aliased ones.

← 2.5 Z-Transform 2.7 Transform Relationships Overview →

2.7 Transform Relationships Overview

The complete relationship chain of FS / CTFT / DTFT / DFT / Z-Transform

Why does this matter? Because FS, CTFT, DTFT, DFT, and the Z-transform are not five independent tools but five perspectives on the same story. Once you understand their relationships, you will never get lost on the question "which transform should I use?"

Previously... We have learned five transforms (FS, CTFT, DTFT, DFT, Z). They look like different tools, but there are precise mathematical relationships between them — once you understand the relationship map, you will stop confusing them.

One-line summary: FS, CTFT, DTFT, DFT, Z-transform — not five independent tools, but five perspectives on the same story.

Learning Objectives

Understand the derivation relationships among the five transforms
Master the duality "sampling → periodization" and "truncation → discretization"
Choose the right transform tool based on signal characteristics

Relationship Chain: Five Branches of the Same Tree

Intuition first: All Fourier transforms do the same thing — decompose a signal into frequency components. The difference lies in whether the signal is continuous/discrete, periodic/aperiodic. Two core operations connect them:

Sampling $\to$ periodization in the frequency domain
Truncation / periodization $\to$ discretization in the frequency domain

Transform Relationship Diagram

FS continuous periodic → discrete	$\xrightarrow{T \to \infty}$ de-periodization	CTFT continuous aperiodic → continuous
		$\downarrow$ sampling frequency-domain periodization
		DTFT discrete aperiodic → continuous periodic	$\xleftarrow{z = e^{j\omega}}$ unit-circle sampling	Z-Transform discrete → $z$-plane
		$\downarrow$ truncate to $N$ points frequency-domain discretization
		DFT discrete periodic → discrete periodic

Complete Comparison Table

Transform	Time Domain	Frequency Domain	Connecting Operation	Formula
FS	continuous, period $T$	discrete ($c_n$)	—	$c_n = \frac{1}{T}\int_0^T f(t)e^{-jn\omega_0 t}dt$
CTFT	continuous, aperiodic	continuous	$T\to\infty$ (FS → CTFT)	$F(\omega) = \int f(t)e^{-j\omega t}dt$
DTFT	discrete, aperiodic	continuous, $2\pi$-periodic	sampling (CTFT → DTFT)	$X(e^{j\omega}) = \sum x[n]e^{-j\omega n}$
DFT	discrete, $N$-periodic	discrete, $N$-periodic	truncate to $N$ points (DTFT → DFT)	$X[k] = \sum_{n=0}^{N-1}x[n]e^{-j2\pi kn/N}$
Z-Transform	discrete	function on the $z$-plane	$z=e^{j\omega}$ → DTFT	$X(z) = \sum x[n]z^{-n}$

Key Insight: Duality Principle

Sampling ↔ periodization: Time-domain sampling (multiplication by an impulse train) causes frequency-domain periodization (spectral repetition). The converse also holds: time-domain periodization causes frequency-domain discretization (becoming discrete $c_n$).

Truncation ↔ discretization: Time-domain truncation (multiplication by a finite window) is equivalent to frequency-domain convolution (convolution with the window's FT, causing leakage). At the same time, the DTFT of a finite-length $N$ sequence can be fully represented by $N$ equally spaced samples — this is the DFT.

The DFT is the result of "double operations": both sampling (discrete in time domain) and truncation (finite length in time domain), so both time and frequency domains are discrete and periodic. This is why the DFT is the only version a computer can compute — because computers can only handle finitely many discrete numbers.

How to Use: Choosing the Right Transform Tool

Your signal / need	Use	Reason
Continuous periodic signal (e.g., 60 Hz mains)	FS	Discrete harmonic structure, THD analysis
Continuous transient signal (theoretical analysis)	CTFT	Property derivations, filter theory
Theoretical spectrum of a discrete signal	DTFT	FIR/IIR frequency-response analysis
Finite-length discrete data (actual computation)	DFT/FFT	Only version a computer can compute
System stability / pole-zero analysis	Z-Transform	ROC determines stability, poles determine resonance
$s$-domain analysis of continuous systems	Laplace	Continuous counterpart of Z

Practical rules of thumb:

Theoretical derivations $\to$ CTFT / DTFT / Z (pen-and-paper tools)
Writing code to compute spectra $\to$ DFT/FFT (the only computable version)
Analyzing filter stability $\to$ poles of the Z-transform
Interpreting the physical meaning of FFT results $\to$ back to DTFT and CTFT theory

"Translation" Examples Between Transforms

The following shows how the same physical problem moves between different transforms:

Scenario: Designing a digital lowpass filter

CTFT: Define the ideal lowpass $H(\omega) = \text{rect}(\omega/2\omega_c)$, inverse FT gives $h(t) = \text{sinc}$
Sampling: $h[n] = h(nT_s) = \text{sinc}(nT_s \omega_c/\pi)$ $\to$ enters the discrete world (DTFT)
Truncation: $h_w[n] = h[n] \cdot w[n]$ (window), length $M$ $\to$ finite-length FIR (DFT)
Z-transform: $H(z) = \sum_{n=0}^{M-1}h_w[n]z^{-n}$ (all zeros, no poles → FIR is always stable)
FFT: Use an $N$-point FFT to compute $H(e^{j2\pi k/N})$ and verify the frequency response

Common Confusions

DTFT $\neq$ DFT: The DTFT is a continuous frequency function (not directly computable); the DFT is its $N$-point sampling.
Z-transform $\neq$ DTFT: The DTFT is a special case of the Z-transform on the unit circle. The ROC information of the Z-transform is "invisible" in the DTFT.
FS $\neq$ DFT: FS is a continuous-time theory, while DFT is a finite-length discrete computation. But taking the DFT of a periodic signal over an integer number of periods yields a result equal to the FS coefficients multiplied by $N$.

References: [1] Oppenheim & Willsky, Signals and Systems, Ch.3-5. [2] Oppenheim & Schafer, Discrete-Time Signal Processing, Ch.2-8. [3] Haykin & Van Veen, Signals and Systems.

Interactive: Transform Relationship Diagram

Click a node or hover over it to see the derivation relationships among the transforms.

Hover over a node to see details.

← 2.6 Sampling Theorem 3.1 Periodogram & Windows →

3.1 Periodogram & Window Functions

The root cause of spectral leakage and the engineering trade-offs of windows

Why does this matter? Because computing an FFT without windowing is like looking at the spectrum through a pair of severely astigmatic glasses — the sidelobes from leakage will completely swamp weak signals. Proper use of windows is the first step toward reliable spectral analysis.

Previously: Part II gave us the transform tools. But directly feeding a signal into the FFT to compute its spectrum (periodogram) produces results severely distorted by truncation effects. Window functions are the first step in solving this problem.

Learning Objectives

Derive that truncation = frequency-domain convolution, and understand the cause of leakage
Compare performance metrics of Rectangular / Hann / Hamming / Blackman / Kaiser windows
Choose the optimal window based on resolution vs. dynamic range requirements
Understand the statistical properties of the periodogram as a PSD estimator

One-Sentence Summary

The periodogram is simply "feed the signal into an FFT and square the result" — simple but crude. Window functions are the key to making it less crude.

Pain Point: "Ghost Artifacts" in the Spectrum

You analyze a signal with the FFT — it contains only a single 100 Hz sine wave, yet the spectrum sprouts a bunch of ghost artifacts at 80, 90, 110, 120 Hz and beyond. This is spectral leakage.

A more serious scenario: you want to detect a weak 1.05 kHz signal next to a strong 1 kHz signal (e.g., harmonic distortion analysis), but the leakage sidelobes completely swamp the weak signal. This is not a hardware problem — it is a mathematical inevitability.

Origin

Arthur Schuster (1898) first proposed the concept of the periodogram for analyzing the periodicity of sunspot activity. He applied Fourier analysis directly to the observed data, computing the "intensity" at each frequency — a simple and intuitive idea, but with poor statistical properties (not fully understood until the mid-20th century).

Systematic study of window functions came with Fredric J. Harris (1978) and his classic paper "On the Use of Windows for Harmonic Analysis with the Discrete Fourier Transform". Harris systematically compared the frequency-domain characteristics of over 20 window functions and established engineering criteria for window selection. This paper remains one of the most cited in the field (Google Scholar citations exceeding 10,000).

Jim Kaiser at Bell Labs developed the Bessel-function-based Kaiser window, whose unique feature is that a single parameter $\beta$ continuously adjusts the trade-off between mainlobe width and sidelobe attenuation — turning window selection from "pick one from a pile of fixed windows" into "turn a knob to your desired balance point."

Principle: Why Does Truncation Cause Leakage?

Intuition: You only observe a finite duration of the signal. Mathematically, this is equivalent to multiplying an infinitely long signal by a rectangular function (1 inside the observation window, 0 outside). Multiplication in time = convolution in frequency. The spectrum of the rectangular function is a sinc (with infinitely many sidelobes), so the originally clean spectral line gets "smeared" by the sinc sidelobes — this is leakage.

Truncation = multiply by rectangular window = convolve with sinc in frequency

$$x_w[n] = x[n] \cdot w[n] \;\longleftrightarrow\; X_w(e^{j\omega}) = \frac{1}{2\pi}\,X(e^{j\omega}) * W(e^{j\omega})$$

The DTFT of the rectangular window is the Dirichlet kernel (discrete sinc):

$$W_{\text{rect}}(e^{j\omega}) = e^{-j\omega(N-1)/2}\,\frac{\sin(N\omega/2)}{\sin(\omega/2)}$$

Its first sidelobe is only -13 dB below the mainlobe. This means: if there is a strong signal, the energy it leaks into adjacent frequencies is only 13 dB below itself — this will completely swamp any nearby signal that is more than 13 dB weaker.

Expand: Why can other windows improve sidelobes?

All windows share the same core design philosophy: sacrifice mainlobe width to gain sidelobe attenuation.

The problem with the rectangular window is that it abruptly truncates to zero at the edges, causing the Gibbs phenomenon in the frequency domain (slow sidelobe decay). If the window function smoothly tapers to zero at the edges (like the cosine shape of the Hann window), the sidelobes in the frequency domain decay much faster.

Mathematically, the Hann window can be expressed as a linear combination of three rectangular windows:

$$w_{\text{Hann}}[n] = 0.5\,w_{\text{rect}}[n] - 0.25\,w_{\text{rect}}[n]\,e^{j2\pi n/(N-1)} - 0.25\,w_{\text{rect}}[n]\,e^{-j2\pi n/(N-1)}$$

Therefore, the DTFT of the Hann window is a superposition of three shifted Dirichlet kernels. In the sidelobe region, these three terms approximately cancel each other (destructive interference), causing the sidelobes to drop rapidly.

The cost is that the mainlobe widens from 2 bins to 4 bins — the minimum distance between two resolvable frequencies doubles.

More generally, the higher the order of continuous derivatives at the window edges, the faster the sidelobe decay rate:

$$\text{At the edges, } w^{(k)}(0) = w^{(k)}(N-1) = 0 \text{ for } k = 0, 1, \ldots, m \implies \text{sidelobes} \sim O(\omega^{-(m+2)})$$

$\blacksquare$

Periodogram: Definition and Statistical Properties

The simplest spectral estimate: take the squared magnitude of the windowed DFT.

$$\hat{S}(\omega) = \frac{1}{NU}\left|\sum_{n=0}^{N-1}w[n]\,x[n]\,e^{-j\omega n}\right|^2, \quad U = \frac{1}{N}\sum_{n=0}^{N-1}|w[n]|^2$$

Expand: Proof that the periodogram is an inconsistent estimator

Expected value of the periodogram:

$$E[\hat{S}(\omega)] = \frac{1}{2\pi}S(\omega) * |W(\omega)|^2$$

This is the convolution of the true PSD $S(\omega)$ with the window power spectrum — biased, but as $N \to \infty$, $|W|^2$ approaches a delta function and the bias vanishes.

However, the variance is where the problem lies. It can be shown (using Bartlett's formula) that:

$$\text{Var}[\hat{S}(\omega)] \approx S^2(\omega) \quad (\text{does not decrease with } N\text{!})$$

That is, the relative standard deviation is $\approx 100\%$, regardless of how large $N$ is. Increasing $N$ only lets you see equally violent random fluctuations on a finer frequency grid.

This is why the periodogram is an "inconsistent estimator" — Welch's method or the Multitaper method is needed to reduce the variance. $\;\blacksquare$

Window Function Formulas

Rectangular: $w[n] = 1, \quad 0 \leq n \leq N-1$

Hann：$w[n] = 0.5\!\left(1 - \cos\frac{2\pi n}{N-1}\right)$

Hamming：$w[n] = 0.54 - 0.46\cos\frac{2\pi n}{N-1}$

Blackman：$w[n] = 0.42 - 0.5\cos\frac{2\pi n}{N-1} + 0.08\cos\frac{4\pi n}{N-1}$

Blackman-Harris (4-term)：$w[n] = 0.35875 - 0.48829\cos\frac{2\pi n}{N-1} + 0.14128\cos\frac{4\pi n}{N-1} - 0.01168\cos\frac{6\pi n}{N-1}$

Flat-top: Coefficients designed to make the mainlobe top flat, sacrificing frequency resolution for amplitude accuracy ($< 0.01$ dB error)

Kaiser: $w[n] = \frac{I_0\!\left(\beta\sqrt{1 - \left(\frac{2n}{N-1} - 1\right)^2}\right)}{I_0(\beta)}$, where $I_0$ is the modified Bessel function

Comparison of Five Windows

Window	Mainlobe Width (bins)	Highest Sidelobe (dB)	Sidelobe Decay Rate	ENBW (bins)	Typical Use
Rectangular	2	-13	-6 dB/oct	1.00	Transient analysis, resolution priority
Hann	4	-31	-18 dB/oct	1.50	General-purpose default, audio analysis
Hamming	4	-42	-6 dB/oct	1.36	Speech analysis, FIR design
Blackman	6	-58	-18 dB/oct	1.73	High dynamic range, radar sidelobe suppression
Kaiser ($\beta$=6)	~5	-46	Adjustable	~1.5	Adjustable trade-off, filter design
Flat-top	~10	-44	-6 dB/oct	3.77	Calibration, precise amplitude measurement

ENBW (Equivalent Noise Bandwidth): Indicates how much noise power the window lets through. ENBW = 1.0 is ideal (only the rectangular window achieves this); other windows with ENBW > 1 mean that noise power is amplified. When measuring PSD, ENBW correction is required.

How to Use: Window Selection Decision Tree

What is your requirement?

Need to resolve two close frequencies? → Rectangular or Kaiser (small $\beta$, e.g. 2~4)
Need to see a weak signal next to a strong one (high dynamic range)? → Blackman or Kaiser (large $\beta$, e.g. 10~14) or Blackman-Harris
Need precise amplitude measurement of frequency components? → Flat-top
Unsure / general purpose? → Hann (almost always a safe choice)
Want continuously adjustable trade-off? → Kaiser (adjust $\beta$ from 0 to 20 for continuous coverage of all trade-offs)

Kaiser $\beta$ rules of thumb: $\beta = 0$ → rectangular window; $\beta \approx 5$ → approximately Hamming; $\beta \approx 6$ → approximately Hann; $\beta \approx 8.5$ → approximately Blackman; $\beta > 10$ → exceeds Blackman's dynamic range.

Python Example: Comparing Five Window Functions

# Python: compare 5 window functions import numpy as np import matplotlib.pyplot as plt from scipy.signal import windows N = 64 windows_dict = { 'Rectangular': np.ones(N), 'Hann': windows.hann(N), 'Hamming': windows.hamming(N), 'Blackman': windows.blackman(N), 'Kaiser beta=8': windows.kaiser(N, 8) } fig, axes = plt.subplots(2, 1, figsize=(10, 8)) # Time domain for name, w in windows_dict.items(): axes[0].plot(w, label=name, linewidth=1.5) axes[0].legend(); axes[0].set_title('Time domain: window shapes') axes[0].set_xlabel('n'); axes[0].grid() # Frequency domain (zero-pad to 8N for denser spectrum) Nfft = 8 * N freqs = np.linspace(-0.5, 0.5, Nfft) for name, w in windows_dict.items(): W = np.fft.fftshift(np.fft.fft(w, Nfft)) W_db = 20 * np.log10(np.abs(W) / np.max(np.abs(W)) + 1e-12) axes[1].plot(freqs, W_db, label=name, linewidth=1) axes[1].set_ylim(-100, 5); axes[1].legend() axes[1].set_title('Frequency domain: magnitude response (dB)') axes[1].set_xlabel('Normalized frequency'); axes[1].grid() plt.tight_layout(); plt.show()

Application Scenarios

THD (Total Harmonic Distortion) analysis: Measuring harmonic distortion of an audio amplifier with a 1 kHz fundamental, needing to see 2nd and 3rd harmonics 90 dB below the fundamental. Use Hann or Flat-top window (Flat-top amplitude error < 0.01 dB; prefer Flat-top if frequency resolution is sufficient). Industry standard: AES17 specifies a Flat-top window for THD+N measurements.
Audio spectrum analysis (DAW / mixing console): Real-time display of a music signal's spectrum. Typically uses the Hann window, NFFT = 4096~8192 (at 44.1 kHz sampling rate, frequency resolution is about 5~10 Hz). The Hann window's sidelobes are sufficiently low (-31 dB), causing no serious artifacts in the display.
Radar sidelobe suppression: In radar echoes, sidelobes of strong targets (e.g., large ships) can mask nearby weak targets (e.g., small speedboats). Use the Blackman-Harris 4-term window (sidelobes -92 dB) or the Dolph-Chebyshev window (equiripple design). The cost is a wider mainlobe (range resolution drops by about 50%), but this is worthwhile in scenarios requiring very high dynamic range.

Pitfalls and Limitations

Forgetting to window = rectangular window = -13 dB sidelobes: This is the most common mistake. Many beginners use np.fft.fft(x) directly without windowing, then wonder why the spectrum looks so "dirty." Unless the signal is captured at exactly an integer number of periods, there will always be leakage.
Flat-top window: accurate amplitude but poor frequency resolution: The mainlobe width is about 10 bins, 2.5 times wider than Hann. Two close frequencies will blur together. Only suitable for scenarios where frequency components are known and well-separated (e.g., calibration measurements).
Window functions reduce the effective data length: Data near the edges is down-weighted by the window function. Effective data length ≈ ENBW × $N / f_s$ seconds. Overlap processing can partially compensate for this loss.
Coherent gain correction: After windowing, you must divide by the window's mean value (coherent gain = $\frac{1}{N}\sum w[n]$) to correctly read the amplitude of a single frequency component. Forgetting this correction causes amplitude readings to be too low.

When Not to Use the Periodogram?

Need a stable PSD estimate: The periodogram's variance does not decrease with more data → use Welch's method (Section 3.2) or Multitaper
Data is extremely short (tens of points) and high resolution is needed: FFT resolution = $f_s/N$ is insufficient → use AR parametric model (Section 3.3)
Need to resolve extremely close sinusoids (super-resolution): Even windowing cannot break the FFT resolution limit → use MUSIC / ESPRIT (Section 3.4)
Only care about details in a narrow frequency band: A full-range FFT wastes computational resources → use Chirp-Z Transform (Section 3.5)

Interactive: Window Spectrum Comparison

Select a window function and observe its time-domain shape and frequency response (dB scale). Note the trade-off between mainlobe width and sidelobe height.

$\beta$=6

Adjust parameters above to see the plot

Interactive: Leakage and Resolution

Two sinusoids with close frequencies — switch windows to compare FFT results in real time. When the two frequencies are too close, some windows' mainlobes are too wide and merge the two peaks into one.

f1=100 Hz

f2=108 Hz

Adjust parameters above to see the plot

References: [1] Schuster, On the Investigation of Hidden Periodicities with Application to a Supposed 26 Day Period of Meteorological Phenomena, Terr. Magn., 1898. [2] Harris, On the Use of Windows for Harmonic Analysis with the DFT, Proc. IEEE, 1978. [3] Kaiser & Schafer, On the Use of the I₀-Sinh Window for Spectrum Analysis, IEEE Trans. ASSP, 1980. [4] Oppenheim & Schafer, Discrete-Time Signal Processing, Ch.10.

📝 Worked Example

You need to analyze two sinusoids: 100 Hz (0 dB) and 108 Hz (-35 dB). Sampling rate 1000 Hz, observation duration 0.256 s (256 points). Can you resolve them without windowing? With a Hann window? With a Blackman window?

Show solution

Δf = 1000/256 = 3.91 Hz. Frequency spacing 8 Hz > Δf, so theoretically resolvable.

But the weak signal is only -35 dB:

(1) Rectangular window sidelobes -13 dB → leakage from 100 Hz at 108 Hz is about -13 dB, much stronger than the -35 dB true signal → cannot see it

(2) Hann window sidelobes -31 dB → leakage -31 dB, still stronger than -35 dB → barely visible

(3) Blackman window sidelobes -58 dB → leakage -58 dB, well below -35 dB → clearly visible

✅ Quick Check

Q1: You need to detect a weak signal next to a strong one (40 dB difference). Which window should you use?

Show answer

Blackman window (sidelobes -58 dB) or Kaiser (with large β). The Hann window's sidelobes are only -31 dB, which is not enough.

Q2: What does computing an FFT without windowing amount to?

Show answer

It amounts to using a rectangular window, with sidelobes only -13 dB — almost guaranteed to cause severe spectral leakage.

← 2.7 Transform Relationship Overview 3.2 Welch & Multitaper →

3.2 Welch's Method & Multitaper Spectral Estimation

Two major approaches to reducing PSD estimation variance

Why does this matter? Because the spectrum computed from a single FFT differs every time (high variance), and cannot be directly used to set alarm thresholds or make statistical comparisons. Welch's method makes the results stable enough for engineering decisions.

Previously: Section 3.1's windows solved the leakage problem, but the periodogram has another fatal flaw: the variance is too large (results differ every time). How do we stabilize the spectral estimate?

Learning Objectives

Understand Welch's method: bias-variance trade-off of segment-window-average
Master the parameter selection process for segment length, overlap ratio, and window
Learn about DPSS (Slepian) sequences and the Multitaper method
Compare applicable scenarios for Welch and Multitaper

One-Sentence Summary

The core idea of Welch's method is extremely simple: divide a long data record into multiple segments, compute the spectrum of each segment, then average — stabilizing the result.

Pain Point: Unstable Spectrum

You compute the PSD of a vibration signal using the periodogram, and the result looks different every time — wild fluctuations that look more like noise than a clean spectrum. You cannot use such unstable results to set machine monitoring alarm thresholds, nor can you reliably compare two measurements.

This is because the periodogram is an inconsistent estimator — no matter how long the data you collect, the relative fluctuation of the estimate remains around 100%. You need a method to "smooth out" these fluctuations.

Origin

M.S. Bartlett (1948) first proposed the idea of segment averaging: divide the data into $K$ non-overlapping segments, compute the periodogram of each, then average. This simply reduces the variance to $\approx 1/K$.

Peter Welch (1967) at IBM made two key improvements to Bartlett's method: (1) allowing segments to overlap, squeezing more segments from the same data length; (2) applying a window function to each segment (instead of a rectangular window), reducing leakage. Because of its simplicity and effectiveness, this method became the de facto standard for computing PSD in engineering. Python's scipy.signal.welch() and MATLAB's pwelch() are both based on it.

David Thomson (1982) at Bell Labs proposed a completely different approach — the Multitaper method: instead of segmenting, use multiple orthogonal windows (DPSS / Slepian sequences) to perform multiple windowed FFTs on the same data, then average. This avoids sacrificing frequency resolution (because no segmenting), and is the gold standard for PSD estimation of short data.

Principle: Welch's Method

Intuition: A single photo may have noise, but averaging $K$ photos produces a clean image. Welch's method does the same for spectra — divide the data into segments, compute a "spectral photo" for each, then average.

Welch PSD Estimate

$$\hat{S}_W(f) = \frac{1}{K}\sum_{i=0}^{K-1}\frac{1}{LU}\left|\sum_{n=0}^{L-1}w[n]\,x[n+iD]\,e^{-j2\pi fn/f_s}\right|^2$$

$L$: segment length, $D$: hop size ($D = L - \text{overlap}$), $K$: number of segments, $U = \frac{1}{L}\sum|w[n]|^2$ (window power normalization)

Core Trade-off (Bias-Variance Tradeoff):

More segments $K$ → lower variance ($\approx 1/K$) → more stable spectrum
But for fixed total data length $N$, more $K$ → shorter segments ($L$ smaller) → worse frequency resolution $\Delta f = f_s/L$
This is unavoidable: frequency resolution × stability = constant (determined by total data length $N$)

Expand: Derivation of equivalent degrees of freedom for Welch's method

Each segment's periodogram approximately follows a $\chi^2_2$ distribution (2 degrees of freedom). After averaging $K$ segments:

$$\hat{S}_W(f) \sim \frac{S(f)}{K_{\text{eff}}}\,\chi^2_{2K_{\text{eff}}}$$

where $K_{\text{eff}}$ is the effective number of independent segments. If segments do not overlap (Bartlett), $K_{\text{eff}} = K$. With 50% overlap + Hann window:

$$K_{\text{eff}} \approx \frac{K}{1 + 2\sum_{k=1}^{K-1}(1-k/K)\rho_k^2}$$

where $\rho_k$ is the correlation coefficient between the $k$-th pair of adjacent windowed segments. For Hann window with 50% overlap, $\rho_1 \approx 0.167$; more distant segments are nearly uncorrelated.

Empirical conclusion: Hann + 50% overlap yields about 1.6 times the equivalent degrees of freedom of the non-overlapping version — squeezing 60% more statistical independence from the same data.

Normalized standard deviation (relative error): $\epsilon = 1/\sqrt{K_{\text{eff}}}$. Engineering rule of thumb: $\epsilon < 0.1$ (i.e., $K_{\text{eff}} > 100$) to be considered "stable." $\;\blacksquare$

How to Use: Four-Step Parameter Selection

Step 1: Determine frequency resolution $\Delta f$

Segment length $L = f_s / \Delta f$. Example: $f_s = 10\,\text{kHz}$, need $\Delta f = 1\,\text{Hz}$ → $L = 10000$ points.

Step 2: Choose overlap ratio

Hann window: 50% overlap ($D = L/2$). Hamming window: 67% overlap ($D = L/3$). Rule of thumb: overlap ratio = $1 - 1/\alpha$, where $\alpha$ is the ENBW.

Step 3: Choose window

Usually Hann. For higher dynamic range, use Blackman.

Step 4: Compute segment count and stability

$K = \lfloor(N - L)/(L - \text{overlap}) + 1\rfloor$. Equivalent degrees of freedom $\nu \approx 2K \times (\text{overlap correction})$. Normalized error $\epsilon \approx 1/\sqrt{K_{\text{eff}}}$.

Concrete example: Vibration monitoring

Accelerometer sampling rate $f_s = 10\,\text{kHz}$
Want frequency resolution $\Delta f = 1\,\text{Hz}$ → $L = 10000$ points (1 second)
Hann window, 50% overlap → $D = 5000$
Collect 10 seconds of data ($N = 100000$)
Number of segments $K = (100000 - 10000)/5000 + 1 = 19$ segments
Equivalent DOF $\nu \approx 2 \times 19 \times 0.85 \approx 32$ (normalized error $\approx 18\%$)
If more stability is needed ($\epsilon < 10\%$), collect 30 seconds of data → $K \approx 59$, $\epsilon \approx 10\%$

Multitaper Method (Thomson, 1982)

Intuition: Welch's method shortens the data to gain more segments for averaging, sacrificing resolution. Can we use all the data to preserve resolution while still averaging multiple estimates? Yes — use different windows on the same data for FFT. But these windows must be orthogonal; otherwise the results are not independent and averaging has no effect.

DPSS (Discrete Prolate Spheroidal Sequences / Slepian sequences) are exactly such an orthogonal window family: they are the sequences with the highest energy concentration within a given bandwidth $NW/N$. The first $K \approx 2NW$ DPSS are nearly perfectly orthogonal, with almost all their energy concentrated in the target bandwidth.

$$\hat{S}_{MT}(f) = \frac{1}{K}\sum_{k=0}^{K-1}\left|\sum_{n=0}^{N-1}v_n^{(k)}\,x[n]\,e^{-j2\pi fn/f_s}\right|^2$$

$\{v^{(k)}\}_{k=0}^{K-1}$: first $K$ DPSS, $NW$: half-bandwidth parameter (commonly $NW = 3$ or 4)

Property	Welch	Multitaper
Variance reduction method	Segment averaging	Multi-window averaging (no segmenting)
Frequency resolution	$f_s/L$ ($L$ = segment length < $N$)	$2NW \cdot f_s/N$ ($\approx f_s/N$ level)
Equivalent DOF	$\approx 2K$	$\approx 2K$ ($K \approx 2NW$)
Best scenario	Long data	Short data (preserves full resolution)
Computational cost	$K$ FFTs ($L$-point)	$K$ FFTs ($N$-point) + DPSS computation
Implementation	`scipy.signal.welch()`	`spectrum.pmtm()` / `nitime`

Gold Standard: Multitaper requires no segmenting (preserving full frequency resolution) yet still reduces variance. For PSD estimation of short data (hundreds to thousands of points), it is the recognized optimal method. The cost: DPSS must be precomputed (but only once), and $K$ is limited to $\approx 2NW$ (typically 5~8), unlike Welch which can have tens or even hundreds of segments.

Application Scenarios

Vibration PSD trend monitoring (Welch): A wind turbine gearbox takes a 60-second acceleration data segment every hour ($f_s = 25.6\,\text{kHz}$), and uses Welch's method ($L = 25600$, 50% overlap, Hann) to compute a stable PSD. Daily comparison of PSD energy in specific frequency bands reveals upward trends → early fault warning.
Communication system noise floor measurement (Welch): Measuring an RF receiver's noise power spectral density $N_0$. Requires a very stable PSD estimate ($\epsilon < 5\%$), typically using long acquisitions + Welch's method with $K > 400$ segments.
Neuroscience LFP/EEG power spectra (Multitaper): EEG signal trials are typically only 1-2 seconds long. Multitaper ($NW = 4$, $K = 7$ tapers) provides a stable PSD estimate while preserving $\sim 2\,\text{Hz}$ resolution. This is the standard practice in the neuroscience community.

Pitfalls and Limitations

Segments too short → frequency blurring: If $\Delta f = f_s/L = 100\,\text{Hz}$, but the two frequencies you want to see differ by only 50 Hz, they will blur together. Always verify that $L$ corresponds to a $\Delta f$ that meets your requirements.
Too few segments → still unstable: $K = 3$ segments give only 6 degrees of freedom; the PSD estimate remains very noisy. Practical minimum: $K \geq 8$ (16 DOF) before it becomes useful.
Too much overlap → segments not independent: 90% overlap appears to give many segments, but adjacent segments share nearly identical data, greatly reducing the averaging benefit. Hann + 50% is the optimal balance point.
Multitaper $NW$ selection: $NW$ too large → poor resolution (equivalent bandwidth $= 2NW \cdot f_s/N$); $NW$ too small → poor DPSS quality (high-order taper energy leakage). Typically $NW = 3$ or 4.

When Not to Use?

Signal is non-stationary: Welch assumes stationary statistics within each segment. If the signal's frequency changes over time (e.g., chirp), Welch averages spectra from different time instants together → use STFT / spectrogram (Section 5.1) instead
Only need to detect discrete sinusoidal frequencies (no continuous PSD needed): → use MUSIC / ESPRIT (Section 3.4) for greater accuracy
Data is extremely short and the signal model is known (e.g., speech): → use AR parametric model (Section 3.3) instead

References: [1] Bartlett, Smoothing Periodograms from Time-Series with Continuous Spectra, Nature, 1948. [2] Welch, The Use of FFT for the Estimation of Power Spectra, IEEE Trans. Audio Electroacoustics, 1967. [3] Thomson, Spectrum Estimation and Harmonic Analysis, Proc. IEEE, 1982. [4] Percival & Walden, Spectral Analysis for Physical Applications, Cambridge, 1993.

📝 Worked Example

Vibration monitoring: f_s=10 kHz, want Δf=2 Hz PSD. Using Hann window with 50% overlap. (a) Segment length? (b) How many segments in 10 seconds of data? (c) Equivalent degrees of freedom?

Show solution

(a) L = f_s/Δf = 10000/2 = 5000 points

(b) hop = 5000×0.5 = 2500, K = floor((100000−5000)/2500)+1 = 39 segments

(c) Equivalent DOF for Hann 50% overlap ≈ 2×39×(8/3) ≈ not fully independent, effective segments ≈ 39×0.67 ≈ 26, DOF ≈ 52

✅ Quick Check

Q1: Welch's method with segment length L=1000, sampling rate fs=10 kHz — what is the frequency resolution?

Show answer

Δf = fs/L = 10000/1000 = 10 Hz。

Q2: If the number of segments doubles (K→2K), by roughly what factor does the PSD estimation variance change?

Show answer

Approximately halved (~1/K), provided the segments are approximately independent.

Interactive: Welch's Method vs Single FFT

Same noisy signal (containing two sinusoids), comparing the periodogram's "jagged" look vs Welch's "smooth" result. Each click regenerates the noise.

Welch segment length:

Adjust parameters above to see the plot

← 3.1 Periodogram & Windows 3.3 Parametric Methods (AR) →

3.3 Parametric Spectral Estimation (AR Model)

Model fitting instead of direct FFT — high-resolution spectra from short data

Why does this matter? Because when data is short (only a few dozen samples), the FFT's frequency resolution is too poor. The AR model can squeeze higher resolution from short data than the FFT — this is especially important in speech analysis (20 ms per frame) and biomedical signals.

Previously: Section 3.2's Welch method reduces variance by segment averaging, but sacrifices frequency resolution. If the data is short, is there a way to avoid sacrificing resolution?

Learning Objectives

Establish the equivalence between the AR(p) model and linear prediction
Derive the Yule-Walker equations and Levinson-Durbin recursion
Understand the lattice structure of the Burg algorithm
Master AIC/BIC criteria for AR order selection
Compare the resolution of AR spectra versus the periodogram

One-Sentence Summary

Rather than computing the FFT directly, assume the signal is generated by a model (AR model), and use the model to infer the spectrum — even short data can yield a smooth, high-resolution result.

Pain Point: The Resolution Bottleneck of Short Data

Your data has only a few dozen samples (e.g., a 20 ms speech frame at 8 kHz sampling rate has only 160 points), giving an FFT frequency resolution of $\Delta f = f_s/N = 8000/160 = 50\,\text{Hz}$. If two speech formants are at 500 Hz and 530 Hz, only 30 Hz apart — far less than the 50 Hz resolution — the FFT will show a single blurry wide peak, completely unable to separate them.

You cannot collect longer data (speech is non-stationary; beyond 20-30 ms the statistical properties change), and zero-padding only interpolates without truly increasing resolution. You need a method that can "squeeze" more spectral information from short data.

Origin

The story of the AR model traces back to G. Udny Yule (1927) and Gilbert Walker (1931), who developed the estimation theory for autoregressive models (Yule-Walker equations) to analyze the quasi-periodicity of sunspots and the periodic patterns of Indian monsoons.

Norman Levinson (1947) and later James Durbin (1960) developed an efficient recursive solution (Levinson-Durbin recursion), reducing computational complexity from $O(p^3)$ (direct solution of linear equations) to $O(p^2)$.

The person who truly brought the AR model into spectral estimation was John Parker Burg (1967), who proposed the Maximum Entropy Method (MEM) in his Ph.D. dissertation at Stanford. Burg's insight was: given limited autocorrelation data, among all consistent PSDs, the one with "maximum information entropy" is exactly the PSD corresponding to the AR model. This gave AR spectral estimation an information-theoretic foundation. Burg's advisor was Robert White, whose research was motivated by short-data spectral analysis in geophysics — seismic exploration data is often short, and FFT resolution is insufficient.

In speech processing, the AR model is widely known by the name LPC (Linear Predictive Coding). Itakura (1968) and Atal & Hanauer (1971) applied the AR model to speech analysis and coding, ushering in the era of digital speech communication. The core of early GSM mobile voice coding (RPE-LTP) and CELP coding is the AR model.

Principle: AR(p) Model

Intuition: The AR model assumes that each sample of the signal can be predicted by a linear combination of the past $p$ samples plus white noise. If the prediction is good, the residual is white noise. Viewed in reverse, the linear predictor is an "all-pole filter," with white noise passing through it to produce the observed signal. Each pair of conjugate poles produces a peak in the spectrum.

AR(p) Difference Equation

$$x[n] = -\sum_{k=1}^{p}a_k\,x[n-k] + e[n], \quad e[n] \sim \text{WN}(0, \sigma^2)$$

Equivalent representation: $A(z)\,X(z) = E(z)$, where $A(z) = 1 + a_1 z^{-1} + \cdots + a_p z^{-p}$

PSD of the AR model:

$$S_{AR}(f) = \frac{\sigma^2}{\left|A(e^{j2\pi f/f_s})\right|^2} = \frac{\sigma^2}{\left|1 + \sum_{k=1}^{p}a_k\,e^{-j2\pi fk/f_s}\right|^2}$$

Since the denominator is the squared magnitude of a polynomial, $S_{AR}(f)$ has only peaks (corresponding to poles near the unit circle), and no zeros (valleys). Each pair of conjugate poles $r\,e^{\pm j\theta}$ produces a peak at $f = \theta f_s/(2\pi)$; the closer the pole radius $r$ is to 1, the sharper the peak.

Yule-Walker Equations

Expand derivation

Multiply both sides of the AR equation by $x^*[n-m]$ and take the expectation:

$$E[x[n]\,x^*[n-m]] = -\sum_{k=1}^{p}a_k\,E[x[n-k]\,x^*[n-m]] + E[e[n]\,x^*[n-m]]$$

Left side = $r_{xx}[m]$ (autocorrelation function). First term on the right = $-\sum a_k\,r_{xx}[m-k]$.

Second term on the right: because $e[n]$ is only correlated with $x[n], x[n-1], \ldots$ (not with future values):

$m = 0$: $E[e[n]\,x^*[n]] = \sigma^2$ ($e[n]$ is part of $x[n]$)
$m \geq 1$: $E[e[n]\,x^*[n-m]] = 0$ ($e[n]$ is uncorrelated with past $x$)

For $m = 1, 2, \ldots, p$, we obtain the Yule-Walker linear system:

$$\underbrace{\begin{bmatrix}r[0]&r[-1]&\cdots&r[1-p]\\r[1]&r[0]&\cdots&r[2-p]\\\vdots&&\ddots&\vdots\\r[p-1]&\cdots&&r[0]\end{bmatrix}}_{\mathbf{R}\;(\text{Toeplitz})}\begin{bmatrix}a_1\\a_2\\\vdots\\a_p\end{bmatrix} = -\begin{bmatrix}r[1]\\r[2]\\\vdots\\r[p]\end{bmatrix}$$

$m = 0$: $\sigma^2 = r[0] + \sum_{k=1}^{p}a_k\,r[-k]$ (white noise power).

$\mathbf{R}$ is a Toeplitz positive-definite matrix (since it is an autocorrelation matrix), solvable by the Levinson-Durbin recursion in $O(p^2)$ (vs. $O(p^3)$ for general linear systems). $\;\blacksquare$

Levinson-Durbin Recursion

Expand algorithm steps

Progressively build from AR(1) up to AR(p):

Initialization ($m=0$): $\sigma_0^2 = r[0]$

Recursion ($m = 1, 2, \ldots, p$):

$$k_m = -\frac{r[m] + \sum_{i=1}^{m-1}a_i^{(m-1)}\,r[m-i]}{\sigma_{m-1}^2} \quad \text{(reflection coefficient)}$$ $$a_m^{(m)} = k_m$$ $$a_i^{(m)} = a_i^{(m-1)} + k_m\,a_{m-i}^{(m-1)}, \quad i = 1, \ldots, m-1$$ $$\sigma_m^2 = (1 - |k_m|^2)\,\sigma_{m-1}^2$$

Stability guarantee: If $|k_m| < 1$ holds for all $m$ (guaranteed by the autocorrelation method), then the AR model is stable (all poles inside the unit circle).

At each step, $\sigma_m^2$ is the prediction error power of AR($m$) — decreasing as $m$ increases. When adding one more order causes $\sigma_m^2$ to barely decrease, that is an indicator of the optimal order. $\;\blacksquare$

Order Selection: Information-Theoretic Foundation of AIC and BIC

Expand derivation: Information-theoretic foundation of AIC and BIC

AIC and BIC are not arbitrary formulas — they come from the statistical principle of "balancing goodness of fit against model complexity."

AIC (Akaike Information Criterion, 1974):

$$\text{AIC}(p) = -2\ln L(\hat{\theta}_p) + 2p$$

where $L$ is the likelihood and $p$ is the number of parameters.

Origin: Akaike proved that this formula is an unbiased estimator of the KL divergence (Kullback-Leibler divergence):

$$\text{AIC} \approx 2N \cdot D_{KL}(\text{true distribution} \| \text{model})$$

The first term $-2\ln L$ measures "how well the model fits the data" (smaller is better).

The second term $2p$ is the "complexity penalty" — each additional parameter costs 2 points, preventing overfitting.

For an AR(p) model with Gaussian residuals, it simplifies to $\text{AIC}(p) = N\ln\hat{\sigma}_p^2 + 2p$.

BIC (Bayesian Information Criterion, 1978):

$$\text{BIC}(p) = -2\ln L(\hat{\theta}_p) + p\ln N$$

The second term is $p\ln N$ rather than $2p$.

Origin: BIC comes from Bayes' theorem — it is an approximation of the negative log of the posterior probability $P(\text{model}|\text{data})$.

As $N \to \infty$, $\ln N$ grows larger than $2$, so BIC imposes a stronger complexity penalty and tends to choose simpler models.

Which one to choose?

Criterion	Characteristics	Suitable for
AIC	Tends to pick more complex models	Prediction-focused tasks (capture detail)
BIC	Tends to pick simpler models; asymptotically consistent	Explanation-focused tasks (find the true model order)

How to Use: Four-Step Process

Step 1: Choose AR order $p$

Rule of thumb: $p \approx 2 \times$ (expected number of spectral peaks). For example, speech has 4~5 formants → $p \approx 10$, plus glottal pulse and radiation effects → typically $p = 10$~$14$ (at 8 kHz sampling rate).

Information criteria: AIC = $N\ln\sigma_p^2 + 2p$; BIC = $N\ln\sigma_p^2 + p\ln N$. Choose $p$ that minimizes AIC/BIC. BIC penalizes high orders more heavily, favoring lower $p$.

Step 2: Compute autocorrelation $r[0], r[1], \ldots, r[p]$

$r[k] = \frac{1}{N}\sum_{n=0}^{N-1-k}x[n+k]\,x^*[n]$ (biased estimator, but guarantees the Toeplitz matrix is positive-definite).

Step 3: Solve Yule-Walker (Levinson-Durbin) for $\{a_k\}$ and $\sigma^2$

Or use the Burg algorithm (no need to compute autocorrelation first; estimates reflection coefficients directly from data with better statistical properties).

Step 4: Compute PSD

Evaluate $S_{AR}(f) = \sigma^2/|A(e^{j2\pi f/f_s})|^2$ on a dense frequency grid.

Concrete example: Speech formant analysis

Sampling rate $f_s = 8\,\text{kHz}$, speech frame length 20 ms → $N = 160$ points
FFT resolution = $8000/160 = 50\,\text{Hz}$ (too coarse!)
Choose AR(12): expect 4 formants + 2 extra poles (glottal + radiation) = 12
Levinson-Durbin solution → yields 12 AR coefficients
Compute PSD → smooth spectral envelope clearly showing $F_1 \approx 500\,\text{Hz}$, $F_2 \approx 1500\,\text{Hz}$, $F_3 \approx 2500\,\text{Hz}$ formants
This is the core of LPC (Linear Predictive Coding)!

Application Scenarios

Speech analysis and coding (LPC): LPC = AR model. The core of GSM mobile voice coding (RPE-LTP, 13 kbps) and CELP coding (e.g., AMR, G.729) is the AR(10)~AR(16) model. Vocoders use the AR model to separate glottal excitation from vocal tract resonance, forming the basis of speech synthesis and voice modification technology.
High-resolution frequency estimation from short data: Seismic exploration reflection analysis with data windows of only 50~100 samples. The AR model can resolve multi-layer reflection frequency differences where FFT resolution is insufficient. Real example: 100 points @ 1 kHz data, FFT $\Delta f = 10\,\text{Hz}$, AR(20) successfully resolves two peaks separated by 5 Hz.
Heart rate variability (HRV) frequency-domain analysis: ECG R-R interval series typically have only 300~500 data points (5-minute short-term HRV). AR(16)~AR(20) models can clearly resolve LF (0.04-0.15 Hz) and HF (0.15-0.4 Hz) components, much smoother and more stable than FFT. In clinical diagnosis, the LF/HF ratio is an important indicator of autonomic nervous system function.

Pitfalls and Limitations

AR has only peaks, no valleys: The all-pole model can inherently only describe spectra with peaks. If the true spectrum has deep valleys (zeros), the AR model requires a very high order to approximate them, and the result is poor. In such cases, consider ARMA models (which have both poles and zeros).
Order too low → missing peaks: AR(4) can have at most 2 spectral peaks. If there are actually 3 peaks, the third will be completely missed.
Order too high → spurious peaks: AR(30) on 160 data points will overfit, producing nonexistent false peaks. Always use AIC/BIC as a safeguard.
Worse than periodogram for broadband noise: The AR model assumes the spectrum is smooth (determined by a few poles). For a broadband flat noise floor, it actually performs worse than the direct periodogram.
Non-stationary signals: AR assumes stationarity. For non-stationary signals, you need to apply AR in short windows, or use time-varying AR (e.g., Kalman filtering + AR).

When Not to Use?

Data is long and only a general PSD is needed: Use Welch's method (Section 3.2) directly — simpler, more robust, no order selection needed
Spectrum has prominent zeros (deep valleys): Consider ARMA models (but estimation is more complex and convergence is worse)
Need super-resolution to resolve sinusoidal frequencies: AR model resolution is still limited by the signal's SNR → use MUSIC / ESPRIT (Section 3.4) instead
Signal model is unclear: AR's advantage depends on "correct assumptions." If uncertain whether the signal suits an AR description, non-parametric methods (Welch / Multitaper) are safer

Interactive: AR Spectrum vs FFT

Signal contains two sinusoids (100 Hz + 105 Hz) + noise, with only 64 sample points. Compare the frequency resolution of the periodogram vs. the AR model. Adjust the AR order to observe the effect: too low misses peaks, too high produces spurious peaks.

AR order p=16

Adjust parameters above to see the plot

References: [1] Burg, Maximum Entropy Spectral Analysis, Ph.D. dissertation, Stanford, 1975. [2] Kay & Marple, Spectrum Analysis — A Modern Perspective, Proc. IEEE, 1981. [3] Kay, Modern Spectral Estimation: Theory and Application, Prentice Hall, 1988. [4] Stoica & Moses, Spectral Analysis of Signals, Pearson, 2005. [5] Makhoul, Linear Prediction: A Tutorial Review, Proc. IEEE, 1975.

✅ Quick Check

Q1: What happens if the AR model order is too low? Too high?

Show answer

Too low: misses real spectral peaks. Too high: produces spurious peaks (overfitting). AIC/BIC is typically used for selection.

Q2: Why can the AR model not represent spectral "valleys"?

Show answer

Because AR is an all-pole model. Poles can only produce peaks; without zeros, valleys cannot be created. An ARMA model is needed.

← 3.2 Welch / Multitaper 3.4 MUSIC / ESPRIT →

3.4 MUSIC & ESPRIT — Subspace Frequency Estimation

Frequency estimation methods that surpass the Fourier resolution limit

Why does this matter? Because some scenarios (radar target resolution, array antenna localization) require resolving frequencies or angles spaced closer than a single FFT bin. Subspace methods are currently the only mainstream technique capable of super-resolution estimation.

Previously: Section 3.3's AR model has better resolution than FFT, but is still limited by model assumptions. Is there a completely different approach that can surpass the Fourier frequency resolution limit?

Learning Objectives

Understand the decomposition into signal subspace and noise subspace
Derive the principle of the MUSIC pseudospectrum
Learn about ESPRIT's rotational invariance property
Master signal number estimation (MDL/AIC) and limitation conditions of subspace methods

One-Sentence Summary

MUSIC can resolve two frequencies spaced closer than a single FFT bin — it is not a better FFT, but a completely different approach: separating the signal space from the noise space.

Pain Point: Two Frequencies the FFT Cannot Separate

You have two radar targets too close together, with Doppler frequencies of 100 Hz and 103 Hz. Sampling rate 1 kHz, and you captured only 64 data points. One FFT frequency bin = $f_s/N = 1000/64 = 15.6\,\text{Hz}$, while the two frequencies differ by only 3 Hz — the FFT shows a single wide peak and cannot tell whether it is one target or two.

Zero-pad to 1024 points? That only interpolates the frequency axis to make bins denser, but the sinc mainlobe width does not change — the two targets are still blurred within the same mainlobe.

AR model? It helps, but at low SNR it can easily produce bias or spurious peaks. You need a fundamentally different method.

Origin

Ralph O. Schmidt (1979) at ESL Inc. (a defense electronics company) developed the MUSIC (Multiple Signal Classification) algorithm. His original motivation was Direction of Arrival (DOA) estimation in radar and electronic warfare: multiple electromagnetic waves arrive at an antenna array from different directions — how to estimate each wave's arrival angle with high precision? Traditional beamforming methods are limited by the array aperture. Schmidt's breakthrough was exploiting the subspace structure of the signal autocorrelation matrix, bypassing the traditional resolution limit.

The MUSIC paper (IEEE Trans. AP, 1986) became one of the most cited papers in signal processing (Google Scholar > 12,000 citations). Schmidt received the IEEE 2000 Society Award for this work.

Richard Roy and Thomas Kailath (1986) at Stanford proposed ESPRIT (Estimation of Signal Parameters via Rotational Invariance Techniques). ESPRIT exploits the rotational invariance between two sub-arrays, without needing to search the pseudospectrum (as MUSIC does), directly obtaining frequency estimates from matrix eigenvalues. It is computationally faster and less sensitive to precise array calibration.

These two methods are collectively known as subspace methods, and are core tools for frequency/direction estimation in modern radar, communications, sonar, and other systems.

Principle: Intuition of Subspaces

Core intuition: Imagine you are in an $M$-dimensional space. The received data = signal + noise. If there are $p$ sinusoidal signals, they "live" in a $p$-dimensional subspace (the signal subspace). Noise is spread across the entire $M$-dimensional space.

Eigendecomposition of the autocorrelation matrix separates these two spaces: $p$ large eigenvalues correspond to eigenvectors spanning the signal subspace; the remaining $M-p$ small eigenvalues ($\approx \sigma^2$, noise power) correspond to eigenvectors spanning the noise subspace.

Key fact: The signal's steering vector is necessarily orthogonal to the noise subspace. So when you take a test vector $\mathbf{a}(\omega)$ and compute its inner product with the noise subspace, at the correct frequency the inner product is zero, and taking the reciprocal gives infinity — a sharp peak.

Mathematical Derivation

Assume we observe $p$ complex sinusoids plus white noise. Construct the $M \times M$ autocorrelation matrix:

$$\mathbf{R} = E[\mathbf{x}\mathbf{x}^H] = \mathbf{A}\mathbf{S}\mathbf{A}^H + \sigma^2\mathbf{I}$$

$\mathbf{A} = [\mathbf{a}(\omega_1), \ldots, \mathbf{a}(\omega_p)]$: steering matrix, $\mathbf{S}$: signal covariance matrix

where the steering vector:

$$\mathbf{a}(\omega) = \begin{bmatrix}1 \\ e^{j\omega} \\ e^{j2\omega} \\ \vdots \\ e^{j(M-1)\omega}\end{bmatrix}$$

Expand: MUSIC pseudospectrum derivation

Eigendecompose $\mathbf{R}$:

$$\mathbf{R} = \sum_{i=1}^{M}\lambda_i\,\mathbf{e}_i\mathbf{e}_i^H = \underbrace{\sum_{i=1}^{p}(\lambda_i^s + \sigma^2)\,\mathbf{e}_i\mathbf{e}_i^H}_{\text{signal subspace}} + \underbrace{\sigma^2\sum_{i=p+1}^{M}\mathbf{e}_i\mathbf{e}_i^H}_{\text{noise subspace}}$$

where $\lambda_1 \geq \cdots \geq \lambda_p > \lambda_{p+1} = \cdots = \lambda_M = \sigma^2$.

Let $\mathbf{E}_n = [\mathbf{e}_{p+1}, \ldots, \mathbf{e}_M]$ be the eigenvector matrix of the noise subspace.

Key property: $\mathbf{a}(\omega_i) \perp \mathbf{E}_n$ for all $i = 1, \ldots, p$.

Proof: Because $\mathbf{a}(\omega_i)$ is a column of $\mathbf{A}$, it lies in the signal subspace $\text{span}(\mathbf{e}_1, \ldots, \mathbf{e}_p)$, which is orthogonal to the noise subspace.

Therefore, define the MUSIC pseudospectrum:

$$P_{\text{MUSIC}}(\omega) = \frac{1}{\mathbf{a}^H(\omega)\,\mathbf{E}_n\mathbf{E}_n^H\,\mathbf{a}(\omega)}$$

At $\omega = \omega_i$, the denominator $\mathbf{a}^H\mathbf{E}_n\mathbf{E}_n^H\mathbf{a} = \|\mathbf{E}_n^H\mathbf{a}\|^2 \to 0$, so $P_{\text{MUSIC}} \to \infty$ — producing a sharp peak.

Note: $P_{\text{MUSIC}}$ is not a true power spectral density, just an indicator function ("pseudospectrum"). Peak locations are meaningful (= frequency estimates), but peak heights have no power interpretation. $\;\blacksquare$

How to Use: Five-Step Process

Step 1: Build autocorrelation matrix $\mathbf{R}$ (size $M \times M$)

From data $x[0], \ldots, x[N-1]$, construct a Hankel matrix and estimate $\hat{\mathbf{R}} = \frac{1}{N-M+1}\sum_{n=0}^{N-M}\mathbf{x}_n\mathbf{x}_n^H$, where $\mathbf{x}_n = [x[n], x[n+1], \ldots, x[n+M-1]]^T$.

Choosing $M$: $M$ must be greater than the number of signals $p$. Rule of thumb: $M \approx N/3$ to $N/2$. $M$ too small → matrix too small, poor resolution; $M$ too large → too few snapshots for estimating $\hat{\mathbf{R}}$, inaccurate.

Step 2: Eigendecomposition

$\hat{\mathbf{R}} = \mathbf{E}\boldsymbol{\Lambda}\mathbf{E}^H$. Observe the eigenvalues: the first $p$ are significantly larger than the rest.

Step 3: Estimate the number of signals $p$

Look for the "cliff" in eigenvalues: $\lambda_1 \geq \cdots \geq \lambda_p \gg \lambda_{p+1} \approx \cdots \approx \lambda_M$. Or use MDL (Minimum Description Length) / AIC criteria for automatic determination:

$\text{MDL}(k) = -(N-M+1)(M-k)\ln\frac{\prod_{i=k+1}^{M}\lambda_i^{1/(M-k)}}{\frac{1}{M-k}\sum_{i=k+1}^{M}\lambda_i} + \frac{1}{2}k(2M-k)\ln(N-M+1)$

Choose $k$ that minimizes MDL as $\hat{p}$.

Step 4: Construct noise subspace and scan pseudospectrum

$\mathbf{E}_n = [\mathbf{e}_{p+1}, \ldots, \mathbf{e}_M]$; densely sample the frequency axis to compute $P_{\text{MUSIC}}(\omega)$.

Step 5: Find peaks → frequency estimates

Peak locations of $P_{\text{MUSIC}}$ = estimated frequencies. Parabolic interpolation can further refine the estimates.

Concrete example: Two closely spaced sinusoids

Signal: $x[n] = \sin(2\pi \cdot 100\,n/f_s) + \sin(2\pi \cdot 103\,n/f_s) + \text{noise}$
Sampling rate $f_s = 1000\,\text{Hz}$, only $N = 64$ data points
FFT bin = $1000/64 = 15.6\,\text{Hz}$ → completely unresolvable (3 Hz << 15.6 Hz)
MUSIC: choose $M = 16$, estimate $p = 2$
Eigendecomposition → 2 large eigenvalues ($\gg \sigma^2$) → 14 small eigenvalues ($\approx \sigma^2$)
Scan pseudospectrum → two clear, sharp peaks appear at 100 Hz and 103 Hz
Successfully resolved! Resolution improved by $15.6/3 \approx 5$ times

ESPRIT: A More Efficient Alternative

Intuition: MUSIC needs to scan the entire frequency axis to search for peaks, which is computationally expensive. ESPRIT exploits a clever observation — if the first $M-1$ rows and last $M-1$ rows of the matrix are viewed as two "sub-arrays," the relationship between them is a rotation (phase shift), and the rotation amount = $e^{j\omega}$ directly gives the frequency!

ESPRIT Rotational Invariance

$$\mathbf{E}_{s2} = \mathbf{E}_{s1}\,\mathbf{\Phi}, \quad \mathbf{\Phi} = \text{diag}(e^{j\omega_1}, e^{j\omega_2}, \ldots, e^{j\omega_p})$$

$\mathbf{E}_{s1}$, $\mathbf{E}_{s2}$: projections of the signal subspace onto the two sub-arrays

Implementation steps:

Eigendecompose to get signal subspace $\mathbf{E}_s$ (same as MUSIC)
Take $\mathbf{E}_{s1}$ = $\mathbf{E}_s$ with last row removed, $\mathbf{E}_{s2}$ = $\mathbf{E}_s$ with first row removed
Compute $\mathbf{\Phi} = \mathbf{E}_{s1}^{\dagger}\mathbf{E}_{s2}$ (least squares / Total Least Squares)
Eigenvalues of $\mathbf{\Phi}$: $\lambda_i = e^{j\omega_i}$ → $\omega_i = \angle\lambda_i$ → frequency $f_i = \omega_i f_s / (2\pi)$

Property	MUSIC	ESPRIT
Output	Pseudospectrum (requires peak search)	Directly gives frequency values
Computational cost	$O(M^3) + O(M^2 \cdot N_{\text{scan}})$	$O(M^3)$ (no search needed)
Array calibration	Requires precise calibration	Less sensitive
Resolution	Slightly higher (uses full noise subspace)	Slightly lower but still far exceeds FFT
Additional info	Pseudospectrum provides visualization	Only frequency values

Application Scenarios

Radar DOA estimation: Phased array radar with 8~64 antenna elements receives signals; MUSIC estimates the precise bearing angles of multiple targets. In practical systems, with a 64-element ULA (uniform linear array), MUSIC can resolve two targets separated by < 1 degree at SNR = 10 dB (conventional beamforming resolution is about 7 degrees).
Wireless communication AoA positioning: 5G base stations use massive MIMO antenna arrays, employing ESPRIT/MUSIC to estimate the Angle of Arrival (AoA) of user devices, combining multi-base-station information for indoor positioning (accuracy < 1 m). ESPRIT is preferred in real-time systems due to its computational efficiency.
Closely-spaced modal identification in vibration analysis: Mechanical structures (e.g., aircraft wings, bridges) have multiple vibration modes at natural frequencies, some extremely close (< 1 Hz apart). MUSIC can resolve these closely-spaced modes from short accelerometer data segments, used for structural health monitoring.
Multiple pitch estimation in music signals: In a piano chord, the fundamental frequencies of multiple notes differ by less than one semitone (~6%), each with harmonics. MUSIC can precisely estimate the fundamental frequency of each note in a chord, serving as a tool for Automatic Music Transcription.

Pitfalls and Limitations

Must know or estimate the number of signals $p$: This is the "Achilles' heel" of subspace methods. If $p$ is estimated incorrectly, the results are wrong. Overestimating $p$ → spurious peaks; underestimating $p$ → missed signals. The MDL criterion is reliable at high SNR, but prone to failure at low SNR.
Correlated signals (coherent sources) cause failure: If two signals are fully correlated (e.g., reflections in multipath), $\mathbf{S}$ is rank-deficient, the signal subspace dimension is less than $p$, and MUSIC "leaks" some signals into the noise subspace. Solution: spatial smoothing — sacrificing some array aperture to restore the rank of $\mathbf{S}$.
Computational cost $O(M^3)$: The eigendecomposition complexity. Already expensive at $M = 64$. Large-scale arrays ($M > 100$) require fast subspace tracking algorithms (e.g., PAST, GROUSE).
Performance degrades sharply at low SNR: Subspace methods exhibit a "threshold effect" — when SNR falls below a threshold (typically 5~10 dB), performance degrades dramatically, with large estimation biases or completely wrong results. This is because noise and signal eigenvalues begin to overlap, and subspace separation fails.
Only suitable for "few sinusoids + white noise" model: If the signal is broadband (e.g., speech) or the noise is colored (non-white), the assumptions of standard MUSIC/ESPRIT are violated and results are unreliable. Colored noise requires pre-whitening.

When Not to Use?

Need a full PSD (not just discrete frequencies): MUSIC only estimates discrete frequency components, not a continuous PSD → use Welch or AR model
Many signals and count unknown: More than $M/2$ signals cannot be handled (a fundamental limitation of subspace methods). In practice, $p > 5$~$8$ is already difficult
Real-time processing with large matrices: $O(M^3)$ eigendecomposition may be too slow for embedded systems → consider ESPRIT (faster) or subspace tracking algorithms
Need a robust "works anywhere" method: Subspace methods are sensitive to model assumptions (white noise, known signal count, uncorrelated) → Welch / Multitaper is safer

Interactive: MUSIC vs FFT

Two very closely spaced sinusoids (center frequency 100 Hz). Adjust frequency separation and SNR. When $\Delta f$ is much smaller than the FFT bin width, the FFT shows only one peak, but MUSIC can resolve two.

$\Delta f$=3 Hz

SNR=20 dB

Adjust parameters above to see the plot

FFT bin width = 15.6 Hz. When $\Delta f < 15.6$ Hz, the FFT fundamentally cannot resolve them. Lower the SNR to observe the threshold effect.

References: [1] Schmidt, Multiple Emitter Location and Signal Parameter Estimation, IEEE Trans. AP, 1986 (Originalreport 1979). [2] Roy & Kailath, ESPRIT — Estimation of Signal Parameters via Rotational Invariance Techniques, IEEE Trans. ASSP, 1989. [3] Stoica & Moses, Spectral Analysis of Signals, Ch.4, Pearson, 2005. [4] Van Trees, Optimum Array Processing, Part IV of Detection, Estimation, and Modulation Theory, Wiley, 2002.

✅ Quick Check

Q1: What information does MUSIC need to work?

Show answer

It needs to know (or estimate) the number of signals p, in order to distinguish the signal subspace from the noise subspace. MDL or AIC criteria are typically used to estimate p.

Q2: What happens if two signals are fully correlated (coherent)?

Show answer

It fails — correlated signals cause the autocorrelation matrix to become rank-deficient, preventing correct subspace separation. Spatial smoothing preprocessing is required.

← 3.3 AR Model 3.5 Chirp-Z Transform →

3.5 Chirp-Z Transform (CZT)

A spectral magnifying glass — computing the DFT along any path in the Z-plane

Why does this matter? Because sometimes you only care about the details of a narrow frequency band (e.g., precisely measuring power grid frequency deviation). CZT lets you focus computational resources on that band, like a spectral magnifying glass.

Previously: Section 3.4's MUSIC can do super-resolution estimation, but is computationally expensive. If you just want to "zoom in" on a frequency band's details, there is a lighter tool —

Learning Objectives

Define the CZT: a generalized DFT sampled along a spiral in the Z-plane
Derive the Bluestein identity and $O(N\log N)$ computation method
Understand the CZT's "frequency zoom-in" capability and its difference from zero-padding
Distinguish "denser frequency sampling" from "higher frequency resolution"

One-Sentence Summary

CZT lets you compute only the frequency range you care about, like a magnifying glass on the spectrum.

Pain Point: I Only Care About a Small Frequency Range

You are doing power system frequency monitoring and only care about the tiny frequency deviation between 49.9~50.1 Hz. Sampling rate $f_s = 1\,\text{kHz}$, you collected 1 second of data ($N = 1000$). The FFT gives you the full spectrum from 0~500 Hz, each bin = 1 Hz, with only one or two points near 50 Hz — far too coarse to see a 0.01 Hz frequency deviation.

You could zero-pad to $N = 100000$ (100-second equivalent), making the bin width 0.01 Hz, but that means computing a 100K-point FFT, of which 99.96% of the computed results (0~49.9 Hz and 50.1~500 Hz) you don't need at all.

Is there a way to compute only the 49.9~50.1 Hz range, but with a dense 0.01 Hz spacing?

Origin

Lawrence Rabiner, Ronald Schafer, and Charles Rader (1969) at Bell Labs proposed the Chirp-Z Transform. Their key insight was: the DFT is actually equi-spaced sampling of the Z-transform on the unit circle; if you move the sampling points to any spiral in the Z-plane, you get a more flexible frequency analysis tool.

The paper title directly describes the method's core: "The Chirp z-Transform Algorithm." "Chirp" refers to the linear frequency-modulated signal used in the algorithm, because in the Bluestein identity that converts the DFT into a convolution, the kernel function is precisely a chirp.

Another important contribution by Rader was discovering that prime-length DFTs can be converted into convolutions (Rader's algorithm, 1968), while the CZT's Bluestein method is more general — it converts DFTs of any length into convolutions (no requirement for the length to be a power of 2 or prime).

Principle

Intuition: The DFT consists of $N$ equi-spaced samples of the Z-transform on the unit circle. CZT generalizes these samples to $M$ equi-spaced points on any spiral in the Z-plane. If the spiral covers only the small arc (frequency range) you are interested in, you get arbitrarily dense spectral samples in that range — and $M$ can be much larger or smaller than $N$.

CZT Definition

$$X(z_k) = \sum_{n=0}^{N-1}x[n]\,z_k^{-n}, \quad z_k = A\,W^{-k}, \; k = 0, 1, \ldots, M-1$$

$A = A_0\,e^{j\theta_0}$ (starting point), $W = W_0\,e^{j\phi_0}$ (step) define the spiral path in the Z-plane

DFT is a special case of CZT: $A = 1$, $W = e^{-j2\pi/N}$, $M = N$ ($N$ equi-spaced points on the unit circle).

Frequency zoom-in: Simply set $A = e^{j2\pi f_1/f_s}$ (start frequency), $W = e^{-j2\pi(f_2-f_1)/(Mf_s)}$ (frequency step), and the $M$-point CZT computes only the $M$-point spectrum within $[f_1, f_2]$.

Expand: Bluestein identity and convolution implementation

Direct computation of CZT is $O(NM)$, but the Bluestein identity converts it into a convolution:

Key: $kn = \frac{1}{2}[k^2 + n^2 - (k-n)^2]$. Therefore:

$$W^{kn} = W^{k^2/2}\,W^{n^2/2}\,W^{-(k-n)^2/2}$$

Substituting into the CZT definition:

$$X(z_k) = W^{k^2/2}\sum_{n=0}^{N-1}\underbrace{\left[x[n]\,A^{-n}\,W^{n^2/2}\right]}_{g[n]}\,\underbrace{W^{-(k-n)^2/2}}_{h[k-n]}$$

The summation inside the brackets is a linear convolution of $g[n]$ and $h[n] = W^{-n^2/2}$!

Therefore, the CZT can be computed with three FFTs:

Compute $g[n] = x[n]\,A^{-n}\,W^{n^2/2}$, $n = 0, \ldots, N-1$
Compute $h[n] = W^{-n^2/2}$, $n = -(N-1), \ldots, M-1$
FFT convolution: $y = g * h$ (zero-pad to $\geq N+M-1$, FFT → pointwise multiply → IFFT)
$X(z_k) = W^{k^2/2}\,y[k]$, $k = 0, \ldots, M-1$

Total complexity: $O((N+M)\log(N+M))$, much faster than the direct $O(NM)$. $\;\blacksquare$

How to Use

Just three parameters

Start frequency $f_1$: lower bound of your frequency band of interest
End frequency $f_2$: upper bound of your frequency band of interest
Number of sample points $M$: how many frequency points between $[f_1, f_2]$

Frequency spacing = $(f_2 - f_1)/M$. You can make this spacing arbitrarily small — but remember, this does not increase the true frequency resolution.

// CZT Zoom-in example (Python) from scipy.signal import czt, czt_points import numpy as np fs = 1000 # Sampling rate 1 kHz t = np.arange(0, 1, 1/fs) # 1 second of data x = np.sin(2*np.pi*50.03*t) # 50.03 Hz sinusoid # CZT zoom into 49.9 ~ 50.1 Hz f1, f2, M = 49.9, 50.1, 1000 z_points = czt_points(M, w=np.exp(-1j*2*np.pi*(f2-f1)/(M*fs)), a=np.exp(1j*2*np.pi*f1/fs)) X = czt(x, M, w=np.exp(-1j*2*np.pi*(f2-f1)/(M*fs)), a=np.exp(1j*2*np.pi*f1/fs)) # Result: 1000 points between 49.9~50.1 Hz # Frequency spacing = 0.0002 Hz, clearly showing the peak at 50.03 Hz

Application Scenarios

Power system frequency monitoring: The grid nominal frequency is 50/60 Hz, but the actual frequency fluctuates between 49.95~50.05 Hz. Monitoring this tiny deviation is critical for grid stability. CZT can achieve 0.001 Hz frequency measurement precision on 1 second of data (a direct FFT would need 1000 seconds to achieve the same bin density). IEC 61000-4-30 Class A power quality analyzers actually use CZT-like techniques.
Musical instrument tuner: A4 = 440 Hz, A4# = 466.16 Hz, semitone difference = 26.16 Hz ($\approx$ 6%). Tuning requires precision < 1 cent (0.058%, i.e., 0.26 Hz). For 44.1 kHz sampled audio of 0.1 seconds ($N = 4410$), FFT bin = 10 Hz, completely insufficient. CZT zoom to 430~450 Hz, $M = 2000$ points → spacing 0.01 Hz → easily achieves 0.1 cent precision.
Precision vibration analysis: Shaft vibration of rotating machinery (e.g., turbine generator, 3000 RPM = 50 Hz). Need to precisely track small changes in amplitude and phase of 1X (50 Hz) and 2X (100 Hz) with load/temperature. CZT focuses on the two narrow bands 49~51 Hz and 99~101 Hz, more flexible than order tracking.

Pitfalls and Limitations

CZT does not increase true frequency resolution: This is the most important and most easily misunderstood point. Frequency resolution is still limited by the observation time $T$: $\Delta f_{\text{resolution}} \approx 1/T$. CZT only provides denser frequency sampling (like reading with finer markings on a ruler), but cannot resolve two frequencies spaced closer than $1/T$. This is similar to zero-padding, but CZT can be applied to only the band you care about, making it more flexible and less computationally expensive.
Leave margin when choosing the $[f_1, f_2]$ range: If the zoom range is too narrow, you may truncate the mainlobe of the target frequency, causing sidelobe artifacts. Typically leave 2~3 mainlobe widths of margin at each end.
Windowing is still required: CZT does not eliminate spectral leakage. Data should still be windowed before CZT.
Numerical stability when $|W_0| \neq 1$ (spiral rather than arc): If the sampling path deviates too far from the unit circle, $z_k^{-n}$ grows or decays exponentially, causing numerical issues. Sampling on the unit circle ($|A_0| = |W_0| = 1$) is safest.

When Not to Use?

Need full-range spectrum: If you need to see the complete spectrum from 0 to $f_s/2$, a direct FFT is faster and simpler
Need to truly increase resolution (resolve close frequencies): CZT is a magnifying glass, not a microscope → use MUSIC / ESPRIT (Section 3.4) or collect longer data
$M$ and $N$ are both large and similar: CZT's three FFTs plus preprocessing is slower than a single direct FFT → only worthwhile when the zoom range is significantly smaller than the full range

References: [1] Rabiner, Schafer & Rader, The Chirp z-Transform Algorithm, IEEE Trans. Audio Electroacoustics, 1969. [2] Bluestein, A Linear Filtering Approach to the Computation of Discrete Fourier Transform, IEEE Trans. AU, 1970. [3] Oppenheim & Schafer, Discrete-Time Signal Processing, Section 9.6.

Interactive: Chirp-Z Frequency Magnifying Glass

Two sinusoids only 1 Hz apart (99.5 Hz + 100.5 Hz). The FFT's Δf≈3.9 Hz cannot resolve them; the CZT can zoom into any frequency band and clearly separate the two peaks.

Zoom center frequency: 100 Hz

Zoom bandwidth: 10 Hz

← 3.4 MUSIC / ESPRIT 4.1 Hilbert Transform →

4.1 Hilbert Transform

Mathematical foundation for constructing the analytic signal — extracting amplitude envelope and instantaneous phase from real signals

Why does this matter? Because envelope detection is the core operation in communication demodulation (AM) and mechanical fault diagnosis (bearings). The Hilbert transform is the cleanest, most mathematically grounded method for extracting the envelope.

Previously: Part III taught various spectral estimation methods. But some applications need more than just the spectrum — they also need the "envelope," i.e., how the signal amplitude varies over time. The Hilbert transform is the mathematical tool for extracting the envelope.

Learning Objectives

Define the Hilbert transform and its frequency-domain representation $-j\,\text{sgn}(\omega)$
Understand the physical meaning of the analytic signal: eliminating negative frequency redundancy
Master the three-step FFT-based Hilbert transform implementation
Recognize the limitations and correct usage of the Hilbert transform

One-Sentence Summary

The Hilbert transform converts a real signal into a complex signal (the analytic signal), allowing you to extract the amplitude envelope and instantaneous frequency.

Pain Point: How to Extract What Is "Hidden Inside the Carrier"?

Scenario 1: AM radio. A station modulates speech (20~4000 Hz) onto a 1 MHz carrier and broadcasts it. How does the radio "strip" the speech off the carrier? The speech is the carrier's amplitude envelope, but how do you extract the envelope from a real-valued signal?

Scenario 2: Bearing fault detection. A bearing outer race has a small defect; each time a rolling element passes over the defect it produces an impact, and these impacts excite the bearing housing's high-frequency resonance (2~5 kHz). The time-domain waveform shows a series of impact pulses "modulated" by the 2~5 kHz resonance. What you want to find is the repetition frequency of the impacts (BPFO, about 87 Hz), but it is hidden within the high-frequency resonance. You need to extract the envelope first, then FFT the envelope to find the BPFO.

In both scenarios, envelope detection is the core operation, and the Hilbert transform is the cleanest envelope detection tool.

Origin

David Hilbert (1905) introduced a special class of singular integral transforms while studying complex analysis and integral equations. This purely mathematical tool had no connection to engineering at the time.

Dennis Gabor (1946) was the key figure who brought the Hilbert transform into signal processing. In his landmark paper "Theory of Communication" (published in the Journal of the IEE), Gabor proposed the concept of the "analytic signal": pairing a real signal $x(t)$ with its Hilbert transform $\hat{x}(t)$ as the imaginary part to form a complex signal $z(t) = x(t) + j\hat{x}(t)$.

Gabor's motivation was communication theory — he wanted to give rigorous mathematical definitions for a signal's "instantaneous frequency" and "instantaneous amplitude." The analytic signal provided this framework. Incidentally, the same paper also introduced what would later be called the Gabor transform (a special case of the short-time Fourier transform).

Gabor later received the 1971 Nobel Prize in Physics for inventing holography.

Principle

Intuition: A real signal's spectrum is conjugate-symmetric ($X(-\omega) = X^*(\omega)$), meaning positive and negative frequencies carry exactly the same information. The analytic signal removes the negative frequencies and doubles the positive ones — no information is lost, but the representation is more concise. Moreover, after removing negative frequencies, the signal becomes complex, allowing direct extraction of the envelope and instantaneous phase via magnitude and phase angle.

Hilbert Transform (Time-Domain Definition)

$$\hat{x}(t) = \mathcal{H}\{x(t)\} = \frac{1}{\pi}\,\text{p.v.}\!\int_{-\infty}^{\infty}\frac{x(\tau)}{t - \tau}\,d\tau = x(t) * \frac{1}{\pi t}$$

p.v. = Cauchy principal value (avoiding the singularity at $\tau = t$)

Hilbert Transform (Frequency-Domain Representation)

$$\hat{X}(\omega) = -j\,\text{sgn}(\omega)\cdot X(\omega) = \begin{cases}-jX(\omega), & \omega > 0 \\ 0, & \omega = 0 \\ jX(\omega), & \omega < 0\end{cases}$$

Effect: Positive frequency components are phase-shifted by $-90°$, negative frequency components by $+90°$, with magnitude unchanged. It is an allpass phase shifter.

Expand: Why does $-j\,\text{sgn}(\omega)$ equal a 90-degree phase shift?

$-j = e^{-j\pi/2}$, so multiplying by $-j$ subtracts 90 degrees from the phase.

For positive frequency components $X(\omega)$ ($\omega > 0$): $\hat{X}(\omega) = -jX(\omega)$, phase shift $-90°$.

For negative frequency components ($\omega < 0$): $\text{sgn}(\omega) = -1$, so $\hat{X}(\omega) = jX(\omega)$, phase shift $+90°$.

Hilbert transform of $\cos(\omega_0 t)$:

$$\mathcal{H}\{\cos(\omega_0 t)\} = \sin(\omega_0 t)$$

Because each positive frequency component of $\cos$ is phase-shifted by $-90°$: $\cos(\omega_0 t - 90°) = \sin(\omega_0 t)$. $\;\blacksquare$

Analytic Signal

Definition

$$z(t) = x(t) + j\,\hat{x}(t)$$

Spectrum of the analytic signal:

$$Z(\omega) = \begin{cases}2X(\omega), & \omega > 0 \\ X(0), & \omega = 0 \\ 0, & \omega < 0\end{cases}$$

Only positive frequencies — negative frequencies are completely eliminated. This is the meaning of "analytic": in complex analysis, an analytic function's Fourier transform exists only in a half-plane.

Concrete example: $x(t) = A\cos(\omega_0 t + \phi)$
$\hat{x}(t) = A\sin(\omega_0 t + \phi)$
$z(t) = Ae^{j(\omega_0 t + \phi)}$
$|z(t)| = A$ (constant envelope), $\angle z(t) = \omega_0 t + \phi$ (linear phase).

FFT-based Hilbert Transform: Three Steps

This is the most common implementation in practice (and what MATLAB's hilbert() and SciPy's scipy.signal.hilbert() do internally):

// Hilbert Transform via FFT — returns analytic signal // Input: real signal x (length N) // Output: { re: real part, im: imaginary part } (analytic signal) function hilbert(x) { const N = np2(x.length); // Pad to power of 2 const re = zp(x, N); // Zero-pad const im = new Array(N).fill(0); const F = fft(re, im); // Step 1: FFT // Step 2: Zero out negative frequencies, double positive frequencies // F[0] (DC) and F[N/2] (Nyquist) remain unchanged for (let k = 1; k < N/2; k++) { F.re[k] *= 2; F.im[k] *= 2; } for (let k = N/2 + 1; k < N; k++) { F.re[k] = 0; F.im[k] = 0; } // Step 3: IFFT → obtain analytic signal // IFFT = conjugate → FFT → conjugate → /N const c_re = F.re.slice(); const c_im = F.im.map(v => -v); const R = fft(c_re, c_im); return { re: R.re.map(v => v / N), // Real part = original signal x(t) im: R.im.map(v => -v / N) // Imaginary part = Hilbert transform x̂(t) }; } // Envelope detection: |z(t)| = sqrt(re² + im²) function envelope(x) { const z = hilbert(x); return z.re.map((r, i) => Math.sqrt(r * r + z.im[i] * z.im[i]) ); }

How to Use

Envelope Detection Workflow

(Critical!) Bandpass filter: First restrict the signal to the narrow frequency band of interest. Computing the Hilbert envelope without filtering first typically yields physically meaningless results.
Compute Hilbert transform / analytic signal: Use the FFT three-step method above.
Take magnitude = envelope: $A(t) = |z(t)| = \sqrt{x^2(t) + \hat{x}^2(t)}$

Concrete example: AM demodulation

$x(t) = [1 + 0.8\cos(2\pi \cdot 5\,t)] \cdot \cos(2\pi \cdot 1000\,t)$

Carrier 1000 Hz, modulating wave 5 Hz (modulation depth 80%)

Directly compute the Hilbert envelope $|z(t)|$
The envelope perfectly recovers $1 + 0.8\cos(2\pi \cdot 5\,t)$ — the modulating waveform
This is the principle of AM envelope detection!

Python Example: Extracting the Envelope with scipy.signal.hilbert

# Python: use scipy.signal.hilbert to extract signal envelope import numpy as np import scipy.signal as sig import matplotlib.pyplot as plt # Generate AM signal: 1 kHz carrier, 50 Hz envelope fs = 10000 t = np.arange(0, 0.2, 1/fs) carrier = np.cos(2*np.pi*1000*t) envelope_true = 1 + 0.5*np.sin(2*np.pi*50*t) am_signal = envelope_true * carrier # Hilbert transform -> analytic signal z[n] = x[n] + j*x_hat[n] analytic = sig.hilbert(am_signal) # Extract envelope and instantaneous phase/frequency envelope = np.abs(analytic) inst_phase = np.unwrap(np.angle(analytic)) inst_freq = np.diff(inst_phase) / (2*np.pi) * fs # Plot fig, axes = plt.subplots(3, 1, figsize=(10, 9)) axes[0].plot(t, am_signal, 'b', linewidth=0.5, label='AM signal') axes[0].plot(t, envelope, 'r', linewidth=2, label='Hilbert envelope') axes[0].plot(t, envelope_true, 'g--', linewidth=1.5, label='True envelope') axes[0].legend(); axes[0].set_title('AM signal + envelope detection') axes[1].plot(t, envelope, 'r') axes[1].set_title('Extracted envelope (should show 50 Hz sinusoid)') axes[2].plot(t[1:], inst_freq, 'k') axes[2].set_title(f'Instantaneous frequency (should be near carrier 1000 Hz)') axes[2].set_xlabel('Time (s)'); axes[2].set_ylabel('Hz') plt.tight_layout(); plt.show()

Application Scenarios

Communication demodulation (AM / SSB): AM envelope detection as described above. SSB (single-sideband modulation) uses the Hilbert transform to eliminate half the bandwidth: $x_{\text{SSB}}(t) = x(t)\cos(\omega_c t) \mp \hat{x}(t)\sin(\omega_c t)$ (upper/lower sideband). SSB bandwidth is half that of AM, and is the standard modulation scheme for amateur radio and some military communications.
Bearing fault envelope spectrum analysis: Accelerometer measures bearing vibration → bandpass filter (focus on 2~5 kHz resonance band) → Hilbert envelope → envelope FFT → find BPFO = 87.3 Hz and its 2X (174.6 Hz), 3X (261.9 Hz) harmonics in the envelope spectrum. ISO 13373-3 and major vibration analysis software (e.g., SKF Microlog, B&K Pulse) all use this as their standard procedure.
Speech pitch tracking: Bandpass filter speech signal to the fundamental frequency range (80~400 Hz) → Hilbert envelope → autocorrelation of the envelope → first non-zero peak = fundamental period $T_0$ → fundamental frequency $F_0 = 1/T_0$.
Seismic wave analysis: The envelope (instantaneous amplitude) of seismic signals is used to determine P-wave and S-wave arrival times, as well as epicenter distance estimation.

Pitfalls and Limitations

Hilbert envelope of broadband signals has no physical meaning: If the signal is not narrowband (i.e., cannot be written as $x(t) \approx A(t)\cos[\omega_c t + \phi(t)]$), the envelope $|z(t)|$ will fluctuate wildly and be difficult to interpret. You must bandpass filter first to make the signal narrowband before computing the Hilbert envelope. This is the most important usage guideline.
Edge effect: The FFT performs circular convolution, producing artifacts at the signal's beginning and end. Solutions: (a) extend the data at both ends (mirroring or zero-padding), then trim the middle after computation; (b) use overlap-add segmented processing.
Discrete-time approximation: The continuous Hilbert transform is non-causal ($1/(\pi t)$ has values at $t < 0$). The FFT implementation is a finite-length approximation. When the signal bandwidth approaches the Nyquist frequency, the approximation quality degrades.
DC component issue: If the signal has a DC offset, the DC part remains in the real part but not the imaginary part after the Hilbert transform, causing a bias in the envelope estimate. Remove the mean before computing the Hilbert transform.

When Not to Use?

Signal has multiple components with different carriers: The Hilbert envelope will be a mixed envelope of all components, unable to separate them → bandpass filter to separate components first, or use EMD/HHT (Section 5.6)
Need time-frequency analysis (not just the envelope): Hilbert only provides a one-dimensional envelope and instantaneous frequency, not a full time-frequency distribution → use STFT (Section 5.1) or CWT (Section 5.4)
Envelope detection in broadband noise: Filter first, then Hilbert. Without filtering, the envelope detector will track the random envelope of the noise, yielding no useful information

Interactive: Envelope Detection

AM modulated signal $x(t) = [1 + m\cos(2\pi f_m t)]\cos(2\pi \cdot 1000\,t)$. The Hilbert envelope (orange) perfectly tracks the modulating waveform. Adjust modulation frequency $f_m$ to observe the effect.

Modulation frequency $f_m$=5 Hz

Adjust parameters above to see the plot

References: [1] Gabor, Theory of Communication, J. IEE, 1946. [2] Hahn, Hilbert Transforms in Signal Processing, Artech House, 1996. [3] Marple, Computing the Discrete-Time Analytic Signal via FFT, IEEE Trans. SP, 1999. [4] Feldman, Hilbert Transform Applications in Mechanical Vibration, Wiley, 2011.

✅ Quick Check

Q1: What does the Hilbert transform do in the frequency domain?

Show answer

Positive frequencies are phase-shifted by -90 degrees, negative frequencies by +90 degrees, with magnitude unchanged. Equivalent to multiplying by -j·sgn(ω).

Q2: Why must you bandpass filter before computing the Hilbert envelope?

Show answer

Because the envelope of a broadband signal has no physical meaning. Bandpass filtering makes the signal narrowband, so the envelope can correctly reflect modulation characteristics.

← 3.5 Chirp-Z Transform 4.2 Envelope & Instantaneous Frequency →

4.2 Envelope & Instantaneous Frequency

Extracting time-varying amplitude and frequency from the analytic signal — a core tool for mechanical fault diagnosis

Why does this matter? Because for rotating machinery problems like bearing faults and gear defects, the fault signatures are not directly visible in the spectrum — they are hidden in the "envelope" of high-frequency resonances. Envelope spectrum analysis is the standard tool for industrial predictive maintenance.

Previously: Section 4.1 introduced the mathematics of the Hilbert transform. Now let us look at its most important application: extracting the envelope and instantaneous frequency from the analytic signal, which is the key technique for bearing fault diagnosis.

Learning Objectives

Derive envelope and instantaneous frequency from the polar representation of the analytic signal
Understand the limitation that instantaneous frequency has physical meaning only for narrowband signals
Master the complete 6-step bearing fault envelope spectrum analysis workflow
Compare Hilbert envelope with traditional rectification + low-pass filtering methods

One-Sentence Summary

The magnitude of the analytic signal is the envelope, and the derivative of its phase is the instantaneous frequency — letting you track how a signal's amplitude and frequency change over time.

Pain Point

"I want to know the frequency of this chirp signal at every instant." A linear chirp sweeps from 100 Hz to 1000 Hz, but the standard FFT only tells you "the signal contains components from 100~1000 Hz" — it does not tell you which time instant corresponds to which frequency.

"What is the repetition frequency of the bearing impacts?" The vibration signal measured by the accelerometer looks like a blob of high-frequency noise — the impact pattern is invisible to the eye. But if you can extract the envelope, the impact pattern emerges, and then the envelope spectrum reveals the characteristic frequency.

Origin

The concept of instantaneous frequency emerged naturally within Gabor's (1946) framework: the time derivative of the analytic signal $z(t) = A(t)e^{j\phi(t)}$'s phase, $\frac{1}{2\pi}\frac{d\phi}{dt}$, is the instantaneous frequency. However, whether "instantaneous frequency" has physical meaning sparked a long-standing academic debate.

Ville (1948) rigorously proved that: for narrowband signals, Gabor's instantaneous frequency equals the conditional expectation (first moment of frequency) of the Wigner-Ville distribution, thus having clear physical meaning. But for broadband signals, the instantaneous frequency can be negative or even infinite — losing the intuitive meaning of frequency.

Envelope spectrum analysis as a mechanical fault diagnosis tool was systematically developed by Robert B. Randall and others in the 1980s~1990s, becoming the international standard method for rotating machinery condition monitoring (ISO 13373, ISO 10816 series).

Principle

Write the analytic signal in polar form:

$z(t) = A(t)\,e^{j\phi(t)}$

$$A(t) = |z(t)| = \sqrt{x^2(t) + \hat{x}^2(t)} \quad \text{(instantaneous amplitude / envelope)}$$ $$\phi(t) = \arg[z(t)] = \arctan\frac{\hat{x}(t)}{x(t)} \quad \text{(instantaneous phase)}$$ $$f_i(t) = \frac{1}{2\pi}\frac{d\phi}{dt} \quad \text{(instantaneous frequency)}$$

Intuition: Imagine the signal as a rotating vector (phasor). At each instant, the vector's length = envelope $A(t)$, the vector's angle = phase $\phi(t)$, the vector's rotation speed = instantaneous angular frequency $2\pi f_i(t)$.

Expand: Computing discrete-time instantaneous frequency

In discrete time, the phase is $\phi[n] = \arctan(\hat{x}[n]/x[n])$. The instantaneous frequency is approximated by differencing:

$$f_i[n] = \frac{f_s}{2\pi}\,\Delta\phi[n] = \frac{f_s}{2\pi}\left(\phi[n] - \phi[n-1]\right)$$

Note: The output of $\arctan$ is in the range $(-\pi, \pi]$, and the phase difference between adjacent samples may "jump" by $\pm 2\pi$ (phase wrapping). You must perform phase unwrapping before differencing:

$$\Delta\phi_{\text{unwrapped}}[n] = \text{wrap}(\phi[n] - \phi[n-1]) = \Delta\phi[n] - 2\pi\,\text{round}\!\left(\frac{\Delta\phi[n]}{2\pi}\right)$$

Or a more robust method — compute directly from the analytic signal's real and imaginary parts:

$$f_i[n] = \frac{f_s}{2\pi}\,\text{Im}\!\left(\frac{z[n]\,z^*[n-1]}{|z[n]\,z^*[n-1]|}\right) \cdot \frac{1}{\Delta t}$$

$\blacksquare$

How to Use: Complete Bearing Fault Envelope Spectrum Analysis Workflow

This is one of the most important spectral analysis techniques in rotating machinery condition monitoring. Below is the complete 6-step industrial workflow with specific values:

Step 1: Acquire accelerometer data

Sampling rate $f_s = 25.6\,\text{kHz}$ (common vibration analysis sampling rate, covering up to 10 kHz). Acquire 2 seconds of data → $N = 51200$ points.

Step 2: Bandpass filter (focus on resonance band)

This is the most critical step. Bearing impacts excite structural resonances, with energy concentrated in a high-frequency band (typically 2~10 kHz, depending on the bearing and structure).

Design a bandpass filter: center frequency 3.5 kHz, bandwidth 2~5 kHz. Use a 4th-order Butterworth bandpass filter. Spectral kurtosis can be used to automatically select the optimal filter band (Antoni 2006).

Step 3: Hilbert envelope detection

$A[n] = |z[n]| = \sqrt{x_{\text{filtered}}^2[n] + \hat{x}_{\text{filtered}}^2[n]}$

Result: a series of pulses, each corresponding to a rolling element passing over the defect. Pulse spacing = $1/\text{BPFO}$.

Step 4: Low-pass filter the envelope (remove carrier residuals)

The envelope may still contain some high-frequency residuals (from the carrier). Use a low-pass filter with cutoff frequency $\approx 500\,\text{Hz}$ (well above the expected fault frequency, but well below the carrier frequency).

Step 5: Envelope FFT (envelope spectrum)

FFT the low-pass filtered envelope (using a Hann window). Frequency resolution $\Delta f = f_s/N = 25600/51200 = 0.5\,\text{Hz}$ (sufficient to resolve harmonics of the fault frequency).

Step 6: Search for characteristic frequencies in the envelope spectrum

Bearing fault characteristic frequencies (SKF 6205 bearing at 1797 RPM as example):

Fault Type	Characteristic Frequency	Value (Hz)	Pattern in Envelope Spectrum
Outer race (BPFO)	$n_b f_r(1-d/D)/2$	87.3	87.3, 174.6, 261.9 Hz
Inner race (BPFI)	$n_b f_r(1+d/D)/2$	162.2	162.2 Hz $\pm f_r$ sidebands
Rolling element (BSF)	$Df_r(1-(d/D)^2)/(2d)$	69.6	$2\times$ BSF + sidebands

$n_b$=9 (number of rolling elements), $f_r$=29.95 Hz (rotational frequency), $d/D$=0.3348 (rolling element diameter / pitch circle diameter)

Interpretation guidelines: Seeing BPFO and its 2X, 3X harmonics in the envelope spectrum → outer race fault. Seeing BPFI with $\pm f_r$ sidebands → inner race fault (the inner race rotates with the shaft; the fault entering and leaving the load zone produces modulation). The number and relative height of harmonics indicate fault severity.

Application Scenarios

Rotating machinery condition monitoring: Wind turbine gearbox and generator bearings (bearing replacement cost per turbine: $150,000~$300,000 + downtime losses). Hourly automatic envelope spectrum analysis, trending BPFO/BPFI energy. Early warning can reduce downtime from weeks to planned days. Globally adopted by wind farms (e.g., Bruel & Kjaer Vibro, SKF Enlight).
Speech pitch tracking: Bandpass speech signal to 50~500 Hz → Hilbert envelope → autocorrelation or envelope spectrum → fundamental frequency $F_0$. Instantaneous frequency is more direct: bandpass near the fundamental → instantaneous frequency $f_i(t)$ gives real-time intonation changes. Used for prosody analysis in speech synthesis and singing pitch detection.
Ultrasonic non-destructive testing (NDT): Ultrasonic pulses (center frequency 5 MHz) reflect inside materials. Envelope detection extracts the amplitude variation of echoes with depth (A-scan display). Envelope peak location = defect depth, peak magnitude ∝ defect reflection coefficient. Standard inspection method in aerospace, nuclear, and petrochemical industries.

Pitfalls and Limitations

Instantaneous frequency has physical meaning only for narrowband signals: This limitation is worth repeating. If the signal's bandwidth is comparable to its center frequency (i.e., $BW \approx f_c$), the instantaneous frequency will exhibit rapid fluctuations or even negative values, completely losing any "frequency" meaning. Rule of thumb: instantaneous frequency is reliable only when $BW / f_c < 0.3$.
Computing the envelope without bandpass filtering first → meaningless results: If the signal contains components in multiple frequency bands (e.g., shaft frequency, mesh frequency, and resonance frequency simultaneously in bearing vibration), the Hilbert envelope will be a mixture of all components, unable to isolate the desired fault signature.
Phase unwrapping difficulties: In low-SNR regions or when the envelope is near zero, the phase estimate jumps wildly, causing spike-like false values in instantaneous frequency. Countermeasures: (a) low-pass filter the instantaneous frequency; (b) only trust instantaneous frequency when the envelope value is sufficiently large.
Hilbert envelope vs. rectification + low-pass: Traditional "rectification + low-pass filtering" can also do envelope detection, but the Hilbert method's advantage is that no low-pass cutoff frequency needs to be chosen (the envelope is generated automatically), and it provides more accurate envelope estimates for narrowband signals. The drawback is that the FFT method requires the entire data segment (non-causal), making it unsuitable for real-time processing.

When Not to Use?

Need a full time-frequency distribution (not just an envelope curve): Use STFT spectrogram (Section 5.1) or CWT scalogram (Section 5.4)
Signal is highly non-stationary with multiple time-varying components: Use EMD / HHT (Section 5.6), which adaptively decomposes into multiple IMFs, each then analyzed with Hilbert envelope and instantaneous frequency
Real-time (causal) envelope detection: The Hilbert FFT method is non-causal (requires the entire data segment). Real-time systems can use FIR Hilbert filters (with delay) or simple rectification + low-pass filtering

References: [1] Gabor, Theory of Communication, J. IEE, 1946. [2] Randall & Antoni, Rolling Element Bearing Diagnostics — A Tutorial, Mech. Sys. Sig. Proc., 2011. [3] Antoni, The Spectral Kurtosis: A Useful Tool for Characterising Non-Stationary Signals, Mech. Sys. Sig. Proc., 2006. [4] Boashash, Estimating and Interpreting the Instantaneous Frequency of a Signal, Proc. IEEE, 1992.

📝 Worked Example

CNC spindle bearing SKF 6205 (n=9, d=7.94mm, D=38.5mm), speed 3600 RPM. (a) Compute BPFO. (b) How to choose the bandpass filter band? (c) What frequency resolution is needed for the envelope spectrum?

Show solution

(a) f_r=60Hz, BPFO = (9/2)×60×(1−7.94/38.5) = 214.2 Hz

(b) Choose the high-frequency resonance band, typically 2-8 kHz (determined by the measured frequency response function)

(c) To see BPFO≈214 Hz and its 2x=428 Hz, Δf must be at least <5 Hz → observation time ≥0.2 seconds

Interactive: Bearing Fault Envelope Spectrum Analysis

Periodic impacts from bearing faults excite structural resonances, but clear fault frequencies are not visible in the raw FFT. Through envelope analysis (bandpass → Hilbert envelope → envelope spectrum), the hidden BPFO modulation frequency can be extracted.

RPM: 1800 Resonance freq (Hz): 3200 Noise level: 40

← 4.1 Hilbert Transform 4.3 Cepstrum Analysis →

4.3 Cepstrum Analysis

The spectrum of the spectrum — discovering hidden periodic structures in the "quefrency" domain

Why does this matter? Because when the spectrum contains periodic patterns (harmonic families, sideband families, echoes), the cepstrum can reveal that periodicity at a glance — it is a classic tool for speech pitch detection and gearbox analysis.

Previously: Section 4.2 used envelope analysis to find periodic impacts in the time domain. But sometimes the periodicity is in the frequency domain rather than the time domain — the spectrum has a set of equi-spaced peaks (harmonic families, sideband families). The cepstrum is the tool for analyzing "periodicity in the spectrum."

Learning Objectives

Understand the cepstrum definition: real/power cepstrum vs. complex cepstrum
Establish the relationship between quefrency-axis peaks and periodic structures in the spectrum
Master cepstrum applications in speech pitch detection and gearbox fault analysis
Recognize numerical issues arising from logarithmic operations and phase unwrapping

One-Sentence Summary

Cepstrum = the spectrum of the spectrum. When the spectrum contains periodic patterns (such as equi-spaced sidebands or harmonic families), the cepstrum helps you find that period.

Pain Point: Periodic Patterns in the Spectrum

Scenario 1: Gearbox fault. The vibration spectrum from gear meshing is centered on the Gear Mesh Frequency (GMF), with an entire family of equi-spaced sidebands on both sides, spaced at the shaft rotation frequency. For example, GMF = 600 Hz, rotation speed = 30 Hz — you see peaks at 510, 540, 570, 600, 630, 660, 690 Hz. The human eye can see they are equi-spaced (30 Hz apart), but automatically detecting "what is the spacing of these peaks" algorithmically is not easy.

Scenario 2: Speech pitch detection. The speech signal's spectrum has a set of harmonics: $F_0, 2F_0, 3F_0, \ldots$, where $F_0$ is the fundamental frequency (male $\approx$ 100 Hz, female $\approx$ 200 Hz). These harmonics form an equi-spaced pattern in the spectrum, with spacing = $F_0$. You need a method to automatically find this equi-spacing.

The cepstrum is designed precisely for this: it applies another Fourier transform to the spectrum, converting "periodic patterns in the spectrum" into "peaks in the cepstrum."

Origin

B.P. Bogert, M.J.R. Healy, and the renowned statistician John W. Tukey (1963) proposed the cepstrum during a seismic wave analysis study at Bell Labs. Their original motivation was to detect echoes in seismic signals: seismic waves reflected by geological layers produce delayed copies. In the spectrum, this manifests as periodic ripples, with the ripple frequency = the reciprocal of the echo delay.

Tukey, in his characteristic humorous style, reversed the letters of all related terms:

Original	Reversed	Meaning
spectrum	cepstrum	Spectrum of the spectrum
frequency	quefrency	Cepstrum horizontal axis (unit: time)
harmonics	rahmonics	"Harmonics" in the cepstrum
filtering	liftering	Filtering operation in the cepstrum domain

The paper title itself is full of Tukey's style: "The Quefrency Alanysis of Time Series for Echoes: Cepstrum, Pseudo-Autocovariance, Cross-Cepstrum, and Saphe Cracking". Yes, even "alanysis" is an intentional letter-reversal of "analysis."

The cepstrum later found its broadest applications in speech processing and mechanical fault diagnosis. In speech processing, it is the foundation of MFCC (Mel-Frequency Cepstral Coefficients) — one of the most important features in speech recognition.

Principle

Intuition: If the spectrum has equi-spaced peaks (spacing $\Delta f$), it is as if the spectrum has a "frequency" $= \Delta f$. Applying another FFT to the spectrum reveals this "frequency" — it appears as a peak at quefrency = $1/\Delta f$.

Real/Power Cepstrum

$$c[n] = \text{IFFT}\left\{\log\left|X[k]\right|\right\} = \text{IFFT}\left\{\log\left|\text{FFT}\{x[n]\}\right|\right\}$$

Complex Cepstrum

$$\hat{c}[n] = \text{IFFT}\left\{\log X[k]\right\} = \text{IFFT}\left\{\log|X[k]| + j\,\angle X[k]\right\}$$

Meaning of the quefrency axis:

The unit of quefrency is time (seconds or sample counts)
A peak at quefrency $= \tau$ → the spectrum has a periodic structure with spacing $= 1/\tau$ Hz
Peak height ∝ strength of the periodic structure

Expand: Why take the log?

The cepstrum's design has a deep reason: homomorphic deconvolution.

Many signals can be modeled as the convolution of two components:

$$x[n] = s[n] * h[n] \quad \Longleftrightarrow \quad X[k] = S[k] \cdot H[k]$$

Take the logarithm:

$$\log|X[k]| = \log|S[k]| + \log|H[k]|$$

Convolution becomes multiplication in the frequency domain; after taking the log, it becomes addition. Then apply IFFT:

$$c_x[n] = c_s[n] + c_h[n]$$

If $c_s$ and $c_h$ occupy different regions in the quefrency domain, liftering (windowing/filtering in the cepstral domain) can separate them.

Speech example: Speech $x = e * v$ (glottal excitation $e$ convolved with vocal tract impulse response $v$).

Vocal tract $v$'s cepstrum is concentrated at low quefrency ($< 3$ ms) → spectral envelope (formants)
Excitation $e$'s cepstrum has a peak at high quefrency ($= T_0 \approx 5$~$10$ ms) → fundamental frequency

Apply a low-time lifter to retain low quefrency → keeps only the vocal tract response → formants can be extracted.

Find peaks at high quefrency → fundamental frequency estimation. $\;\blacksquare$

How to Use: Three Steps

Step 1: FFT → magnitude → log → IFFT → real cepstrum

$c[n] = \text{IFFT}\{\log|X[k]|\}$. Note that $|X[k]|$ may have zero values → add a small constant $\varepsilon$ to avoid $\log(0)$: $\log(|X[k]| + \varepsilon)$.

Step 2: Search for peaks on the quefrency axis

Ignore DC/low-frequency components near quefrency $\approx 0$. Search for peaks within a reasonable quefrency range (based on expected periodic structures).

Step 3: Peak location $\tau$ → corresponds to periodic structure spacing $1/\tau$ in the spectrum

Confirm results by checking for "rahmonics" of the peak ($2\tau, 3\tau, \ldots$).

// Real cepstrum computation (JavaScript) function realCepstrum(x) { const N = np2(x.length); const re = zp(x, N); const im = new Array(N).fill(0); const F = fft(re, im); // log|X[k]| const eps = 1e-12; const logMag = F.re.map((r, k) => { const mag = Math.sqrt(r * r + F.im[k] * F.im[k]); return Math.log(mag + eps); }); // IFFT of logMag (real input → use FFT trick) const imZ = new Array(N).fill(0); const C = fft(logMag, imZ); // FFT of log|X| // IFFT = conjugate, FFT, conjugate, /N return C.re.map(v => v / N); // real cepstrum }

Concrete example: Gearbox fault detection

Gearbox: number of teeth $z = 20$, rotational speed $N_r = 1800\,\text{RPM} = 30\,\text{Hz}$
Gear mesh frequency GMF = $z \times f_r = 20 \times 30 = 600\,\text{Hz}$
If the gear has a localized defect → the spectrum shows equi-spaced sidebands around 600 Hz: $\ldots, 540, 570, 600, 630, 660, \ldots$ Hz
Sideband spacing = $30\,\text{Hz}$ (= rotational frequency)
The cepstrum shows a clear peak at quefrency $= 1/30 = 33.3\,\text{ms}$
Plus smaller peaks at $66.7\,\text{ms}$ (2X) and $100\,\text{ms}$ (3X) (rahmonics)
Conclusion: the spectrum has a periodic structure with 30 Hz spacing → points to gear fault with rotational speed as the modulation frequency

Application Scenarios

Speech pitch detection: The speech signal's spectrum has a series of harmonics $F_0, 2F_0, \ldots, nF_0$. The cepstrum shows a peak at quefrency $= 1/F_0$. Male $F_0 \approx 100\,\text{Hz}$ → peak at 10 ms; female $F_0 \approx 200\,\text{Hz}$ → peak at 5 ms. This is one of the classic pitch detection methods (alongside autocorrelation), widely used in speech coding (G.729) and music analysis. MFCC features are a direct descendant of cepstrum analysis.
Gearbox fault sideband analysis: As in the example above. The cepstrum can automatically extract sideband spacing from complex spectra without manually identifying spectral peak patterns. Industrial software such as B&K PULSE and SKF @ptitude have built-in cepstrum analysis. ISO 13373-9 specifically standardizes cepstrum application in gearbox diagnostics.
Echo detection and removal: If the signal $y[n] = x[n] + \alpha x[n-D]$ (original signal plus an echo delayed by $D$), the spectrum $|Y| = |X|\cdot|1 + \alpha e^{-j\omega D}|$. $\log|Y| = \log|X| + \log|1 + \alpha e^{-j\omega D}|$. The second term is periodic with period $2\pi/D$ → the cepstrum shows a peak at quefrency $= D$, precisely locating the echo delay. Used in audio post-production and telephone echo cancellation.
Mechanical fault transmission path isolation: Vibration signal $X = S \cdot H$ (excitation source $S$ × transfer path $H$). In the cepstrum, $c_x = c_s + c_h$. The periodic features of the excitation source (e.g., bearing fault frequency) appear at specific quefrencies, while the transfer path's influence (structural resonance envelope) is concentrated at low quefrency. Liftering can separate them, making fault signatures clearer.

Pitfalls and Limitations

$\log(0)$ problem: If $|X[k]| = 0$ (spectrum has zero values), $\log(0) = -\infty$. In practice, a small constant must be added: $\log(|X[k]| + \varepsilon)$, where $\varepsilon \approx 10^{-10}$ ~ $10^{-12}$. Too large an $\varepsilon$ will distort the cepstrum shape.
Phase unwrapping for the complex cepstrum is very tricky: The complex cepstrum requires $\log X = \log|X| + j\angle X$, where $\angle X$ must be continuous (unwrapped phase). But discrete spectrum phase is only defined in $(-\pi, \pi]$, and unwrapping algorithms frequently fail in the presence of noise. In practice, usually only the real cepstrum (power cepstrum) is used, avoiding the phase problem.
Frequency resolution vs. quefrency resolution trade-off: The cepstrum's quefrency resolution = $1/f_s$ (one sample interval). To resolve two peaks with very close quefrencies (i.e., two close spectral periodicities), very high frequency resolution (long FFT) is needed, which in turn requires long data.
The real cepstrum is an even function: $c[n] = c[-n]$ (because after taking log the result is real, so the IFFT output is symmetric). Only the $n > 0$ portion needs to be examined.
Not suitable for non-periodic spectral features: The cepstrum's strength is detecting "equi-spaced patterns in the spectrum." If the fault spectrum is not equi-spaced sidebands but rather broadband elevation or a single peak shift, the cepstrum is less useful than directly examining the spectrum.

When Not to Use?

No periodic patterns in the spectrum: The cepstrum's advantage lies in detecting "spectral periodicity." If your analysis target is a single frequency peak or broadband noise characteristics, examining the spectrum or PSD directly is more effective
Need precise power/energy measurements: The logarithmic operation destroys the linear power relationship. If you need precise spectral energy values → use Welch PSD (Section 3.2)
Analyzing bearing faults (not gearbox): Bearing faults are usually more directly and effectively analyzed with the envelope spectrum (Section 4.2). The cepstrum is better suited for gearboxes (since gear sidebands are typical equi-spaced patterns)
Real-time speech pitch tracking: The cepstrum method requires the FFT of an entire speech segment, with significant delay. Real-time systems more commonly use autocorrelation or the YIN algorithm

References: [1] Bogert, Healy & Tukey, The Quefrency Alanysis of Time Series for Echoes: Cepstrum, Pseudo-Autocovariance, Cross-Cepstrum, and Saphe Cracking, Proc. Symposium on Time Series Analysis, 1963. [2] Oppenheim & Schafer, Discrete-Time Signal Processing, Ch.13 (Homomorphic Signal Processing). [3] Randall, Vibration-based Condition Monitoring: Industrial, Automotive and Aerospace Applications, Wiley, 2011. [4] Noll, Cepstrum Pitch Determination, JASA, 1967.

Interactive: Cepstrum Echo Detection

Original signal plus an echo delayed by D. The cepstrum shows a peak at quefrency=D.

Echo delay D=50 samples

Echo strength=0.50

Adjust parameters above to see the plot

← 4.2 Envelope & Inst. Frequency 5.1 STFT →

5.1 Short-Time Fourier Transform (STFT)

The cornerstone of time-frequency analysis — sliding-window FFT

Why does this matter? Because the frequency content of most real-world signals changes over time. FFT only tells you "which frequencies are present" but not "when they appear." STFT is the most fundamental time-frequency analysis tool and the baseline for all advanced methods.

Previously: The analysis methods in Part IV assumed that the signal's frequency characteristics do not change over time. But real signals are usually non-stationary. STFT lets you see "when" and "what frequency" simultaneously.

Learning Objectives

Understand the core idea of STFT: windowed segmentation + FFT, bringing time information into frequency analysis
Master the selection logic for three key parameters: window length, overlap ratio, and NFFT
Interpret typical patterns in Spectrograms
Understand the fundamental limitation of Heisenberg's uncertainty principle on STFT resolution

One-Sentence Summary

STFT is "sliding-window FFT" — it lets you see how frequency content changes over time. Cut a long signal into many short segments, perform FFT on each segment, then arrange the results along the time axis to obtain a Spectrogram.

Pain Point: FFT Loses "Time"

Standard FFT tells you "which frequency components are in the signal," but tells you nothing about when those components appear. For time-varying signals, this is a fatal flaw:

Speech: When transitioning from vowel /a/ to /i/, the formant frequencies change dramatically within 50 ms. FFT simply mixes all frequencies together
Engine vibration: Accelerating from idle at 800 rpm to 6000 rpm, the dominant vibration frequency rises continuously. FFT only tells you "frequencies from 800 to 6000 rpm are all present"
Seismic waves: P-waves (compressional, high-frequency, arrive first) and S-waves (shear, low-frequency, arrive later) require simultaneous analysis of arrival time and frequency characteristics
Music: The onset/offset times, pitch changes, and vibrato of each note in a melody all require simultaneous time + frequency analysis

Fundamental problem: The Fourier transform basis function $e^{j\omega t}$ extends to $\pm\infty$ in time, so it inherently cannot provide time localization. We need a method to "localize" the signal to a finite time segment before analysis.

Origin

Dennis Gabor (1946) first proposed using a Gaussian-windowed STFT (which he called "logons") to analyze communication signals in his classic paper Theory of Communication. Gabor's core insight was that information in a signal lies not only in frequency but also in time — we need a joint time-frequency representation.

J. B. Allen (1977) systematized the theory of discrete STFT in his IEEE paper Short Term Spectral Analysis, Synthesis, and Modification by Discrete Fourier Transform, establishing a complete framework for window function selection, Overlap-Add reconstruction, and more, laying the foundation for modern digital speech processing.

Principle

Intuition: Imagine you are listening to a recording of a concert. Instead of listening to the entire song at once and then analyzing frequencies (that would be FFT), you take a "snapshot" at regular short intervals — recording which frequencies are present and how loud they are at each moment. Arrange all snapshots in a row, and you get a Spectrogram.

Steps:

Choose a window function $w[n]$ (e.g., Hann window), length $L$ samples
Slide the window to position $m$ in the signal, extract the local segment $x[n] \cdot w[n-m]$
Perform FFT on this local segment → obtain the local spectrum at position $m$
Slide the window by $H$ samples (Hop Size), repeat steps 2-3
Arrange all local spectra along the time axis → Spectrogram

Discrete STFT Definition

$$\text{STFT}\{x\}[m, k] = \sum_{n=0}^{L-1} x[n + mH]\, w[n]\, e^{-j2\pi kn/N_{\text{FFT}}}$$

$m$: time frame index, $k$: frequency bin index, $H$: Hop Size, $L$: window length, $N_{\text{FFT}}$: FFT length

Spectrogram

$$S[m,k] = |\text{STFT}\{x\}[m,k]|^2$$

Squared magnitude → power spectral density as a function of time

Expand: Continuous STFT Definition and Properties Derivation

Continuous STFT:

$$\text{STFT}\{x\}(t,\omega) = \int_{-\infty}^{\infty} x(\tau)\, w(\tau - t)\, e^{-j\omega\tau}\, d\tau$$

This can be understood as the inner product of $x(\tau)$ and $w(\tau - t)e^{j\omega\tau}$ — i.e., the "component magnitude" of the signal near time $t$ and frequency $\omega$.

Inverse transform (reconstruction):

$$x(t) = \frac{1}{2\pi \|w\|^2} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \text{STFT}\{x\}(\tau,\omega)\, w(t-\tau)\, e^{j\omega t}\, d\omega\, d\tau$$

The prerequisite is that the window function $w(t) \neq 0$ (non-zero window), ensuring perfect reconstruction.

Energy conservation (Parseval-like):

$$\int_{-\infty}^{\infty} |x(t)|^2\, dt = \frac{1}{2\pi \|w\|^2} \int\!\!\int |\text{STFT}\{x\}(t,\omega)|^2\, dt\, d\omega$$

Heisenberg Uncertainty Principle

The time resolution $\Delta t$ and frequency resolution $\Delta f$ of STFT cannot both be arbitrarily good simultaneously:

$$\Delta t \cdot \Delta f \geq \frac{1}{4\pi}$$

Longer window → smaller $\Delta f$ (clearer frequency view) → but larger $\Delta t$ (blurrier time view). And vice versa. This is not a technical limitation, but a fundamental mathematical limit. STFT uses the same window across the entire time-frequency plane → resolution is the same everywhere → this is a fixed rectangular tiling.

How to Use: Practical Parameter Selection Guide

Step 1: Determine the Frequency Resolution $\Delta f$ You Need

Frequency resolution is determined by window length: $\Delta f \approx f_s / L$ ($L$ = window length in samples).

Formula: To achieve $\Delta f = 4$ Hz with sampling rate $f_s = 1000$ Hz → window length $L = f_s / \Delta f = 1000/4 = 250$ samples.

Step 2: Choose a Window Function

Window Function	Main Lobe Width	Sidelobe Attenuation	Use Cases
Rectangular	Narrowest ($2f_s/L$)	-13 dB (worst)	Transient analysis, known signals without leakage
Hann	$4f_s/L$	-31 dB	General-purpose first choice
Hamming	$4f_s/L$	-43 dB	Speech analysis
Blackman-Harris	$8f_s/L$	-92 dB	High dynamic range requirements
Gaussian ($\alpha$=2.5)	Adjustable	No sidelobes (theoretically)	Gabor analysis, minimum time-frequency area

Step 3: Determine Overlap Ratio

Overlap ratio = $(L - H)/L \times 100\%$, where $H$ = Hop Size.

50% (Hann window): Minimum overlap satisfying the COLA (Constant Overlap-Add) condition, ensuring perfect reconstruction
75%: Smoother time axis, better time resolution — recommended for most cases
87.5%: For very fine time tracking (e.g., pitch tracking)

Time resolution: $\Delta t = H / f_s$ (time interval between frames).

Step 4: Choose NFFT

NFFT $\geq$ window length $L$, choose the next power of 2 (most efficient for FFT). If NFFT $>$ L → zero-padding → denser frequency axis (interpolation effect, but does not increase true frequency resolution).

Scenario 1: Speech Analysis

Parameter	Value	Rationale
$f_s$	16 kHz	Standard sampling rate for telephony/speech recognition
Window length $L$	512 samples = 32 ms	Covers 2-3 fundamental frequency periods (male $F_0 \approx 100$ Hz → 10 ms/period)
Overlap	75% → $H$ = 128 samples	Smooth tracking of formant changes
NFFT	512	Already a power of 2
$\Delta f$	$16000/512 = 31.25$ Hz	Sufficient to distinguish adjacent formants
$\Delta t$	$128/16000 = 8$ ms	Sufficient to track rapid speech changes

Scenario 2: Mechanical Vibration Analysis

Parameter	Value	Rationale
$f_s$	25.6 kHz	Common for vibration analysis (frequency range up to 10 kHz)
Window length $L$	4096 samples = 160 ms	Fine frequency resolution needed to distinguish closely spaced gear mesh frequencies
Overlap	75% → $H$ = 1024 samples	Track speed changes
NFFT	4096	Already a power of 2
$\Delta f$	$25600/4096 = 6.25$ Hz	Can distinguish harmonics 10 Hz apart
$\Delta t$	$1024/25600 = 40$ ms	Sufficient to track moderate speed changes

Python Example: Computing a Spectrogram with scipy.signal.stft

# Python: use scipy.signal.stft to compute a spectrogram import numpy as np import scipy.signal as sig import matplotlib.pyplot as plt # Generate chirp signal: frequency sweeps linearly from 100 Hz to 1500 Hz fs = 4000 t = np.arange(0, 2, 1/fs) x = sig.chirp(t, f0=100, f1=1500, t1=2, method='linear') # Compute STFT # nperseg = window length, noverlap = number of overlapping samples f, t_stft, Zxx = sig.stft(x, fs=fs, nperseg=256, noverlap=192) # Compute frequency and time resolution df = fs / 256 dt = (256 - 192) / fs # hop size in seconds print(f"Frequency resolution df = {df:.1f} Hz") print(f"Time resolution dt = {dt*1000:.1f} ms") # Plot spectrogram fig, axes = plt.subplots(2, 1, figsize=(10, 8)) axes[0].plot(t, x, linewidth=0.5) axes[0].set_title('Time domain chirp signal') axes[0].set_xlabel('Time (s)') axes[1].pcolormesh(t_stft, f, 20*np.log10(np.abs(Zxx) + 1e-12), cmap='viridis', shading='auto') axes[1].set_ylim(0, 2000) axes[1].set_ylabel('Frequency (Hz)'); axes[1].set_xlabel('Time (s)') axes[1].set_title('Spectrogram (dB)') plt.colorbar(axes[1].collections[0], ax=axes[1]) plt.tight_layout(); plt.show() # Inverse transform to verify reconstruction _, x_recon = sig.istft(Zxx, fs=fs, nperseg=256, noverlap=192) print(f"Reconstruction error: {np.max(np.abs(x - x_recon[:len(x)])):.2e}")

Interactive: Spectrogram and Window Length Trade-off

Choose different signals and window lengths to observe how the Spectrogram changes. Longer windows yield clearer frequency axes but blurrier time axes, and vice versa.

Signal:

Window length = 256 samples

Adjust parameters above to see the plot

Application Scenarios

Speech recognition front-end (Mel-Spectrogram): The input to modern ASR systems (e.g., Whisper) is STFT → Mel filter bank → log. Typical settings: 25 ms window, 10 ms hop, 80 Mel bins. This produces 100 frames $\times$ 80-dimensional feature matrix per second
Music Information Retrieval (MIR): Analyzing chord progressions, melody lines, and rhythmic structure in songs. Uses longer windows (2048-4096 samples @ 44.1 kHz = 46-93 ms) for sufficient frequency resolution to distinguish semitones
Vibration order tracking: During engine acceleration, the diagonal lines on the STFT spectrogram represent the frequency trajectories of various orders. Engineers use this to find resonance points (sudden amplitude increases at certain RPMs)
Seismic wave analysis: Spectrogram can distinguish P-waves (arrive first, high-frequency) from S-waves (arrive later, low-frequency), aiding in seismic wave characterization
EEG event-related spectral analysis: Analyzing brainwave frequency changes after specific stimuli (e.g., alpha wave suppression, gamma wave enhancement), window length 0.5-2 seconds

Pitfalls and Limitations

Window too long → time blurring: If the signal's frequency changes dramatically within 10 ms but you use a 100 ms window, the change gets "averaged out" and appears as a blurry blob on the Spectrogram
Window too short → frequency blurring: A 32-sample window @ 1 kHz → $\Delta f = 31$ Hz, completely unable to distinguish two tones at 440 Hz and 460 Hz
No "perfect" window length: The Heisenberg uncertainty principle guarantees that you cannot simultaneously achieve arbitrarily good time and frequency resolution. This is a physical law, not a technical shortcoming
Spectrogram discards phase: $|STFT|^2$ throws away phase information. When phase is needed (e.g., signal reconstruction, Griffin-Lim algorithm), the full STFT must be retained
Output size: A 10-second piece of music at $f_s = 44.1$ kHz, window length 2048, hop 512, NFFT 2048 → time frames $\approx 860$, frequency bins = 1025 → approximately 880,000 complex values. Memory requirements for long signals can be substantial

When Not to Use STFT? Alternatives

Scenario	Problem	Alternative
Need multi-resolution (frequency detail at low freq, time detail at high freq)	STFT's fixed window cannot adapt	CWT (Wavelet Transform) → Section 5.4
Analyzing very short transients (< a few cycles)	Frequency is meaningless when window is too short	WVD → Section 5.2 or Matching Pursuit
Nonlinear, non-stationary complex signals	Sinusoidal basis is not appropriate	EMD / HHT → Section 5.6
Only need to track a few specific frequencies	Computing the full spectrum with STFT is wasteful	Goertzel algorithm or Chirp-Z Transform

References: [1] Gabor, D., Theory of Communication, J. IEE, 93(26):429-457, 1946. [2] Allen, J.B., Short Term Spectral Analysis, Synthesis, and Modification by Discrete Fourier Transform, IEEE Trans. ASSP, 25(3):235-238, 1977. [3] Oppenheim & Schafer, Discrete-Time Signal Processing, 3rd ed., Ch.10. [4] Griffin & Lim, Signal Estimation from Modified Short-Time Fourier Transform, IEEE Trans. ASSP, 1984.

✅ Quick Check

Q1: For speech analysis using a 32 ms window (fs=16 kHz → 512 samples), what is the frequency resolution?

Show answer

Δf = fs/N = 16000/512 = 31.25 Hz. Sufficient to distinguish speech formants (spaced ~500-1000 Hz apart).

Q2: What happens if the STFT window is too short? Too long?

Show answer

Too short → frequency blurring (cannot resolve frequencies); too long → time blurring (cannot see rapid changes). There is no perfect window length.

← 4.3 Cepstral Analysis 5.2 Wigner-Ville Distribution →

5.2 Wigner-Ville Distribution (WVD)

Theoretically the highest-resolution time-frequency distribution — but at the cost of "ghost" artifacts

Why does this matter? Because the time-frequency resolution of STFT is limited by the uncertainty principle. WVD breaks through this limitation — at the cost of cross-terms. Understanding WVD is the foundation for understanding all quadratic time-frequency distributions.

Previously: The STFT in Section 5.1 is limited by the uncertainty principle: fixed window → fixed time-frequency resolution. WVD attempts to break through this limitation — at the cost of cross-terms.

Learning Objectives

Understand the definition of WVD and its relationship to the local autocorrelation function
Prove that WVD satisfies perfect Marginal Properties, free from Heisenberg limitations
Understand the cause, location, and amplitude of Cross-Terms
Master the smoothing strategies of Pseudo-WVD and Smoothed-WVD

One-Sentence Summary

WVD is theoretically the highest-resolution time-frequency distribution — but it produces "ghost artifacts" (cross-terms) where you don't expect them. Like a funhouse mirror: the main subject is seen very clearly, but strange phantoms appear nearby.

Pain Point: Can We Break the Heisenberg Limit?

STFT's time-frequency resolution is limited by the Heisenberg inequality $\Delta t \cdot \Delta f \geq 1/(4\pi)$ — the window length is fixed, and time and frequency resolution trade off against each other.

Is there a way to break through this limitation and simultaneously achieve perfect time resolution and frequency resolution? WVD's answer is: yes, but at a cost.

Origin

Eugene Wigner (1932) proposed this distribution in quantum mechanics to describe the "quasi-probability distribution" of quantum states in phase space (position-momentum). Wigner noticed it can take negative values — impossible in classical probability, reflecting the non-classical nature of quantum mechanics.

Jean Ville (1948) independently introduced the same mathematical form into signal processing for analyzing instantaneous frequency and power of signals. Hence it is called the Wigner-Ville Distribution.

Interestingly, Wigner later received the 1963 Nobel Prize in Physics — but not for WVD, rather for his contributions to the theory of atomic nuclei and elementary particles.

Principle

Intuition: At each time instant $t$, compute the signal's "local autocorrelation" (centered at $t$, looking at the correlation over $\tau/2$ before and after), then take the Fourier transform with respect to the lag $\tau$ → obtaining the "local spectrum" at that time instant.

Wigner-Ville Distribution

$$W_x(t,\omega) = \int_{-\infty}^{\infty} x\!\left(t + \frac{\tau}{2}\right)\, x^*\!\left(t - \frac{\tau}{2}\right)\, e^{-j\omega\tau}\, d\tau$$

Interpretation: $x(t+\tau/2) \cdot x^*(t-\tau/2)$ is the "instantaneous autocorrelation function" centered at $t$. Taking the FT with respect to $\tau$ → just like the Wiener-Khinchin theorem, the FT of the autocorrelation = power spectrum. Except here it is local and time-varying.

Perfect Marginal Properties

WVD satisfies the following remarkable properties:

$$\int_{-\infty}^{\infty} W_x(t,\omega)\, \frac{d\omega}{2\pi} = |x(t)|^2 \quad \text{(time marginal = instantaneous power)}$$ $$\int_{-\infty}^{\infty} W_x(t,\omega)\, dt = |X(\omega)|^2 \quad \text{(frequency marginal = power spectral density)}$$ $$\int\!\!\int W_x(t,\omega)\, dt\, \frac{d\omega}{2\pi} = \|x\|^2 = E_x \quad \text{(total energy)}$$

This means WVD is an "ideal" energy distribution — the time and frequency projections exactly equal the instantaneous power and power spectrum, respectively. No Heisenberg inequality limitation.

Expand: Time Marginal Property Proof

Compute $\int W_x(t,\omega)\, d\omega/(2\pi)$:

$$\int \frac{d\omega}{2\pi} \int x\!\left(t+\frac{\tau}{2}\right) x^*\!\left(t-\frac{\tau}{2}\right) e^{-j\omega\tau}\, d\tau$$

Swap the order of integration:

$$= \int x\!\left(t+\frac{\tau}{2}\right) x^*\!\left(t-\frac{\tau}{2}\right) \underbrace{\left[\int \frac{e^{-j\omega\tau}}{2\pi}\, d\omega\right]}_{\delta(\tau)}\, d\tau$$

Using the sifting property of $\delta(\tau)$, set $\tau = 0$:

$$= x(t)\, x^*(t) = |x(t)|^2 \quad \blacksquare$$

Expand: Exact WVD Result for a Single-Component Chirp

Consider a linear chirp $x(t) = e^{j(\omega_0 t + \beta t^2/2)}$ (instantaneous frequency $\omega_i(t) = \omega_0 + \beta t$).

Substituting into the WVD:

$$x\!\left(t+\frac{\tau}{2}\right) x^*\!\left(t-\frac{\tau}{2}\right) = e^{j(\omega_0 + \beta t)\tau}$$

Therefore:

$$W_x(t,\omega) = \int e^{j(\omega_0+\beta t)\tau}\, e^{-j\omega\tau}\, d\tau = 2\pi\,\delta(\omega - \omega_0 - \beta t)$$

The WVD is precisely concentrated on the line of instantaneous frequency $\omega_i(t) = \omega_0 + \beta t$ in the time-frequency plane — perfect time-frequency localization, with no blurring whatsoever. $\blacksquare$

The Fatal Problem: Cross-Terms

For a multi-component signal $x = x_1 + x_2$:

$$W_{x_1+x_2}(t,\omega) = \underbrace{W_{x_1}(t,\omega)}_{\text{auto-term}} + \underbrace{W_{x_2}(t,\omega)}_{\text{auto-term}} + \underbrace{2\,\text{Re}\left[W_{x_1,x_2}(t,\omega)\right]}_{\text{cross-term}}$$

where the cross-term $W_{x_1,x_2}(t,\omega) = \int x_1(t+\tau/2)\, x_2^*(t-\tau/2)\, e^{-j\omega\tau}\, d\tau$.

Three alarming properties of cross-terms:

Location: They appear at the midpoint of the time-frequency positions of $x_1$ and $x_2$ (average in both time and frequency)
Amplitude: They can be as large as the auto-terms, or even larger
Oscillation: Cross-terms oscillate rapidly (in the direction of the time-frequency difference between the two components), with frequency proportional to their time-frequency separation

For a signal with $N$ components, there are $N$ auto-terms but $N(N-1)/2$ cross-terms — when $N$ is large, cross-terms completely overwhelm the auto-terms!

How to Use

Obtain the Analytic Signal: First apply the Hilbert transform to $x(t)$ to get the analytic signal $z(t) = x(t) + j\hat{x}(t)$. This eliminates cross-terms between positive and negative frequencies (which appear near zero frequency and are annoying)
Compute the discrete WVD:
% MATLAB / Octave conceptual code N = length(z); WV = zeros(N, N); for n = 1:N for tau = -(N-1):(N-1) n1 = n + round(tau/2); n2 = n - round(tau/2); if n1 >= 1 && n1 <= N && n2 >= 1 && n2 <= N WV(n, :) = WV(n, :) + z(n1)*conj(z(n2)) * exp(-1j*2*pi*(0:N-1)*tau/N); end end end

Note: Computational complexity is $O(N^2)$, much larger than STFT's $O(N \log N)$.
Apply kernel smoothing to suppress cross-terms:
- Pseudo-WVD (PWVD): Window only in the $\tau$ direction → suppresses cross-term components far from $\tau=0$
- Smoothed Pseudo-WVD (SPWVD): Window in both $t$ and $\tau$ directions → stronger cross-term suppression, but greater resolution loss
Visualization: Plot $W_x(t,\omega)$. Note that WVD can take negative values — this is not an error, but its non-classical property (similar to quasi-probability distributions in quantum mechanics)

Application Scenarios

Precise time-frequency analysis of single-component chirp signals: For linear frequency-modulated (LFM) chirp pulses in radar returns, WVD achieves the theoretical limit of precision. For example: a radar chirp with 10 MHz bandwidth and 100 $\mu$s pulse width — WVD perfectly recovers its time-frequency slope $\beta = 10^{11}$ Hz/s
Radar Ambiguity Function: The ambiguity function $A(\tau, \nu)$ is exactly the 2D Fourier transform of the WVD. Therefore, WVD properties directly correspond to the range-velocity resolution capability of the radar waveform
Quantum optics / quantum information: The Wigner function remains the primary tool for describing quantum states of light fields (e.g., squeezed states, Fock states, cat states) to this day

Pitfalls and Limitations

Nearly unusable directly for multi-component signals: 3 components yield 3 cross-terms, 10 components yield 45 cross-terms. After smoothing, the resolution advantage is greatly diminished
Computational complexity $O(N^2)$: Much larger than STFT's $O(N \log N)$. At N = 10000, WVD requires approximately 750 times the computation time of STFT
Can take negative values: Cannot be directly interpreted as an "energy distribution" — although the marginals are correct, local negative values make physical interpretation difficult
Discretization difficulties: 2x oversampling is required to avoid aliasing in the discrete WVD

When Not to Use WVD? Alternatives

Scenario	Problem	Alternative
Multi-component signals	Too many cross-terms	Smoothed Pseudo-WVD or Choi-Williams → 5.3 Cohen's Class
Need fast computation	$O(N^2)$ too slow	STFT ($O(N\log N)$)
Nonlinear non-stationary signals	Quadratic distribution assumptions not flexible enough	EMD / HHT → Section 5.6
Need cross-term-free high resolution	WVD cross-terms cannot be eliminated	SST → Section 5.7

References: [1] Wigner, E., On the Quantum Correction for Thermodynamic Equilibrium, Phys. Rev., 40:749-759, 1932. [2] Ville, J., Theorie et Applications de la Notion de Signal Analytique, Cables et Transmission, 2(1):61-74, 1948. [3] Claasen, T.A.C.M. & Mecklenbräuker, W.F.G., The Wigner Distribution, Philips J. Res., 1980. [4] Cohen, L., Time-Frequency Analysis, Prentice Hall, 1995.

Interactive: WVD vs STFT Side-by-Side Comparison

Compare the STFT spectrogram with Pseudo-WVD. WVD has better time-frequency resolution, but dual-component signals exhibit cross-terms.

Signal selection:

← 5.1 STFT 5.3 Cohen's Class →

5.3 Cohen's Class: Generalized Quadratic Time-Frequency Distributions

Unified framework — all quadratic time-frequency distributions are smoothed versions of WVD

Why does this matter? Because STFT and WVD are just two extremes of time-frequency analysis. Cohen's Class provides a unified mathematical framework, letting you understand that all quadratic time-frequency distributions are different choices of "how to smooth the WVD."

Previously: The WVD in Section 5.2 has cross-term problems. The STFT in Section 5.1 has no cross-terms but poor resolution. Cohen's Class provides a unified framework for finding the optimal trade-off between the two.

Learning Objectives

Understand the unified formula of Cohen's Class: the kernel function $\Phi(\theta,\tau)$ completely determines the distribution's properties
Recognize the trade-off between auto-term preservation and cross-term suppression via the kernel function
Compare specific distributions such as WVD, Spectrogram, and Choi-Williams
Choose appropriate kernel functions based on signal characteristics

One-Sentence Summary

Cohen's Class is a unified framework: the STFT Spectrogram, WVD, and all time-frequency distributions in between — the only difference is "which kernel function smoothes the WVD." Choosing a kernel is like turning a radio dial: one end gives the highest resolution but with cross-terms (WVD), the other end gives no cross-terms but is blurry (Spectrogram).

Pain Point: Can We Compromise Between STFT and WVD?

We now face two extremes:

STFT Spectrogram: No cross-terms ✓ but resolution limited by Heisenberg ✗
WVD: Unlimited resolution ✓ but severe cross-terms for multi-component signals ✗

Is there a distribution between the two — retaining most of the resolution advantage while effectively suppressing cross-terms? Cohen's Class provides a systematic answer.

Origin

Leon Cohen (1966, 1989) proposed this unified theory. His core insight was that all "reasonable" quadratic time-frequency distributions (i.e., those satisfying time-shift and frequency-shift covariance) can be described by the same mathematical framework, differing only in the choice of a 2D kernel function $\Phi(\theta, \tau)$.

His 1989 review paper Time-Frequency Distributions — A Review published in Proceedings of the IEEE became one of the most cited papers in the time-frequency analysis field (over 5000 citations).

Principle

Intuition: Imagine the WVD as a high-resolution photo with lots of noise (cross-terms). Cohen's Class applies different blur filters to this photo — the stronger the filter, the less noise, but also the more blurred the details. Different kernel functions = different blur filters.

Cohen's Class Unified Formula

$$C_x(t,\omega) = \frac{1}{4\pi^2}\int\!\!\int\!\!\int e^{-j\theta t - j\tau\omega + j\theta u}\, \Phi(\theta,\tau)\, x\!\left(u+\frac{\tau}{2}\right) x^*\!\left(u-\frac{\tau}{2}\right)\, du\, d\tau\, d\theta$$

$\Phi(\theta,\tau)$: kernel function, completely determines the distribution's properties

Equivalently, it can be written as the 2D convolution of the WVD with the kernel function:

$$C_x(t,\omega) = \iint \phi(t-t', \omega-\omega')\, W_x(t',\omega')\, dt'\, d\omega'$$

where $\phi(t,\omega)$ is the 2D Fourier transform of $\Phi(\theta,\tau)$

Implication: All Cohen's Class distributions are some form of 2D smoothing of the WVD. The kernel function determines in which directions and how much smoothing is applied in the time-frequency plane.

Expand: Relationship Between Kernel Properties and Marginal Conditions

Theorem: The necessary and sufficient condition for a Cohen's Class distribution $C_x$ to satisfy the marginal properties (i.e., $\int C_x\, d\omega = |x(t)|^2$ and $\int C_x\, dt = |X(\omega)|^2$) is:

$$\Phi(\theta, 0) = 1 \;\;\forall\theta \quad \text{and} \quad \Phi(0, \tau) = 1 \;\;\forall\tau$$

Proof (time marginal):

$$\int C_x(t,\omega)\, \frac{d\omega}{2\pi} = \frac{1}{2\pi}\int\!\!\int\!\!\int\!\!\int e^{-j\theta t - j\tau\omega + j\theta u}\, \Phi\, K_x(u,\tau)\, du\, d\tau\, d\theta\, d\omega$$

First integrate over $\omega$: $\int e^{-j\tau\omega}\, d\omega/(2\pi) = \delta(\tau)$, setting $\tau = 0$:

$$= \frac{1}{2\pi}\int\!\!\int e^{-j\theta t + j\theta u}\, \Phi(\theta, 0)\, |x(u)|^2\, du\, d\theta$$

If $\Phi(\theta, 0) = 1$: $= \int |x(u)|^2\, \delta(t-u)\, du = |x(t)|^2 \quad \blacksquare$

Note: Most practical kernel functions do not simultaneously satisfy both marginal conditions. Sacrificing marginal properties is the cost of cross-term suppression.

Comparison of Common Kernel Functions

Kernel $\Phi(\theta,\tau)$	Distribution Name	Auto-Term Preservation	Cross-Term Suppression	Marginal Properties
$\Phi = 1$	WVD	Perfect	None	Perfect
$\Phi = e^{-\theta^2\tau^2/\sigma}$	Choi-Williams (CWD)	Good	Suppresses cross-terms far from origin	Satisfied
$\Phi = e^{-\|\theta\tau\|^\alpha}$	Zhao-Atlas-Marks (ZAM)	Good	Suppresses off-axis cross-terms	Satisfied
$\Phi = h(\tau)$ (depends only on $\tau$)	Pseudo-WVD	Moderate	Smoothing in $\tau$ direction	Frequency marginal ✓ Time marginal ✗
$\|H(\omega)\|^2$ (Spectrogram)	Spectrogram	Most blurred	Fully suppressed	Not satisfied

The Spectrogram is also a member of Cohen's Class! It can be shown that $|\text{STFT}|^2$ is equivalent to smoothing the signal's WVD with the WVD of the window function as the kernel:

$S_x(t,\omega) = \iint W_h(t'-t, \omega'-\omega)\, W_x(t',\omega')\, dt'\, d\omega'$

where $W_h$ is the WVD of the window function $h$. This is why the Spectrogram has no cross-terms — because it uses the "heaviest" smoothing.

How to Use

Analyze your signal characteristics: Number of components? Time-frequency separation? SNR?
Choose a kernel function:
- Single component → WVD ($\Phi = 1$), no cross-term issues
- Few components, well separated → Choi-Williams ($\sigma = 1$~$10$), usually the best starting point for trade-offs
- Many components or closely spaced → Spectrogram (fully suppress cross-terms, accept resolution loss)
Tune parameters: Larger $\sigma$ in Choi-Williams → less smoothing → closer to WVD; smaller $\sigma$ → more smoothing → closer to Spectrogram
Visualize and verify: Check for residual cross-terms (oscillations appearing where no components should exist)

Practical advice: First use a Spectrogram to quickly observe the signal's overall time-frequency structure and confirm the number and locations of components. Then use Choi-Williams to improve resolution, starting from $\sigma = 1$ and gradually increasing until cross-terms begin to appear.

Application Scenarios

Engine acceleration analysis: The Choi-Williams distribution can clearly show how gear mesh frequencies rise linearly with RPM while suppressing cross-terms between different orders. Typical parameter $\sigma = 5$, analyzing 0-6000 RPM acceleration
Speech transition analysis: Consonant-to-vowel transitions (e.g., /ba/ → /a/) involve rapid formant migration. CWD can more precisely localize the time and frequency trajectories of transitions than the Spectrogram
Underwater acoustics: Multipath propagation in underwater channels causes multiple time-frequency components to overlap. Cohen's Class distributions help separate arrival times and Doppler shifts of different paths

Pitfalls and Limitations

No "best" kernel function: Different signals require different kernels. Automatic kernel selection remains an open research problem
Marginal properties are usually sacrificed: Most practical kernel functions do not satisfy the perfect marginal conditions, resulting in errors in time or frequency projections
Computational cost: General Cohen's Class computation is $O(N^2)$ or more (triple integral), much slower than STFT
Can take negative values: Like WVD, Cohen's Class distributions can generally take negative values (unless the kernel is designed to be positive definite, such as the Spectrogram)

When Not to Use? Alternatives

Scenario	Problem	Alternative
Signal is multiple pure sinusoids + noise	Quadratic distributions are not the best tool	MUSIC / ESPRIT (parametric methods are more direct)
Need real-time processing	$O(N^2)$ too slow	STFT ($O(N\log N)$), trade resolution for speed
Highly nonlinear signals	Quadratic distribution assumptions (stationarity approximation) do not hold	EMD / HHT → Section 5.6
Need multi-scale analysis	Cohen's Class has fixed resolution	CWT → Section 5.4

References: [1] Cohen, L., Generalized Phase-Space Distribution Functions, J. Math. Phys., 7(5):781-786, 1966. [2] Cohen, L., Time-Frequency Distributions — A Review, Proc. IEEE, 77(7):941-981, 1989. [3] Choi, H. & Williams, W., Improved Time-Frequency Representation of Multicomponent Signals, IEEE Trans. ASSP, 37(6):862-871, 1989. [4] Hlawatsch, F. & Boudreaux-Bartels, G.F., Linear and Quadratic Time-Frequency Signal Representations, IEEE SP Magazine, 1992.

← 5.2 Wigner-Ville 5.4 Continuous Wavelet Transform (CWT) →

5.4 Continuous Wavelet Transform (CWT)

Multi-resolution time-frequency analysis — frequency detail at low frequencies, time detail at high frequencies, automatically adaptive

Why does this matter? Because STFT has a fixed window length, but you often need to simultaneously resolve low frequencies and pinpoint high-frequency transients. The multi-resolution property of wavelets automatically resolves this contradiction — this is why wavelets are extensively used in seismology, finance, and neuroscience.

Previously: Cohen's Class in Section 5.3 still uses a fixed analysis window. CWT uses scalable wavelets — automatically using long windows for low frequencies and short windows for high frequencies — breaking through the fixed-window limitation.

🌉 From STFT to CWT: Why do we need wavelets?

STFT analyzes signals with a fixed-size window, which leads to a fundamental problem:

Low-frequency signals change slowly and need long windows to resolve frequency (but you lose time localization)
High-frequency signals change quickly and need short windows to localize events (but you lose frequency precision)
STFT can only pick one fixed window length → both ends are unsatisfied

Concrete example: When analyzing music, bass notes (50–200 Hz, lasting 0.5 s) need a ~200 ms window, but a cymbal hit (5 kHz, lasting 5 ms) needs a ~5 ms window. STFT cannot do both at once.

CWT's solution: use a "stretchable window" — automatically shorter when analyzing high frequencies (good time precision), automatically longer when analyzing low frequencies (good frequency precision). This is multi-resolution analysis.

Below we'll see how CWT uses a single mother wavelet $\psi(t)$ together with two parameters — scale (window width) and translation (window position) — to achieve this adaptive analysis.

Learning Objectives

Understand how CWT achieves multi-resolution analysis through scale dilation
Master the physical meaning of the Admissibility Condition
Compare the characteristics and use cases of Morlet and Mexican Hat wavelets
Be able to convert between scale $a$ and frequency $f$
Understand the boundary effects of the Cone of Influence

One-Sentence Summary

CWT is an evolution of STFT — long windows for low frequencies to resolve frequency detail, short windows for high frequencies to resolve time detail, automatically adaptive. Like a zoom lens on a telescope: automatically zooming in for distant objects (low frequencies) and zooming out for nearby objects (high frequencies).

Pain Point: The Fixed-Window Dilemma of STFT

STFT's window length is fixed. But many real signals require different resolutions at different frequencies:

Seismic waves: Low-frequency components ($< 1$ Hz surface waves) require very long windows ($> 5$ seconds) to resolve frequencies, while high-frequency components ($> 10$ Hz body waves) need short windows ($< 100$ ms) to precisely locate arrival times. STFT cannot achieve both
Music: The low notes C2 (65 Hz) and C#2 (69 Hz) differ by only 4 Hz, requiring long windows to distinguish; but percussive sounds in the high range need millisecond-level time localization
Biomedical signals: The ECG QRS complex lasts 60-100 ms (high frequency), while the T wave lasts 200-400 ms (low frequency) — analysis requirements are completely different

Fundamental limitation of STFT: Fixed window = fixed rectangular time-frequency tiles. All frequencies use tiles of the same size, unable to simultaneously meet the different needs of low and high frequencies. CWT's solution: let the tile dimensions automatically adjust with frequency.

Origin

Jean Morlet (1982), a French geophysicist, first proposed the concept of wavelet analysis while studying seismic waves. He found that Fourier analysis worked poorly for seismic signals (which are transient and non-stationary), and conceived the idea of using "small waves" (wavelets) of different widths to match components of different frequencies.

Alex Grossmann & Jean Morlet (1984) jointly published the rigorous mathematical framework in SIAM, defining the continuous wavelet transform and the admissibility condition. Grossmann was a theoretical physicist who provided a solid mathematical foundation for Morlet's intuition.

Ingrid Daubechies (1988) further established a rigorous framework for wavelet theory, constructing orthogonal wavelet bases with compact support, opening the era of discrete wavelets (→ Section 5.5).

Principle

Intuition: Choose a "mother wavelet" $\psi(t)$ (a short oscillating waveform). To analyze low frequencies → stretch the mother wavelet (increase scale $a$) → better match for low frequencies, window length automatically increases. To analyze high frequencies → compress the mother wavelet (decrease scale $a$) → better match for high frequencies, window length automatically decreases.

Continuous Wavelet Transform (CWT)

$$W_x(a, b) = \frac{1}{\sqrt{|a|}}\int_{-\infty}^{\infty} x(t)\, \psi^*\!\left(\frac{t-b}{a}\right)\, dt$$

$a$: Scale, controls the wavelet width ($a$ large → stretched → low frequency)

$b$: Translation, controls the wavelet's time position

$\psi$: Mother Wavelet

$1/\sqrt{|a|}$: Energy normalization factor

Admissibility Condition

The mother wavelet must satisfy:

$$C_\psi = \int_0^{\infty} \frac{|\hat{\Psi}(\omega)|^2}{\omega}\, d\omega < \infty$$

This condition requires $\hat{\Psi}(0) = 0$, meaning the mother wavelet must have zero mean: $\int \psi(t)\, dt = 0$.

Physical intuition: the wavelet must be oscillatory (alternating positive and negative), with no DC component. This guarantees the invertibility of the CWT.

Expand: CWT Inverse Transform Formula Derivation

CWT inverse transform (reconstruction formula):

$$x(t) = \frac{1}{C_\psi}\int_0^{\infty}\int_{-\infty}^{\infty} W_x(a,b)\, \frac{1}{\sqrt{a}}\,\psi\!\left(\frac{t-b}{a}\right)\, db\, \frac{da}{a^2}$$

The key step in the proof uses Parseval's theorem and the admissibility condition, working in the frequency domain:

$$\hat{W}_x(a,\omega) = \sqrt{a}\, \hat{X}(\omega)\, \hat{\Psi}^*(a\omega)$$

Substituting into the inverse transform, when integrating over $a$ we use:

$$\int_0^{\infty} |\hat{\Psi}(a\omega)|^2\, \frac{da}{a} = C_\psi \quad \text{(independent of $\omega$)}$$

Therefore $x(t)$ is perfectly reconstructed. $\blacksquare$

Multi-Resolution Property: CWT vs STFT Time-Frequency Tiles

	STFT	CWT
Tile shape	Fixed-size rectangles	Varies with scale: wide-tall at low freq, narrow-short at high freq
Low-freq behavior	$\Delta f$ fixed (may not be small enough)	$\Delta f$ small (long window → high frequency resolution)
High-freq behavior	$\Delta t$ fixed (may not be small enough)	$\Delta t$ small (short window → high time resolution)
Area $\Delta t \cdot \Delta f$	Fixed (determined by window function)	Fixed (determined by mother wavelet)
Basis functions	Translation + modulation (windowed $e^{j\omega t}$)	Translation + dilation (scaled $\psi$)

Key understanding: CWT does not "violate" the Heisenberg uncertainty principle — each tile's area $\Delta t \cdot \Delta f$ is still bounded. The difference is that the tile shape automatically adjusts with frequency, giving each frequency the most suitable time/frequency resolution ratio.

Common Mother Wavelets

Morlet Wavelet (Most Common for Time-Frequency Analysis)

$$\psi(t) = \pi^{-1/4}\, e^{j\omega_0 t}\, e^{-t^2/2}$$

$\omega_0 \approx 5$~$6$ (center frequency, typically $\omega_0 = 6$)

Complex exponential with a Gaussian envelope → approximately Gaussian in both time and frequency domains → minimum time-frequency area (approaching the Heisenberg limit). The first choice for time-frequency analysis.

Mexican Hat (DOG-2, Common for Singularity Detection)

$$\psi(t) = \frac{2}{\sqrt{3}\,\pi^{1/4}}\,(1-t^2)\,e^{-t^2/2}$$

Negative second derivative of a Gaussian. Real-valued wavelet → no complex phase information. Shaped like a Mexican hat. Suitable for detecting signal peaks and singularities.

Scale-Frequency Correspondence

$$f \approx \frac{f_\psi}{a \cdot \Delta t}$$

$f_\psi$: center frequency of the mother wavelet (Morlet with $\omega_0 = 6$: $f_\psi \approx 0.955$ Hz); $\Delta t$: sampling interval

Note: The scale-frequency correspondence is not exact and depends on the spectral shape of the mother wavelet. Different mother wavelets have different $f_\psi$ values. Do not equate scale with frequency.

How to Use: Step-by-Step

Step 1: Choose a Mother Wavelet
- Time-frequency analysis → Morlet (complex-valued, has phase, best time-frequency resolution)
- Singularity / edge detection → Mexican Hat (real-valued, sensitive to peaks)
- Need consistency with DWT → Daubechies (compact support, for validating DWT results)
Step 2: Determine Scale Range $[a_{\min}, a_{\max}]$
- Corresponding frequency range $[f_{\min}, f_{\max}]$: $a_{\min} = f_\psi / (f_{\max} \cdot \Delta t)$, $a_{\max} = f_\psi / (f_{\min} \cdot \Delta t)$
- Scales are typically sampled geometrically (logarithmic spacing): $a_k = a_{\min} \cdot 2^{k \cdot dj}$, $dj$ commonly $1/12$~$1/4$
Step 3: Compute CWT Coefficients
% MATLAB example scales = 2.^(0:0.1:7); % scale range coefs = cwtft(x, 'wavelet','morl','scales',scales); % or use built-in function [wt, f] = cwt(x, fs, 'amor'); % 'amor' = analytic Morlet
Step 4: Plot the Scalogram
$|W_x(a,b)|^2$ — energy distribution over scale (or corresponding frequency) and time. The y-axis is typically on a logarithmic scale.

Concrete Example: ECG QRS Complex Detection

Parameter	Value	Rationale
Signal	ECG, $f_s = 360$ Hz	MIT-BIH database standard
Mother wavelet	Morlet ($\omega_0 = 6$)	Need both time localization and frequency analysis
Frequency range	10-40 Hz	Main energy band of the QRS complex
Scale range	$a \approx 8.6$~$34.4$	$a = f_\psi / (f \cdot \Delta t)$, $\Delta t = 1/360$
Result	Scalogram peaks in the 10-40 Hz band	Precisely correspond to the R-peak position of each heartbeat

Interactive: CWT Scalogram

Morlet CWT scalogram of a chirp signal. Note that in the low-frequency region, frequency resolution is high but time resolution is low (wide-thin tiles), while in the high-frequency region, time resolution is high but frequency resolution is low (narrow-thick tiles).

Adjust parameters above to see the plot

Application Scenarios

Seismology: CWT is the standard tool for analyzing seismic waves. Low-frequency surface waves (0.01-0.1 Hz) use large scales for precise frequency measurement (determining crustal thickness), while high-frequency body waves (1-20 Hz) use small scales for precise arrival time localization (locating the epicenter). Morlet CWT is widely used at the IRIS seismic data center
Financial time series: Morlet CWT is used to analyze multi-scale volatility of stock indices. Large scales (months to years) reveal business cycles, small scales (days to weeks) show short-term fluctuations. It can reveal how the dominant period changes across different time segments
Neuroscience (brainwave analysis): Analyzing event-related spectral perturbations (ERSP) in EEG. CWT automatically provides appropriate time/frequency resolution across delta (1-4 Hz), theta (4-8 Hz), alpha (8-12 Hz), beta (13-30 Hz), and gamma (30-100 Hz) bands
Mechanical fault transient detection: Periodic impact pulses from bearing damage (high-frequency transients) — CWT can precisely localize the time of each impact while analyzing its frequency characteristics (determining whether it's an inner ring, outer ring, or rolling element fault)

Pitfalls and Limitations

Scale-frequency correspondence is imprecise: Different mother wavelets have different $f_\psi$, and the spectral width of the mother wavelet means each scale corresponds to a frequency range, not a single frequency
Boundary effects — Cone of Influence (COI): At the beginning and end of the signal, large-scale wavelets "extend" beyond the signal boundary. Regions outside the COI are unreliable — typically marked as shaded regions on the scalogram
Higher computational cost than STFT: CWT computes over all scales and translations → $O(N \cdot N_{\text{scales}})$. With frequency-domain convolution acceleration → $O(N \log N \cdot N_{\text{scales}})$
Highly redundant: CWT produces far more coefficients than the original signal length (continuous $a$ and $b$), making it unsuitable for compression
Reconstruction requires the admissibility condition: If the mother wavelet does not strictly satisfy the admissibility condition (e.g., Morlet wavelet when $\omega_0 < 5$), reconstruction will have errors

When Not to Use CWT? Alternatives

Scenario	Problem	Alternative
Only need octave band decomposition (compression, denoising)	CWT is too redundant and slow	DWT ($O(N)$, no redundancy) → Section 5.5
Need fixed frequency resolution	CWT's frequency resolution varies with scale	STFT (fixed $\Delta f$) → Section 5.1
Need maximum time-frequency concentration	CWT is blurred due to mother wavelet width	SST (sharpened CWT) → Section 5.7
Nonlinear signals, don't want preset basis	Wavelets are still predefined basis functions	EMD / HHT → Section 5.6

References: [1] Morlet, J. et al., Wave Propagation and Sampling Theory, Geophysics, 47(2):203-236, 1982. [2] Grossmann, A. & Morlet, J., Decomposition of Hardy Functions into Square Integrable Wavelets of Constant Shape, SIAM J. Math. Anal., 15:723-736, 1984. [3] Daubechies, I., Ten Lectures on Wavelets, SIAM, 1992. [4] Torrence, C. & Compo, G.P., A Practical Guide to Wavelet Analysis, Bull. Amer. Meteorol. Soc., 1998.

✅ Quick Check

Q1: What is the fundamental difference between CWT and STFT?

Show answer

STFT has a fixed window width. CWT's window width adaptively adjusts with frequency: long windows for low frequencies (high frequency resolution), short windows for high frequencies (high time resolution).

Q2: Why does the mother wavelet need to satisfy the zero-mean condition?

Show answer

This is the admissibility condition, which ensures the CWT is invertible. Physical meaning: the wavelet must be oscillatory (with both positive and negative values), and cannot be purely positive or purely negative.

← 5.3 Cohen's Class 5.5 Discrete Wavelet Transform (DWT) →

5.5 Discrete Wavelet Transform (DWT)

Mallat's filter bank architecture — peeling frequency bands layer by layer like an onion

Why does this matter? Because CWT has high computational cost and high redundancy. DWT achieves the same multi-resolution analysis with O(N) computation, and is the industrial standard for JPEG 2000 image compression and ECG denoising.

Previously: CWT in Section 5.4 has high computational cost (continuous scales and translations). DWT uses filter banks to achieve an O(N) discrete version, and is the industrial standard for image compression and denoising.

Learning Objectives

Understand that DWT is equivalent to iterated two-channel filter banks (Mallat's algorithm)
Master the octave band decomposition structure in the frequency domain
Understand the intuitive meaning and practical impact of Vanishing Moments
Learn to select wavelet families and decomposition levels, and apply them to denoising and compression
Recognize the shift-invariance problem of DWT and its solutions

One-Sentence Summary

DWT uses a recursive set of lowpass + highpass filters to decompose the signal layer by layer — peeling apart different frequency bands like an onion. Each layer halves the frequency range and data size, so the entire computation requires only $O(N)$ time.

Pain Point: CWT Is Too Redundant and Too Slow

CWT computes over continuous scales $a$ and continuous translations $b$ → producing a large number of redundant coefficients (far more than the original signal data). This is fine for theoretical analysis, but causes three problems in engineering applications:

High computational cost: $O(N \cdot N_{\text{scales}} \cdot \log N)$, impractical for long signals
High redundancy: Not suitable for compression (the goal is to represent the signal with the fewest coefficients)
No orthogonality: Continuous-scale wavelets do not form an orthogonal basis, disadvantageous for mathematical analysis

Solution: Compute only at dyadic sampling scales $a = 2^j$ and translations $b = k \cdot 2^j$ → DWT.

Origin

Stephane Mallat (1989) made the key contribution connecting wavelet theory with engineering practice: he proved that DWT is equivalent to iterated two-channel filter banks, computable in $O(N)$ time. This algorithm, called Mallat's algorithm (or the Pyramid Algorithm), brought wavelets from theory to practice.

Ingrid Daubechies (1988) constructed orthogonal wavelet bases with compact support — meaning the filters have finite length and can be exactly implemented digitally. The Daubechies wavelet family she constructed (db1=Haar, db2, db3, ..., dbN) remains the most widely used wavelet family to this day.

Daubechies later received the Fudan-Zhongzhi Science Award in 2019 and the Wolf Prize in Mathematics in 2023, and is one of the most important founders of wavelet theory.

Principle: Mallat's Fast Wavelet Transform

Intuition: Each decomposition level passes the signal through two filters — a lowpass (retaining the low-frequency "rough outline") and a highpass (retaining the high-frequency "details"). The low-frequency part is then downsampled (since the bandwidth is halved, the sampling rate can be halved too), and the same process is repeated. Like peeling an onion: each layer reveals deeper structure.

Decomposition (Analysis)

$$c_{A}^{(j+1)}[k] = \sum_{n} h[n - 2k]\, c_{A}^{(j)}[n] \quad \text{(lowpass filtering + downsample by 2 → Approximation coefficients)}$$ $$c_{D}^{(j+1)}[k] = \sum_{n} g[n - 2k]\, c_{A}^{(j)}[n] \quad \text{(highpass filtering + downsample by 2 → Detail coefficients)}$$

$h[n]$: lowpass decomposition filter (corresponding to scaling function); $g[n]$: highpass decomposition filter (corresponding to wavelet function)

$g[n] = (-1)^n h[L-1-n]$ (QMF relationship)

Reconstruction (Synthesis)

$$c_{A}^{(j)}[n] = \sum_{k} \tilde{h}[n - 2k]\, c_{A}^{(j+1)}[k] + \sum_{k} \tilde{g}[n - 2k]\, c_{D}^{(j+1)}[k]$$

$\tilde{h}, \tilde{g}$: reconstruction filters. The Perfect Reconstruction (PR) condition guarantees $x = \text{IDWT}(\text{DWT}(x))$.

Octave Decomposition in the Frequency Domain

Assuming $f_s$ is the sampling rate, effective frequency range $[0, f_s/2]$:

Decomposition Level	Detail Coefficient Band	Approximation Coefficient Band	Number of Coefficients
Level 1	$[f_s/4,\; f_s/2]$	$[0,\; f_s/4]$	$N/2$ + $N/2$
Level 2	$[f_s/8,\; f_s/4]$	$[0,\; f_s/8]$	$N/4$ + $N/4$
Level 3	$[f_s/16,\; f_s/8]$	$[0,\; f_s/16]$	$N/8$ + $N/8$
Level $J$	$[f_s/2^{J+1},\; f_s/2^J]$	$[0,\; f_s/2^{J+1}]$	$N/2^J$ + $N/2^J$

Total number of coefficients = $N$ (no redundancy). Computation = $N + N/2 + N/4 + \cdots < 2N = O(N)$.

Expand: Perfect Reconstruction (PR) Condition Derivation

In the $z$-domain, the analysis-synthesis system output of the two-channel filter bank is:

$$\hat{X}(z) = \frac{1}{2}\left[\tilde{H}(z)H(z) + \tilde{G}(z)G(z)\right]X(z) + \frac{1}{2}\left[\tilde{H}(z)H(-z) + \tilde{G}(z)G(-z)\right]X(-z)$$

Perfect reconstruction requires:

Aliasing Cancellation: $\tilde{H}(z)H(-z) + \tilde{G}(z)G(-z) = 0$
No Distortion: $\tilde{H}(z)H(z) + \tilde{G}(z)G(z) = 2z^{-d}$ (perfect pass-through with delay $d$)

For orthogonal wavelets ($\tilde{H} = H$, $\tilde{G} = G$), we only need $H(z)$ to satisfy:

$$|H(e^{j\omega})|^2 + |H(e^{j(\omega+\pi)})|^2 = 2$$

This is the key equation Daubechies used to construct orthogonal wavelets. $\blacksquare$

Intuition of Vanishing Moments

A wavelet $\psi(t)$ having $p$ vanishing moments means:

$$\int t^k\, \psi(t)\, dt = 0 \quad \text{for } k = 0, 1, \ldots, p-1$$

Intuitive explanation: $p$ vanishing moments = the wavelet is "blind to" polynomials of degree $p-1$.

1 vanishing moment (Haar): Blind to constants (degree 0 polynomials) → only captures the "difference from a constant"
2 vanishing moments (db2): Blind to constants and linear trends → only captures the "difference from a straight line"
4 vanishing moments (db4): Blind to polynomials of degree $\leq 3$ → only captures higher-order details

Practical impact: More vanishing moments → smaller wavelet coefficients in smooth regions (approaching zero) → higher compression efficiency (more coefficients can be discarded). But the cost is: longer filters (db$p$ filter length = $2p$) → more computational delay and more severe boundary effects.

How to Use: Step-by-Step

Step 1: Choose a Wavelet Family

Wavelet	Filter Length	Vanishing Moments	Characteristics	Use Cases
Haar (db1)	2	1	Simplest, discontinuous	Binary signals, step detection
db4	8	4	Compact support, asymmetric	General-purpose first choice
sym8	16	8	Nearly symmetric	When near-linear phase is needed
coif3	18	6 (wavelet) + 6 (scaling)	Both wavelet and scaling function have vanishing moments	Numerical analysis, approximation
CDF 9/7	9+7 (biorthogonal)	4+4	Symmetric, floating-point	JPEG 2000 image compression

Step 2: Choose the Decomposition Level $J$
$J$ = how many octave bands you need. Maximum $J_{\max} = \lfloor\log_2(N/L)\rfloor$ ($L$ = filter length).

Rule of thumb: $J = \lfloor\log_2(N)\rfloor - 1$ or determined by frequency band requirements.
Step 3: Decompose
% MATLAB [C, L] = wavedec(x, J, 'db4'); % C = [cA_J | cD_J | cD_{J-1} | ... | cD_1] % L = coefficient length at each level # Python (PyWavelets) import pywt coeffs = pywt.wavedec(x, 'db4', level=J) # coeffs = [cA_J, cD_J, cD_{J-1}, ..., cD_1]
Step 4: Process Coefficients (Depending on Purpose)
- Denoising: Apply thresholding to detail coefficients
  - Soft Thresholding: $\hat{d} = \text{sign}(d) \cdot \max(|d| - \lambda, 0)$ — smooth, small bias
  - Hard Thresholding: $\hat{d} = d \cdot \mathbf{1}(|d| > \lambda)$ — preserves large coefficients, but discontinuous
  - Universal Threshold: $\lambda = \sigma \sqrt{2\ln N}$, where $\sigma$ is estimated from Level 1 detail coefficients via MAD: $\hat{\sigma} = \text{MAD}(cD_1) / 0.6745$
- Compression: Keep only the $K$ largest coefficients, set the rest to zero → reconstruct. Compression ratio = $N/K$
- Feature extraction: Compute energy at each level $E_j = \sum_k |cD_j[k]|^2$ as classification features
Step 5: Reconstruct
% MATLAB x_rec = waverec(C_modified, L, 'db4'); # Python x_rec = pywt.waverec(coeffs_modified, 'db4')

Concrete Example: ECG Denoising

Signal: ECG, $f_s = 360$ Hz, contaminated with 60 Hz power line interference and high-frequency EMG noise

Method: db4, 4-level decomposition

Level	Band (Hz)	Corresponding Component	Processing
cD1	90-180	High-frequency EMG noise	Soft threshold (nearly all removed)
cD2	45-90	60 Hz power line + some EMG	Soft threshold
cD3	22.5-45	Main energy of QRS complex	Preserve
cD4	11.25-22.5	P wave, T wave	Preserve
cA4	0-11.25	Baseline wander	Remove or preserve (depending on needs)

Result: The reconstructed ECG clearly preserves QRS, P, and T waveforms, with high-frequency noise and baseline wander effectively removed.

Application Scenarios

Image compression — JPEG 2000: Uses CDF 9/7 biorthogonal wavelets for 2D DWT on images, followed by quantization and entropy coding of wavelet coefficients. JPEG 2000 achieves 20-30% higher compression ratio than JPEG (which uses DCT) at the same quality. Smooth regions in images → wavelet coefficients approach zero (thanks to vanishing moments) → high compression ratio
ECG/EEG denoising: As in the example above. Soft thresholding + universal threshold criterion is the standard method in medical signal processing
Seismic data compression: Seismic exploration produces TB-scale data. DWT compression can reduce storage by 10-50x while preserving key geological structure features
Edge detection: Edges in images = abrupt brightness changes → manifest as large values in DWT high-frequency (detail) coefficients. Haar or db2 wavelets are particularly suitable for detecting step-like edges

Pitfalls and Limitations

Not shift-invariant: Shifting the same signal by one sample can completely change the DWT coefficients. The cause is the downsampling operation. This leads to: (a) denoising results may vary depending on the signal's starting point; (b) reconstructed signals may exhibit Gibbs-like ringing at edges
Solution — Stationary Wavelet Transform (SWT): No downsampling → shift-invariant, but high redundancy ($J \times N$ coefficients vs. DWT's $N$)
Choosing the wrong wavelet can introduce artifacts: For example, using the discontinuous Haar wavelet on smooth signals → blocky artifacts in reconstruction
Only octave band decomposition: Frequency resolution decreases exponentially with level. Level 1 covers $[f_s/4, f_s/2]$, Level 5 covers $[f_s/64, f_s/32]$ — no "intermediate" bandwidth options
Boundary handling: Signal boundaries need extension at each decomposition level (symmetric, periodic, or zero-padding), and different extension methods affect coefficients near boundaries

When Not to Use DWT? Alternatives

Scenario	Problem	Alternative
Need a continuous time-frequency representation	DWT only has discrete octave bands	CWT → Section 5.4
Need shift invariance	Downsampling destroys shift invariance	SWT (Stationary WT, no downsampling) or DTCWT (Dual-Tree CWT)
Need uniform frequency band splitting	Octave decomposition is not uniform	Wavelet Packet (can split bands arbitrarily)
Need data-driven decomposition	DWT basis is predefined	EMD → Section 5.6 or VMD

References: [1] Mallat, S., A Theory for Multiresolution Signal Decomposition: The Wavelet Representation, IEEE Trans. PAMI, 11(7):674-693, 1989. [2] Daubechies, I., Orthonormal Bases of Compactly Supported Wavelets, Comm. Pure Appl. Math., 41(7):909-996, 1988. [3] Donoho, D.L. & Johnstone, I.M., Ideal Spatial Adaptation by Wavelet Shrinkage, Biometrika, 81(3):425-455, 1994. [4] Mallat, S., A Wavelet Tour of Signal Processing, 3rd ed., Academic Press, 2008.

Interactive: Haar DWT Decomposition

Decompose the signal layer by layer into approximation (low-frequency) and detail (high-frequency) coefficients using the Haar wavelet.

Decomposition levels:

Adjust parameters above to see the plot

← 5.4 Continuous Wavelet Transform (CWT)5.6 EMD / HHT →

5.6 Empirical Mode Decomposition (EMD) / Hilbert-Huang Transform (HHT)

Data-driven adaptive decomposition — let the data tell you the basis

Why does this matter? Because Fourier and wavelet methods both use predefined basis functions, but some signals (ocean waves, seismic events, biological rhythms) don't look like sinusoids or known wavelets. EMD lets the data decide the basis — it is a unique tool for nonlinear, non-stationary signals.

Previously: All methods so far use predefined bases (sinusoids, wavelets). But what if the signal doesn't resemble any known basis? EMD lets the data decide the basis.

Learning Objectives

Understand every step of the EMD Sifting Process
Master the definition and physical meaning of IMF (Intrinsic Mode Function)
Understand how the Hilbert-Huang Transform obtains a time-frequency representation from IMFs
Recognize the Mode Mixing problem and the improvements of EEMD/CEEMDAN

One-Sentence Summary

EMD makes no assumptions about basis functions — it lets the data tell you what components to decompose into. No sinusoids, no wavelets — the decomposed components take whatever shape the signal has.

Pain Point: Predefined Bases Are Not Flexible Enough

Fourier analysis assumes the signal is composed of sinusoids; wavelet analysis uses a preselected mother wavelet as the basis. For nonlinear, non-stationary signals, these predefined basis functions may be fundamentally unsuitable:

Ocean waves: Real ocean waves are not sinusoidal — crests are sharp and troughs are flat (Stokes waves). Fourier analysis produces many spurious harmonics (2x, 3x frequency...) that do not represent real physical components
Biological signals: Heart rate variability (HRV) exhibits nonlinear, non-stationary characteristics. The LF/HF ratio in Fourier spectra is widely used, but its assumption (stationarity) is frequently violated
Mechanical faults: The impact response from bearing damage is modulated by a nonlinear spring-damper system — the waveform is neither sinusoidal nor resembles any standard wavelet

Fundamental question: Real-world signals don't "owe" us a sinusoidal decomposition. Why not let the data decide how to decompose?

Origin

Norden E. Huang et al. (1998) proposed EMD and HHT at NASA Goddard Space Flight Center. This paper, published in Proceedings of the Royal Society A, is one of the most cited papers in signal processing over the past 25 years (over 15,000 citations).

Inspiration: Huang was an oceanographer, and his core motivation came from ocean wave analysis — ocean waves are highly nonlinear and non-stationary. The harmonics produced by Fourier analysis are mathematical artifacts that do not represent real physical processes. He wanted a method whose decomposed components directly reflect physical phenomena.

Huang once said: "EMD is like a surgeon's scalpel — you don't need it for healthy tissue, but when you need it, nothing else will do."

Principle: Sifting Process

Intuition: Like panning for gold — place the signal on a sieve and shake, the topmost oscillations (highest-frequency components) are sifted out first, then the next layer, and so on until only the slow trend remains.

Complete Steps of Sifting

Find local extrema: Identify all local maxima and local minima of the signal $x(t)$
Construct upper envelope: Apply cubic spline interpolation to all local maxima → obtain the upper envelope $e_{\max}(t)$
Construct lower envelope: Apply cubic spline interpolation to all local minima → obtain the lower envelope $e_{\min}(t)$
Compute mean envelope: $m(t) = \frac{e_{\max}(t) + e_{\min}(t)}{2}$
Remove trend: $h(t) = x(t) - m(t)$
Check IMF conditions:
- Condition 1: The difference between the number of extrema (maxima + minima) and zero crossings is $\leq 1$
- Condition 2: The mean envelope is approximately zero at all times
If satisfied: $h(t)$ is the first IMF → $\text{IMF}_1(t) = h(t)$
If not satisfied: Replace $x(t)$ with $h(t)$, return to Step 1 and continue sifting
Remove the first IMF: $r_1(t) = x(t) - \text{IMF}_1(t)$ (residual)
Use the residual as new input: $x(t) \leftarrow r_1(t)$, return to Step 1 to extract $\text{IMF}_2$
Repeat until the residual $r_K(t)$ is monotonic (no more oscillations) or has $\leq 1$ extremum

EMD Decomposition Result

$$x(t) = \sum_{i=1}^{K} \text{IMF}_i(t) + r_K(t)$$

$\text{IMF}_i$: the $i$-th Intrinsic Mode Function (ordered from high to low frequency); $r_K$: final residual (trend)

Intuition of IMF: Each IMF can be viewed as a "nearly narrowband" oscillatory component of the signal. Its amplitude and frequency can both slowly vary over time (something Fourier analysis cannot handle), but at each moment there is only one dominant frequency.

Expand: Details of Sifting Stopping Criteria

When does the sifting iteration stop (when do we declare $h$ as an IMF)? Common criteria include:

Cauchy-type criterion (Huang 1998):

$$\text{SD} = \frac{\sum_t |h_{k-1}(t) - h_k(t)|^2}{\sum_t h_{k-1}^2(t)} < \epsilon$$

$\epsilon$ typically set to 0.2~0.3. Drawback: purely based on numerical convergence, may lead to over-sifting.

Rilling's three criteria (2003):

$|m(t)/a(t)| < \theta_1$ holds at least $(1-\alpha)$ fraction of time points ($a(t) = (e_{\max} - e_{\min})/2$ is the local amplitude)
$|m(t)/a(t)| < \theta_2$ holds at all time points
Typical values: $\theta_1 = 0.05$, $\theta_2 = 0.5$, $\alpha = 0.05$

This criterion is more robust, avoiding over-sifting that distorts the IMF.

Hilbert-Huang Transform (HHT)

After obtaining the IMFs, apply the Hilbert transform to each IMF to extract instantaneous frequency and amplitude:

Apply the Hilbert transform to $\text{IMF}_i(t)$ → obtain the analytic signal $z_i(t) = \text{IMF}_i(t) + j\,\hat{H}[\text{IMF}_i](t)$
Instantaneous amplitude: $a_i(t) = |z_i(t)|$
Instantaneous phase: $\phi_i(t) = \arg[z_i(t)]$
Instantaneous frequency: $\omega_i(t) = d\phi_i/dt$

Hilbert Spectrum

$$H(\omega, t) = \sum_{i=1}^{K} a_i(t)\, \delta(\omega - \omega_i(t))$$

Each IMF contributes an energy value at each time instant to the position of its instantaneous frequency → time-frequency representation

Unlike STFT/CWT, the frequencies in the Hilbert spectrum are not on a fixed grid but are continuously varying curves over time. This is a more natural representation for non-stationary signals.

How to Use

% MATLAB conceptual code % Step 1: EMD decomposition imf = emd(x); % returns IMF matrix (one IMF per row) % Step 2: Apply Hilbert transform to each IMF [hs, f_inst, t_inst] = hht(imf, fs); % Step 3: Plot Hilbert spectrum hht(imf, fs); % direct plotting # Python (emd-signal package) import emd imfs = emd.sift.sift(x) IP, IF, IA = emd.spectra.frequency_transform(imfs, fs, 'hilbert') hht = emd.spectra.hilbert_huang(IF, IA, freq_range=(0, fs/2))

Concrete Example: Ocean Wave Data Analysis

Signal: 20-minute wave height time series from an ocean site, $f_s = 2$ Hz (sampling interval 0.5 seconds)

EMD result: Decomposed into 8 IMFs + residual

IMF	Typical Period	Physical Correspondence
IMF 1-2	1-3 seconds	Wind waves — short waves generated by local wind
IMF 3	3-5 seconds	Transition zone — wind waves transitioning to swell
IMF 4-6	5-15 seconds	Swell — long waves from distant storms
IMF 7-8	30 seconds to minutes	Long-period waves / infragravity waves
Residual $r$	$> 10$ minutes	Tidal and mean water level changes

Advantage: EMD automatically separates waves of different physical origins without needing predefined frequency boundaries. Each IMF's waveform reflects the actual wave shape (non-sinusoidal), and the Hilbert spectrum reveals time-varying characteristics of instantaneous frequency and amplitude.

Application Scenarios

Ocean engineering (wave analysis): The original application domain of EMD. Analyzing typhoon waves, rogue waves, and wave-structure interactions. Recommended by the IEEE Oceanic Engineering Society for nonlinear wave analysis
Structural health monitoring: The vibration response of bridges and buildings under earthquakes or strong winds is nonlinear and non-stationary. EMD can separate different vibrational modes and track natural frequency changes as damage progresses
Biomedical signals (heart rate variability analysis): In HRV analysis, EMD can extract respiration-related components (high-frequency IMFs) and blood pressure regulation components (low-frequency IMFs) without assuming stationarity
Financial time series: Trend extraction from stock indices. Low-frequency IMFs + residual = long-term trend; high-frequency IMFs = short-term fluctuations and noise. More adaptive than moving averages

Pitfalls and Limitations

Mode Mixing: The most serious problem. Significantly different frequency components mix into a single IMF, or the same physical component is split across multiple IMFs.
Cause: Intermittent signals (a frequency component that appears and disappears) cause problems with envelope interpolation during the sifting process.

Solutions:
- EEMD (Ensemble EMD, Wu & Huang 2009): Add white noise → EMD → average multiple results. The noise helps "break" the mixing. Typical settings: noise amplitude = 0.1-0.2 times standard deviation, ensemble size = 100-500
- CEEMDAN (Complete EEMD with Adaptive Noise, Torres et al. 2011): Add noise to the residual at each step rather than to the original signal → cleaner decomposition, less residual noise
No rigorous mathematical theory: Unlike Fourier which has completeness and Parseval's theorem, and unlike wavelets which have frame theory. The convergence, uniqueness, and stability of EMD have no rigorous mathematical proofs
End effects: Cubic spline interpolation at signal boundaries requires extrapolation → envelopes may diverge or become unreasonable. Common solutions: mirror extension, extrapolating extrema points
Non-unique results: Different sifting stopping criteria, different envelope interpolation methods, different boundary handling → results may differ. This is problematic for scientific research requiring reproducibility
Computational cost: Each sifting iteration requires finding extrema, interpolation, and subtraction → overall computational cost is not small, especially EEMD which requires hundreds of repetitions

When Not to Use EMD? Alternatives

Scenario	Problem	Alternative
Need rigorous mathematical framework and reproducibility	EMD lacks theoretical guarantees	CWT or DWT (have complete mathematical theory)
Need stable, deterministic decomposition	EMD results depend on parameters	VMD (Variational Mode Decomposition) — replaces sifting with optimization, producing unique and stable results
Signal is linear and stationary	EMD's advantages are not significant, and it is more complex	FFT / Welch (simple and direct)
Need efficient real-time processing	EMD (especially EEMD) is too slow	STFT or DWT

VMD (Variational Mode Decomposition, Dragomiretskiy & Zosso 2014): Reformulates mode decomposition as a constrained variational optimization problem. Advantages: unique results, not affected by initialization, can specify the number of modes. Disadvantage: requires a preset mode count $K$ (but can be automatically selected using residual criteria). In many applications, VMD is replacing EMD.

References: [1] Huang, N.E. et al., The Empirical Mode Decomposition and the Hilbert Spectrum for Nonlinear and Non-Stationary Time Series Analysis, Proc. R. Soc. Lond. A, 454:903-995, 1998. [2] Wu, Z. & Huang, N.E., Ensemble Empirical Mode Decomposition, Adv. Adaptive Data Analysis, 1(1):1-41, 2009. [3] Torres, M.E. et al., A Complete Ensemble Empirical Mode Decomposition with Adaptive Noise, IEEE ICASSP, 2011. [4] Dragomiretskiy, K. & Zosso, D., Variational Mode Decomposition, IEEE Trans. Signal Process., 62(3):531-544, 2014.

Interactive: EMD Sifting Process Animation

Observe the EMD process of progressively "sifting" out each IMF: find extrema → envelopes → mean → subtract → converge.

Press "Start Decomposition" to initialize the signal.

← 5.5 Discrete Wavelet Transform (DWT)5.7 Synchrosqueezing →

5.7 Synchrosqueezing Transform (SST)

Sharpening CWT — combining wavelet robustness with near-WVD resolution

Why does this matter? Because CWT time-frequency plots are too blurry and WVD has cross-terms. SST combines the advantages of both — the stability of wavelets plus near-WVD sharpness. It is one of the most important advances in time-frequency analysis in the past decade.

Previously: EMD in Section 5.6 lacks mathematical theory and produces non-unique results. SST combines the stability of CWT with the sharpness of WVD, and is an important advance of the past decade.

Learning Objectives

Understand the core idea of SST: using instantaneous frequency estimates to "squeeze" CWT energy to the correct location
Recognize the historical context of Frequency Reassignment and SST's improvement
Compare the resolution and cross-term characteristics of SST vs. CWT vs. WVD
Understand SST's sensitivity to noise and the 2nd-order SST extension

One-Sentence Summary

SST "squeezes" the blurry CWT time-frequency plot into sharp lines — maintaining the robustness and cross-term-free nature of wavelets while achieving near-WVD time-frequency concentration. Like refocusing a blurry photograph.

Pain Point: CWT Is Blurry, WVD Has Ghost Artifacts

So far, we face a trilemma:

Method	Resolution	Cross-Terms	Robustness
STFT / Spectrogram	Heisenberg limited ✗	None ✓	High ✓
WVD	Perfect ✓	Severe ✗	Low ✗
CWT Scalogram	Multi-resolution (but still blurry)	None ✓	High ✓

Is there a way to get the best of both worlds — no cross-terms like CWT, yet sharp like WVD? SST gives a near-perfect answer.

Origin

Frequency Reassignment (Auger & Flandrin, 1995): The earliest "sharpening" idea. They observed that STFT/CWT energy is "smeared" across the width of the window/wavelet, but the energy can be "relocated" to the correct position using the instantaneous frequency of each coefficient. The problem: the result after reassignment cannot be inverted (not invertible).

Ingrid Daubechies, Jianfeng Lu & Hau-Tieng Wu (2011): Proposed SST in Applied and Computational Harmonic Analysis. Their key contribution was designing an invertible reassignment method — squeezing only in the frequency direction (scale direction), leaving the time direction unchanged. This guarantees invertibility while obtaining a sharp time-frequency representation.

Daubechies made foundational contributions to both wavelet theory (CWT/DWT, Sections 5.4-5.5) and SST — she is one of the most influential scholars in the field of time-frequency analysis.

Principle

Intuition: Imagine the CWT scalogram as a blurry photo — energy spreads like watercolor across a range of scales. What SST does is: for each coefficient, ask "what is your true frequency?" (using instantaneous frequency estimation), then "squeeze" the energy from the current scale to the corresponding true frequency. Like pushing each drop of watercolor back to where it should be.

Step 1: Compute CWT

$$W_x(a, b) = \frac{1}{\sqrt{a}}\int x(t)\, \psi^*\!\left(\frac{t-b}{a}\right)\, dt$$

Step 2: Estimate the Instantaneous Frequency for Each Coefficient

$$\omega(a, b) = -\text{Im}\left[\frac{\partial_b W_x(a, b)}{W_x(a, b)}\right] = -\text{Im}\left[\frac{W_x^{(\psi')}(a, b)}{W_x(a, b)}\right]$$

$W_x^{(\psi')}$: CWT computed using $\psi'(t)$ (the derivative of the mother wavelet)

Intuition: $\omega(a,b)$ is the "true frequency" perceived by the CWT coefficient at scale $a$ and time $b$. If the signal's true frequency at this location is $\omega_0$, then regardless of what value $a$ takes (as long as $W_x(a,b)$ is not too small), $\omega(a,b) \approx \omega_0$.

Step 3: Synchrosqueezing

Redistribute CWT energy from scale $a$ to instantaneous frequency $\omega(a,b)$:

$$T_x(\omega_l, b) = \sum_{\substack{a_k:\; |\omega(a_k, b) - \omega_l| < \Delta\omega/2}} W_x(a_k, b)\, a_k^{-3/2}\, \Delta a_k$$

For each frequency bin $\omega_l$, sum up CWT coefficients from all scales whose instantaneous frequency points to it

Expand: Why Is SST Invertible While Reassignment Is Not?

Frequency Reassignment (Auger-Flandrin): Redistributes energy simultaneously in both time and frequency directions. This is a many-to-one mapping (multiple time-frequency points may map to the same point) → not invertible.

SST: Redistributes only in the frequency (scale) direction, keeping the time position $b$ unchanged. Therefore, at each fixed $b$, the squeezing operation is one-dimensional.

Invertibility Theorem (Daubechies et al. 2011): For a signal composed of $K$ components with separated instantaneous frequencies, the SST result $T_x(\omega, b)$ can be uniquely inverted back to the original signal:

$$x(t) = \text{Re}\left[\frac{1}{C_\psi}\int T_x(\omega, t)\, d\omega\right]$$

The prerequisite is that the instantaneous frequencies of components do not overlap at any time (separation condition). $\blacksquare$

Expand: SST Effect Analysis for a Single-Component Chirp

Consider $x(t) = A(t)\, e^{j\phi(t)}$ with instantaneous frequency $\omega_0(t) = \phi'(t)$.

CWT response at scale $a$: $W_x(a, b) \approx A(b)\sqrt{a}\, \hat{\Psi}^*(a\omega_0(b))\, e^{j\phi(b)}$

Instantaneous frequency estimate: $\omega(a, b) = \omega_0(b)$ (independent of $a$!)

Therefore SST squeezes energy from all scales to $\omega_0(b)$ → forming a sharp line on the frequency axis, perfectly tracking the instantaneous frequency.

CWT scalogram: Energy spreads in a band-like region around $\omega_0(b)$ (due to the mother wavelet's width).

SST: Energy concentrates into a line at $\omega_0(b)$.

The sharpening degree is equivalent to WVD, but without cross-terms. $\blacksquare$

SST vs CWT vs WVD: Visual Comparison

Property	CWT Scalogram	WVD	SST
Single-component appearance	Blurry band	Sharp line	Sharp line
Multi-component cross-terms	None	Severe	None
Invertibility	Invertible	Invertible	Invertible (under separation condition)
Can take negative values	No ($\|W\|^2 \geq 0$)	Yes	Yes (complex-valued)
Computation	$O(N \cdot J \log N)$	$O(N^2)$	$O(N \cdot J \log N)$ (same order as CWT)

How to Use

Choose a mother wavelet: Typically Morlet (analytic wavelet), because SST requires precise instantaneous frequency estimation, and Morlet's narrowband characteristics are most suitable
Compute CWT: Same as Section 5.4
Compute instantaneous frequency map: $\omega(a,b)$ — requires computing both $W_x$ and $W_x^{(\psi')}$ simultaneously
Perform Synchrosqueezing: Redistribute CWT coefficients to the corresponding frequency bins
Visualize: Plot $|T_x(\omega, b)|^2$

% MATLAB (Signal Processing Toolbox R2016b+) [sst_result, f] = wsst(x, fs, 'amor'); % 'amor' = analytic Morlet % sst_result: frequency x time complex matrix % f: frequency vector pcolor(t, f, abs(sst_result)); shading interp; colormap(jet); # Python (ssqueezepy package) import ssqueezepy Tx, Wx, ssq_freqs, scales = ssqueezepy.ssq_cwt(x, wavelet='morlet') plt.pcolormesh(t, ssq_freqs, np.abs(Tx), shading='auto')

Application Scenarios

Component separation of multi-component non-stationary signals: For example, two chirp signals crossing in the time-frequency plane — STFT sees a blurry blob, WVD sees cross-term interference, but SST can see two sharp crossing lines. Combined with Ridge Extraction algorithms, each component can be automatically separated and inverted for reconstruction
Geology (seismic wave mode separation): Different modes of seismic surface waves (fundamental mode, higher-order modes) appear as different $f$-$v$ relationships on dispersion curves. SST's sharp time-frequency plot makes mode separation more precise, aiding in subsurface structure inversion
Heart rate variability (HRV) analysis: SST can precisely track the time-varying characteristics of breathing frequency (which varies with activity, sleep stages, etc.), localizing instantaneous respiratory frequency more sharply than CWT
Mechanical fault diagnosis: Gear mesh frequency tracking under variable-speed conditions. SST sharpens the blurry frequency trajectories in CWT into precise lines, facilitating RPM computation and anomaly detection

Pitfalls and Limitations

Sensitive to noise: The instantaneous frequency estimate $\omega(a,b) = -\text{Im}[\partial_b W / W]$ becomes unstable in regions where $|W|$ is small (low SNR). Noise causes $\omega$ estimates to jump erratically → SST results show scattered speckles. In practice, an energy threshold must be set to ignore regions where $|W|$ is too small
Not suitable for rapid frequency jumps: SST assumes instantaneous frequency varies slowly locally. For sudden frequency jumps (e.g., FSK modulation), the sharpening effect of SST degrades at the jump points
2nd-order SST (Oberlin et al. 2015): An improved version. Uses second-order instantaneous frequency estimation (accounting for the rate of change of $\omega$) → better results for chirps and other signals with rapidly changing frequencies. But computational cost is higher and implementation is more complex
Separation condition: When components' instantaneous frequencies coincide at certain times, SST cannot separate them (unlike WVD — which theoretically can, but with cross-terms)
Slightly higher computation than CWT: Requires additionally computing $W_x^{(\psi')}$ (one CWT with the derivative wavelet) and instantaneous frequency mapping. Approximately 2-3 times that of CWT

When Not to Use SST? Alternatives

Scenario	Problem	Alternative
Only need a rough time-frequency view	SST computation is much larger than STFT	STFT Spectrogram (much faster, usually sufficient)
Very low SNR (< 0 dB)	Instantaneous frequency estimation is unstable	CWT Scalogram (blurry but stable) or Multitaper Spectrogram
Sudden frequency jumps	1st-order SST assumptions don't hold	2nd-order SST or STFT (short windows better track rapid changes)
Nonlinear signals, don't want preset wavelets	SST still depends on mother wavelet choice	EMD / HHT → Section 5.6
Need precise energy distribution	SST results are complex-valued, not positive energy	Reassigned Spectrogram (Auger-Flandrin)

Part V: Time-Frequency Analysis Methods Overview

Positioning and use cases for the seven methods:

Method	Core Property	Best Suited For	Biggest Limitation
5.1 STFT	Fixed window, fast computation	First choice for general time-frequency analysis	Fixed resolution
5.2 WVD	Highest resolution	Precise single-component analysis	Multi-component cross-terms
5.3 Cohen's	Unified framework, tunable kernel	Theoretical analysis and comparison	High computational cost
5.4 CWT	Multi-resolution	Multi-scale time-frequency analysis	Redundant, blurry
5.5 DWT	$O(N)$, no redundancy	Compression, denoising, feature extraction	Only octave bands
5.6 EMD	Data-driven, adaptive	Nonlinear non-stationary signals	Lacks theory, mode mixing
5.7 SST	Sharpened CWT, no cross-terms	Multi-component non-stationary separation	Noise-sensitive

Selection guide: Start with STFT for quick observation → if multi-resolution is needed, use CWT → if sharper results are needed, use SST → if cross-terms are a problem, avoid WVD → if the signal is highly nonlinear, try EMD → if compression/denoising is needed, use DWT.

References: [1] Daubechies, I., Lu, J. & Wu, H.-T., Synchrosqueezed Wavelet Transforms: An Empirical Mode Decomposition-like Tool, Appl. Comp. Harm. Anal., 30(2):243-261, 2011. [2] Auger, F. & Flandrin, P., Improving the Readability of Time-Frequency and Time-Scale Representations by the Reassignment Method, IEEE Trans. Signal Process., 43(5):1068-1089, 1995. [3] Oberlin, T., Meignen, S. & Perrier, V., Second-Order Synchrosqueezing Transform or Invertible Reassignment?, IEEE Trans. Signal Process., 63(5):1335-1344, 2015. [4] Thakur, G. et al., The Synchrosqueezing Algorithm for Time-Varying Spectral Analysis, Signal Processing, 93(5):1079-1094, 2013.

← 5.6 EMD / HHT 6.1 Advanced FFT →

6.1 Advanced FFT Algorithms

Learning Objectives

Distinguish the applicable scenarios for Split-Radix, Bluestein, and Goertzel
Determine when Goertzel is more efficient than FFT and when it is not
Use the Bluestein identity to understand how arbitrary-length DFT can be converted into convolution

Why does this matter? Because real-world data lengths are not always powers of 2, and sometimes you only need the result for a single frequency — computing a full FFT would be wasteful.

Previously:Part V established the theory of time-frequency analysis. Part VI returns to engineering practice — starting with advanced FFT implementation techniques.

Pain Point: What do you do when the data length is not a power of 2? When you only need to detect a single specific frequency (e.g., a DTMF dial tone), computing a full FFT is too wasteful.

Historical Context:

Goertzel Algorithm — Gerald Goertzel, 1958. Designed to compute only a single frequency bin of the DFT.
Bluestein Algorithm (Chirp-Z Transform) — Leo Bluestein, 1970. Cleverly converts an arbitrary-length DFT into a convolution problem.
Split-Radix FFT — Duhamel & Hollmann, 1984. Combines the advantages of Radix-2 and Radix-4, saving approximately 20% of complex multiplications.

Principles

Intuition: The Radix-2 FFT is a "general-purpose tool," but more efficient alternatives exist for special scenarios:

Split-Radix: Simultaneously uses a length-N/2 DFT (even part) and two length-N/4 DFTs (odd part), requiring approximately 20% fewer multiplications than pure Radix-2.
Bluestein: Works for any N, relying on an algebraic identity to convert the DFT into convolution.
Goertzel: Computes only a single bin using a second-order recursion (IIR), with computational cost O(N).

Formulas:

Bluestein Identity:

kn = \tfrac{1}{2}\bigl[k^2 + n^2 - (k-n)^2\bigr]

Substituting into the DFT twiddle factor $W_N^{kn}$:

$$X[k] = W_N^{k^2/2} \sum_{n=0}^{N-1} \bigl(x[n]\,W_N^{n^2/2}\bigr)\,W_N^{-(k-n)^2/2}$$

Inside the parentheses, $x[n]$ is multiplied by a known sequence (chirp), and the outer operation is a convolution with another known sequence. Three FFTs (zero-padded to the next power of 2) can complete an arbitrary-length N DFT.

Goertzel Recursion:

s[n] = x[n] + 2\cos(2\pi k/N)\,s[n-1] - s[n-2], \quad s[-1]=s[-2]=0

X[k] = s[N-1] - W_N^k\,s[N-2]

Full Derivation: Bluestein Identity

DFT definition: $X[k] = \sum_{n=0}^{N-1} x[n]\,W_N^{kn}$, where $W_N = e^{-j2\pi/N}$.

Using the identity $kn = \tfrac{1}{2}[k^2 + n^2 - (k-n)^2]$ (verify by expanding the right side):

$$W_N^{kn} = W_N^{k^2/2}\,W_N^{n^2/2}\,W_N^{-(k-n)^2/2}$$

Substituting into the DFT:

$$X[k] = W_N^{k^2/2} \sum_{n=0}^{N-1} \bigl[x[n]\,W_N^{n^2/2}\bigr]\,W_N^{-(k-n)^2/2}$$

Let $a[n] = x[n]\,W_N^{n^2/2}$ and $b[n] = W_N^{-n^2/2}$. The summation term is the value of the convolution of $a$ and $b$ at point $k$. Zero-pad $a$ and $b$ to length $\geq 2N-1$ (next power of 2), and three FFTs can complete the convolution.

Full Derivation: Goertzel Algorithm

DFT bin k: $X[k] = \sum_{n=0}^{N-1} x[n]\,W_N^{kn}$.

Observe that $W_N^{-kN} = 1$, so:

$$X[k] = W_N^{-kN}\sum_{n=0}^{N-1} x[n]\,W_N^{kn} = \sum_{n=0}^{N-1} x[n]\,W_N^{-k(N-n)}$$

This is equivalent to passing $x[n]$ through a filter with system function $H(z)=\frac{1}{1-W_N^{-k}z^{-1}}$ and taking the output at $n=N$.

However, this is a first-order complex recursion. To avoid complex multiplications, multiply both numerator and denominator by $(1-W_N^{k}z^{-1})$:

$$H(z) = \frac{1 - W_N^{k}z^{-1}}{1 - 2\cos(2\pi k/N)z^{-1} + z^{-2}}$$

The denominator is a second-order recursion with real coefficients (requiring only real multiplications). After running N steps, the numerator (one complex multiplication) yields $X[k]$.

Total: N real multiplications + 1 complex multiplication = O(N) per bin.

How to Use

Algorithm Selection Decision Tree:

N is a power of 2 → Use Radix-2 or Split-Radix FFT directly.
N is not a power of 2 → Zero-pad to the next power of 2 (simplest), or use Bluestein (when an exact N is strictly required).
Only K bins needed (K << N) → Use K Goertzel computations. More efficient than a full FFT when $K < \log_2 N$.

Concrete Example: DTMF Dial Tone Detection

DTMF (Dual-Tone Multi-Frequency) only needs to detect 8 frequencies: 697, 770, 852, 941, 1209, 1336, 1477, 1633 Hz.

// Sample rate 8000Hz, 205 samples per segment (~25.6ms)
// Full FFT (N=256): 256 × log₂(256) = 256 × 8 = 2048 complex multiplications
// 8 Goertzel runs:   8 × 205 = 1640 real multiplications + 8 complex multiplications
// Goertzel saves ~90% of computation (accounting for complex vs real differences)

for each freq in [697, 770, 852, 941, 1209, 1336, 1477, 1633]:
    k = round(freq * N / fs)
    coeff = 2 * cos(2 * pi * k / N)
    s1, s2 = 0, 0
    for sample in segment:
        s0 = sample + coeff * s1 - s2
        s2 = s1
        s1 = s0
    power = s1*s1 + s2*s2 - coeff*s1*s2
    if power > threshold:
        tone_detected(freq)

Application Scenarios

DTMF telephone dial tone detection: Telephone switches use Goertzel to detect 8 frequencies. 205 samples per segment @8kHz, processing ~39 segments per second.
Power quality measurement: Smart meters only need to measure the 50Hz fundamental and a few harmonics (100, 150, 200... Hz). Goertzel saves >10x power compared to a full FFT, extending battery life.
Non-standard-length data: Some communication protocol frame lengths are not powers of 2 (e.g., LTE SC-FDMA uses multiples of 12). Bluestein/CZT can handle these directly.

Pitfalls and Limitations:

Zero-padding to the next power of 2 is usually the simplest and most effective approach. Unless there are strict constraints on N (memory, latency), Bluestein is unnecessary.
Goertzel's efficiency advantage disappears when $K > \log_2 N$; in that case, a full FFT is faster.
Split-Radix is more complex to implement than Radix-2. Modern CPUs have SIMD-optimized FFT libraries (FFTW, MKL), making manual implementations difficult to beat.

When Not to Use:

If you can freely choose the data length, simply use a power of 2 — no need for Bluestein.
If you need the full spectrum (all N bins), Goertzel offers no advantage.
Alternative: Modern FFT libraries (FFTW) have built-in Mixed-Radix algorithms that efficiently handle any length (as long as N has small prime factors). Bluestein is truly needed only when N is prime.

✅ Quick Check

Q1: DTMF detection needs only 8 frequencies. Compared to a full FFT (N=256), is Goertzel cheaper? By how much?

Show answer

Goertzel: 8×N=2048 operations. FFT: (N/2)log₂N=1024 operations. In this case, FFT is actually faster! Goertzel only saves when K < log₂N.

Q2: Data length N=300 (not a power of 2) — what is the simplest approach?

Show answer

Zero-pad to 512 (the next power of 2) and use a standard Radix-2 FFT. Much simpler than the Bluestein algorithm.

Interactive: Goertzel vs FFT Efficiency Comparison

Compare the computational cost of Goertzel (computing only K frequencies) versus a full FFT. Goertzel is more efficient when K < log₂N; otherwise FFT is more efficient.

Number of frequencies to compute K: 8

Data length N: 256

← 5.7 SST 6.2 Numerical Precision →

6.2 Numerical Precision

Learning Objectives

Compute the SQNR of floating-point FFT and its relationship to N and precision
Explain overflow issues in fixed-point FFT and two scaling strategies
Choose floating-point/fixed-point precision based on SQNR requirements

Why does this matter? Because when implementing FFT on embedded systems and FPGAs, numerical precision directly determines system performance — if you do not understand fixed-point overflow and rounding errors, your spectral results will be garbage.

Previously:6.1 introduced different FFT algorithms. But on real hardware (FPGAs, embedded systems), floating-point numbers are not true real numbers — numerical precision is an engineering problem you must face.

Pain Point: Running a 16-bit fixed-point FFT on an FPGA yields only 60dB SNR (Signal-to-Noise Ratio), but the specification requires 80dB — why is there such a large gap?

Historical Context: Peter Welch systematically analyzed finite wordlength effects in FFT in 1969, establishing the theoretical framework for floating-point and fixed-point FFT errors. Subsequently, Oppenheim & Weinstein and others further refined scaling strategies for fixed-point FFT.

Principles

Intuition: FFT is a cascade of butterfly operations. Each butterfly involves multiplication and addition, and each step introduces rounding error. $\log_2 N$ butterfly stages = $\log_2 N$ rounds of error accumulation.

Floating-point FFT error bound:

$$\|X_{\text{computed}} - X_{\text{exact}}\|_{\text{rms}} \;\leq\; C\,\varepsilon_m\,\sqrt{\log_2 N}\;\|x\|_{\text{rms}}$$

where $\varepsilon_m$ is the machine epsilon: $\approx 1.19 \times 10^{-7}$ for float32, $\approx 2.22 \times 10^{-16}$ for float64.

The problem with fixed-point FFT — Overflow:

The butterfly operation $a + b$ may exceed the representable range of fixed-point numbers. Two solution strategies:

Right-shift 1 bit per stage (Convergent Scaling): After each butterfly stage, all values are right-shifted by 1 bit (divided by 2). This guarantees no overflow, but loses 1 bit of precision per stage. After $\log_2 N$ stages, $\log_2 N$ bits are lost.
Block Floating Point (BFP): Check the maximum value at each stage and scale only when necessary. Average loss is less than per-stage right-shifting.

Concrete numbers:

Configuration	N	SQNR (dB)	Notes
Float64	1024	≈ 281	Sufficient for virtually any application
Float32	1024	≈ 131	Sufficient for most consumer-grade applications
Fixed 32-bit	1024	≈ 132	Mainstream FPGA choice
Fixed 16-bit (per-stage scaling)	1024	≈ 72	10 butterfly stages lose 10 bits, leaving 6 effective bits
Fixed 16-bit (BFP)	1024	≈ 84	~12dB better than fixed scaling

Derivation: SQNR of Fixed-Point FFT with Per-Stage Scaling

Quantization noise power for 16-bit fixed-point: $\sigma_q^2 = 2^{-2B}/12$; for B=16, $\sigma_q^2 \approx 3.2 \times 10^{-11}$.

Right-shifting by 1 bit per stage is equivalent to introducing one quantization. With $\log_2 N = 10$ stages, each stage introduces one independent quantization noise.

Total quantization noise power $\approx 10 \times \sigma_q^2$ (but note that scaling at each stage progressively reduces the effective bit width).

More precisely: after each right-shift, the effective bits decrease from B to B-1, ultimately leaving B - log₂N = 16 - 10 = 6 effective bits.

SQNR ≈ 6.02 × 6 + 1.76 + 10log₁₀(N/2) ≈ 36.1 + 1.76 + 27.1 ≈ 65-72 dB (depending on signal statistics).

How to Use

Precision Selection Guide:

Float64 (double): Scientific computing, offline analysis with no precision concerns. SQNR > 280dB.
Float32 (single): GPU acceleration, most consumer electronics, sufficient for N ≤ 2²⁰. SQNR ≈ 130dB.
Fixed 32-bit: FPGA radar/communications, equivalent to float32 in most cases.
Fixed 16-bit: Low-power embedded systems. Must use BFP or per-stage scaling. SQNR 70-85dB.

Concrete Example: FPGA Radar FFT Design

// Specification: 1024-point FFT, 14-bit ADC input, SQNR > 75dB required
// Solution: Use 18-bit internal word width + Block Floating Point

// Check maximum value after each butterfly stage:
if (max_value > 0.5 * FULL_SCALE):
    right_shift_all_by_1()
    block_exponent += 1

// 10 butterfly stages, on average only 5-6 scalings needed (depends on signal)
// Effective bits: 18 - 6 = 12 bit → SQNR ≈ 6.02×12 + 1.76 ≈ 74 dB
// Conclusion: 18-bit BFP barely meets spec; recommend 20-bit for margin

Application Scenarios

FPGA radar processor: Xilinx/AMD FPGA DSP48 slices natively support 18×25 bit multiplication. 1024-point FFT using 18-bit BFP, throughput 500MHz.
Embedded audio DSP: Audio DSPs inside mobile phone chips (e.g., Qualcomm Hexagon) typically use 16-bit fixed-point processing. 256-point FFT with per-stage scaling, SQNR ≈ 80dB, sufficient for audio (human ear dynamic range ~96dB).
5G baseband processor: 4096-point FFT @30.72MHz sample rate. Uses 16-bit fixed-point + BFP, processing tens of thousands of OFDM symbols per second.

Pitfalls and Limitations:

Float32 with large N: When N > 2²⁰, accumulated errors can become significant. Scenarios such as astronomical observations and large seismic arrays should use float64.
Forgetting to scale in fixed-point: The most common bug. After overflow, values "wrap around," producing completely wrong results that are hard to debug.
Twiddle factor precision: In fixed-point FFT, the precision of the sin/cos lookup table also affects results. Typically 2-4 more bits than the data width are needed.
Parseval's theorem check: Time-domain energy should equal frequency-domain energy. If the discrepancy exceeds expected precision, there is a precision problem.

When Not to Use (Fixed-Point FFT):

If your platform has a floating-point unit (FPU), use floating-point directly. Fixed-point is only worthwhile when there is no FPU or power consumption is extremely constrained.
If precision requirements exceed 90dB, 16-bit fixed-point is insufficient — 24-bit or 32-bit is needed.
Alternative: Floating-point IP cores are available on FPGAs (Xilinx FFT IP supports float32), at the cost of more resources.

✅ Quick Check

Q1: What is the approximate SQNR (in dB) of a 1024-point float32 FFT?

Show answer

float32 has ε_m ≈ 1.2×10⁻⁷, SQNR ≈ 20log₁₀(1/(ε_m√log₂1024)) ≈ 20log₁₀(1/(1.2×10⁻⁷×√10)) ≈ 131 dB.

Q2: In a fixed-point 16-bit FFT with 1-bit right-shift per stage, how many effective bits remain after 10 stages (1024-point)?

Show answer

16 - 10 = 6 effective bits, SQNR ≈ 6×6 = 36 dB. This is usually insufficient — block floating point or 32-bit fixed-point is needed.

← 6.1 Advanced FFT 6.3 OLA/OLS →

6.3 Overlap-Add / Overlap-Save (OLA/OLS)

Learning Objectives

Derive the correctness condition for OLA and OLS (FFT length >= L+M-1)
Compute the latency and computational cost of segmented convolution
Choose between OLA and OLS based on implementation requirements

Why does this matter? Because real-time audio processing, streaming filtering, and voice call noise reduction all require "process as you receive" — OLA/OLS is the standard approach for applying FFT to infinitely long streams.

Previously:6.2 addressed precision issues. But there is another implementation problem: real signals are streams (infinitely long), making it impossible to FFT everything at once. How do you process in segments yet get results identical to processing all at once?

Pain Point: Real-time audio effects (reverb, EQ) need to convolve the input with an impulse response, but audio is an infinitely long stream — you cannot wait until everything is received before computing the FFT.

Historical Context: Thomas Stockham proposed the concept of Fast Convolution in 1966 — using FFT to accelerate convolution. OLA (Overlap-Add) and OLS (Overlap-Save) are two strategies for performing fast convolution on long sequences in segments. Both have the same efficiency but differ in implementation.

Principles

Intuition: The length of a linear convolution = sum of the two sequence lengths - 1. After chopping a long sequence into segments, each segment's convolution result is longer than the original segment — the tail "overflows" into the next segment's range. OLA and OLS handle this overflow differently.

Overlap-Add (OLA)

Split the input $x[n]$ into non-overlapping segments, each of length L.
Zero-pad each segment to $N = L + M - 1$ (M = filter length).
FFT convolution: $Y_i = \text{IFFT}\{\text{FFT}\{x_i, N\} \cdot H[k]\}$.
Each segment's result has length N > L; the trailing M-1 points are overlap-added with the head of the next segment.

Overlap-Save (OLS)

Split the input $x[n]$ into overlapping segments, each of length N, with adjacent segments overlapping by M-1 points.
Perform length-N FFT convolution directly (no zero-padding needed).
The first M-1 points of each segment's result are erroneous due to circular convolution — discard them.
Keep the remaining L = N - M + 1 points, which are exactly identical to the linear convolution.

Derivation: Why Discarding the First M-1 Points in OLS Is Correct

Length-N FFT convolution computes circular convolution, but what we need is linear convolution.

The difference between circular and linear convolution only appears where the "tail wraps around to the head." Specifically:

Linear convolution $y_{\text{lin}}[n] = \sum_{m=0}^{M-1} h[m]\,x_i[n-m]$; when $n < M-1$, $x_i[n-m]$ may access samples before the current segment.

In circular convolution, $x_i[n-m]$ "wraps around" to the end of the segment (modulo N), producing incorrect results.

However, if we overlap each segment by M-1 points, the first M-1 samples of the current segment are the last M-1 samples of the previous segment. Circular convolution for $n \geq M-1$ does not need to access data outside the segment, and is therefore perfectly identical to linear convolution.

Therefore: discard the first M-1 points (the erroneous circular convolution part) and keep the remaining L = N - M + 1 points (the correct part).

How to Use

Design Steps:

Determine filter length M (length of the impulse response).
Choose segment length L: Latency = L/f_s seconds. Larger L → higher latency but better efficiency (FFT overhead is amortized).
FFT length N = L + M - 1, rounded up to the next power of 2.
Pre-compute $H[k] = \text{FFT}\{h[n], N\}$ (only needs to be done once).
Per-segment processing:
- OLA: Take L points → zero-pad to N → FFT → multiply by H[k] → IFFT → overlap-add.
- OLS: Take N points (including M-1 overlap) → FFT → multiply by H[k] → IFFT → discard first M-1 points.

Concrete Example: Audio Reverb Effect

// Impulse response: concert hall IR, M = 4096 points (@ 48kHz = 85.3ms reverb tail)
// Segment length: L = 4096
// FFT length: N = L + M - 1 = 8191 → round to 8192 (2^13)
// Latency = L / fs = 4096 / 48000 = 85.3 ms

// Computational cost comparison:
// Direct convolution: L × M = 4096 × 4096 = 16,777,216 multiplications/segment
// FFT convolution: 3 × (N/2)log₂N = 3 × 4096 × 13 = 159,744 multiplications/segment
//                  (2 FFTs + 1 IFFT, each (N/2)log₂N complex multiplications)
// Speedup: 16.8M / 160K ≈ 105×

// To reduce latency:
// L = 512 → latency = 10.7ms, N = 4608 → 8192
// Same computation per segment (same N), but more segments per second
// Segments per second = 48000 / 512 = 93.75
// FFTs per second = 93.75 × 3 = 281 8192-point FFTs

Application Scenarios

Audio effects: Reverb (IRs up to several seconds = hundreds of thousands of points), EQ (FIR with hundreds to thousands of taps). DAW software like Pro Tools uses OLA for real-time multi-track audio processing.
Real-time Acoustic Echo Cancellation (AEC): Room impulse response ~100ms = 1600 points @16kHz. One segment every 10ms, using OLS for fast convolution.
Radar Pulse Compression: Correlating long pulses with matched filters, using OLA/OLS for real-time processing.

Pitfalls and Limitations:

First segment of OLS: Requires prepending M-1 zeros to the input (since there is no "previous segment tail" available).
FFT length too short: If N < L + M - 1, circular convolution will produce errors — results will have aliasing. This is the most common bug.
Latency-efficiency tradeoff: Smaller L → lower latency but poorer efficiency (FFT overhead is large). Music performance requires <10ms latency, constraining L to 256-512.
Numerical precision: Very long IRs (several seconds) may require very large N, where float32 precision may be insufficient. Partitioned IR techniques can be used (small N for low latency in early parts, large N for high efficiency in later parts).

OLA vs OLS Comparison:

OLS has a simpler implementation (no accumulation buffer needed) and more regular memory access patterns.
OLA is conceptually more intuitive, and input segments do not need to overlap (saving a bit of memory).
Both have exactly the same efficiency (same number and size of FFTs).

When OLA/OLS is not needed:

Short filter (M < 64) → direct time-domain convolution may be faster (no FFT overhead).
Non-real-time processing with sufficient memory → perform a single full-length FFT convolution.
Alternative: Partitioned Convolution splits a long IR into multiple segments of different FFT sizes, balancing low latency and high efficiency.

Interactive: Overlap-Add Segmented Convolution

Convolution of a long signal (4096 points) with a short filter (64 points). Comparing direct convolution and OLA results — perfectly identical.

Adjust parameters above to see the plot

✅ Quick Check

Q1: An audio reverb impulse response is 4096 points, processed using OLA. What is the minimum FFT length?

Show answer

N >= L + M - 1. If segment length L=4096, then N >= 4096+4096-1=8191, round to 8192 (power of 2).

Q2: Which is simpler to implement, OLA or OLS? Why?

Show answer

OLS is simpler — no extra overlap-add step is needed; just discard the first M-1 points of each segment's IFFT result.

← 6.2 Numerical Precision 6.4 Filter Design →

6.4 Filter Design

Learning Objectives

Design an FIR low-pass filter using the window method (from specifications to h[n])
Use the Kaiser formula to compute the required filter order and beta parameter
Compare the pros and cons of FIR vs IIR and choose based on requirements

Why does this matter? Because the first step in almost every DSP system is filtering — noise removal, anti-aliasing, channel selection. Not knowing how to design filters means not knowing how to do DSP.

Previously:6.3 solved the long-sequence convolution problem. Now we design the convolution kernel — the filter itself.

Pain Point: "I need a low-pass filter with a cutoff frequency of 100Hz, transition bandwidth within 20Hz, and stopband attenuation of 60dB — what does h[n] look like? How many taps do I need?"

Historical Context:

Window Method: The most intuitive FIR design approach — directly truncate the ideal impulse response and apply a window.
Kaiser Window: James Kaiser, 1974. Proposed empirical formulas to precisely control the relationship between transition bandwidth and stopband attenuation.
Parks-McClellan Algorithm (Equiripple Design): James McClellan & Thomas Parks, 1972. An optimization design based on Chebyshev approximation, achieving the minimum maximum error for a given filter order.

Principles

Intuition: The frequency response of an ideal low-pass filter is a rectangular function (passband=1, stopband=0). Taking the inverse Fourier transform yields a sinc function — but sinc is infinitely long! We must truncate it. Truncation = multiplying by a rectangular window = convolving in the frequency domain with a sinc → Gibbs phenomenon (ripples at the edges). Using a better window can suppress the ripples.

Window method steps:

Ideal impulse response: $h_{\text{ideal}}[n] = \frac{\sin(\omega_c n)}{\pi n}$ (sinc function)
Truncate to 2M+1 points (M determines the transition bandwidth)
Multiply by window function $w[n]$: $h[n] = h_{\text{ideal}}[n] \cdot w[n]$

Kaiser window design formulas:

Given stopband attenuation $A_s$ (dB) and transition bandwidth $\Delta\omega$ (rad/s):

M \approx \frac{A_s - 8}{2.285 \cdot \Delta\omega}

\beta = \begin{cases} 0.1102(A_s - 8.7) & \text{if } A_s > 50 \\ 0.5842(A_s - 21)^{0.4} + 0.07886(A_s - 21) & \text{if } 21 \leq A_s \leq 50 \\ 0 & \text{if } A_s < 21 \end{cases}

Derivation: Origin of the Kaiser Window Design Formulas

The Kaiser window is an approximation of the DPSS (Discrete Prolate Spheroidal Sequence), using the zero-order modified Bessel function of the first kind:

w[n] = \frac{I_0\!\left(\beta\sqrt{1-(2n/M)^2}\right)}{I_0(\beta)}, \quad |n| \leq M/2

Through extensive numerical experiments, Kaiser discovered the above empirical formulas relating $\beta$ and $M$ to stopband attenuation $A_s$ and transition bandwidth $\Delta\omega$, with accuracy within 10%.

$\beta$ controls the window shape (larger → wider main lobe, lower sidelobes). $M$ controls the window length (longer → narrower main lobe → narrower transition band).

How to Use

Kaiser Window FIR Design Steps:

Concrete Example: Design a low-pass FIR with $f_c = 100$Hz, $f_s = 1000$Hz, transition band 50Hz (passband to 100Hz, stopband starting at 150Hz), stopband attenuation 60dB.

// Step 1: Convert specifications to normalized frequency
ωp = 2π × 100 / 1000 = 0.2π  (passband edge)
ωs = 2π × 150 / 1000 = 0.3π  (stopband edge)
Δω = ωs - ωp = 0.1π ≈ 0.3142 rad
ωc = (ωp + ωs) / 2 = 0.25π   (cutoff frequency, midpoint of passband and stopband)
As = 60 dB

// Step 2: Compute β
As > 50, so β = 0.1102 × (60 - 8.7) = 0.1102 × 51.3 = 5.653

// Step 3: Compute filter order M
M = (As - 8) / (2.285 × Δω) = 52 / (2.285 × 0.3142) = 52 / 0.718 ≈ 72.4
Take M = 73 (odd, ensuring Type I FIR)

// Step 4: Generate h[n]
for n in range(-36, 37):  // -M/2 to M/2
    h_ideal = sin(0.25π × n) / (π × n)  // at n=0 take the limit = ωc/π = 0.25
    w = kaiser(n, β, M)
    h[n] = h_ideal × w

// Step 5: Verify
// Passband ripple < 0.01dB (Kaiser window at As=60dB yields nearly flat passband)
// Stopband attenuation ≈ 60dB ✓
// Transition bandwidth ≈ 50Hz ✓
// Delay = M/2 = 36 samples = 36ms (@1kHz)

FIR vs IIR Comparison

Property	FIR (Finite Impulse Response)	IIR (Infinite Impulse Response)
Stability	Always stable (no feedback)	May be unstable (poles must be inside unit circle)
Phase	Can achieve perfect linear phase	Nonlinear phase (unless allpass compensation is used)
Order	Usually requires higher order	Much lower order for equivalent specs (~1/10)
Design	Window method / Parks-McClellan	Butterworth / Chebyshev / Elliptic
Use cases	When linear phase is needed (audio, communications)	Real-time control, low-latency requirements

Python Example: Designing an FIR Low-Pass Filter with scipy.signal.firwin

# Python: design an FIR low-pass filter with scipy.signal.firwin import numpy as np import scipy.signal as sig import matplotlib.pyplot as plt # Specifications fs = 8000 # sampling rate fc = 1000 # cutoff frequency Astop = 60 # stopband attenuation (dB) trans_width = 200 # transition width # Kaiser window automatically selects beta and N N, beta = sig.kaiserord(Astop, trans_width / (fs/2)) print(f"Filter order N = {N}, Kaiser beta = {beta:.2f}") # Design filter h = sig.firwin(N, fc, window=('kaiser', beta), fs=fs) # Frequency response w, H = sig.freqz(h, fs=fs) H_db = 20 * np.log10(np.abs(H) + 1e-12) # Plot fig, axes = plt.subplots(2, 1, figsize=(10, 7)) axes[0].stem(h, basefmt=' '); axes[0].set_title(f'Impulse response h[n] (N={N})') axes[0].set_xlabel('n') axes[1].plot(w, H_db); axes[1].axhline(-Astop, color='r', linestyle='--', label=f'-{Astop} dB') axes[1].axvline(fc, color='g', linestyle='--', label=f'fc = {fc} Hz') axes[1].set_xlabel('Hz'); axes[1].set_ylabel('dB') axes[1].set_title('Frequency response'); axes[1].legend(); axes[1].grid() axes[1].set_ylim(-100, 5) plt.tight_layout(); plt.show() # Apply to a signal x = np.random.randn(10000) # white noise y = sig.lfilter(h, 1, x) print(f"Input power: {np.var(x):.3f}, Output power: {np.var(y):.3f}")

Cutoff Frequency (Hz): 100

Filter Order: 37

Window Function:

Adjust parameters above to see the plot

Application Scenarios

Audio EQ (Equalizer): A 32-band EQ uses 32 FIR bandpass filters, each ~128 taps, processed in real time with OLA.
Communications baseband filtering: 5G NR channel filters use FIR, requiring passband flatness <0.1dB and stopband attenuation >50dB.
Biomedical signal preprocessing: ECG 50/60Hz notch filter requires only a 2nd-order IIR (FIR would need hundreds of taps).

Pitfalls and Limitations:

Kaiser formula is empirical: Results may deviate by 5-10%; always verify the frequency response after design.
Linear phase = group delay = M/2 samples: High-order FIR delay may be unacceptable (e.g., 1000 taps @1kHz = 500ms delay).
Parks-McClellan may not converge: With extremely narrow transition bands or very high stopband attenuation, parameter adjustments are needed.

When Not to Use FIR / Window Method:

Very narrow transition band + low latency required → use IIR (Butterworth/Chebyshev/Elliptic).
Optimal design needed (minimum order to meet specs) → use Parks-McClellan instead of the window method.
Alternative: Multirate Filtering — decimating first then filtering can dramatically reduce computation. CIC + FIR combination is the standard architecture for SDR.

📝 Worked Example

Design a low-pass FIR: passband edge 200Hz, stopband edge 300Hz, stopband attenuation 50dB, f_s=2000Hz. (a) Normalized frequencies? (b) Kaiser beta? (c) Filter order? (d) Center value of h[n]?

Show solution

(a) ω_p=200/1000·π=0.2π, ω_s=300/1000·π=0.3π, Δω=0.1π

(b) A_s=50 → β = 0.5842(50−21)^0.4 + 0.07886(50−21) = 3.39

(c) M ≈ (50−8)/(2.285×0.1π) = 58.5 → round to 59 (odd) → 59 taps = 60 points

(d) h[M/2] = ω_c/π = 0.25π/π = 0.25

✅ Quick Check

Q1: Designing a low-pass FIR with 60dB stopband attenuation and 50Hz transition band — how many taps are needed with a Kaiser window?

Show answer

M ≈ (A_s - 8)/(2.285·Δω) = (60-8)/(2.285·2π·50/fs). If fs=1kHz → Δω=0.1π → M≈(52)/(2.285·0.314)≈72 taps.

Q2: What is the main advantage of FIR filters over IIR?

Show answer

FIR is inherently stable (no risk of poles outside the unit circle) and can easily achieve exact linear phase (= pure delay, no distortion).

✅ Quick Check

Q1: Design a low-pass FIR with 60dB stopband attenuation and transition band 0.1π. How many taps with Kaiser window?

Show answer

M ≈ (A_s - 8)/(2.285·Δω) = (60-8)/(2.285×0.314) ≈ 72 taps.

Q2: What is the biggest advantage of FIR? And of IIR?

Show answer

FIR: inherently stable + exact linear phase. IIR: achieves the same performance with much lower order (= less computation and lower latency).

The Four Types of Linear-Phase FIR Filters

A linear-phase FIR must satisfy a symmetry or antisymmetry condition $h[n] = \pm h[N-1-n]$. Depending on the symmetry type and the parity of the length, they fall into exactly four classes, each with distinct restrictions.

Type	Symmetry	Length N	Properties	Cannot Realize
Type I	Symmetric $h[n]=h[N-1-n]$	Odd (N odd)	No restriction; most general	—
Type II	Symmetric	Even (N even)	$H(e^{j\pi})=0$	High-pass, band-stop
Type III	Antisymmetric $h[n]=-h[N-1-n]$	Odd	$H(e^{j0})=0$ and $H(e^{j\pi})=0$	Low-pass, high-pass
Type IV	Antisymmetric	Even	$H(e^{j0})=0$	Low-pass, band-stop

Derivation: Why do these restrictions exist?

Why does $H(\pi)=0$ for Type II?

For Type II, $N$ is even and $h[n]=h[N-1-n]$. Substituting into the DTFT:

$$H(e^{j\pi}) = \sum_{n=0}^{N-1}h[n](-1)^n$$

Pair the symmetric terms: $h[k]\cdot(-1)^k + h[N-1-k]\cdot(-1)^{N-1-k} = h[k]\cdot[(-1)^k + (-1)^{N-1-k}]$

When $N$ is even, $N-1$ is odd, so $(-1)^{N-1-k} = -(-1)^k$, and each pair sums to zero. $\blacksquare$

Why does $H(0)=0$ for Type III/IV?

Antisymmetry $h[n]=-h[N-1-n]$ implies that the coefficient sum $\sum h[n] = 0$ (pairs cancel).

Since $H(e^{j0}) = \sum h[n]$, it must equal zero. $\blacksquare$

Practical choices:

Low-pass / band-pass → use Type I
When odd symmetry is required (e.g., Hilbert transformers, differentiators) → use Type III or IV
Avoid: designing a high-pass with Type II forces a zero at Nyquist, making the specification impossible to meet
Most design tools (scipy.signal.firwin) automatically select the correct type

Group Delay Analysis

The real meaning of linear phase is not "phase = 0" but "all frequency components are delayed by the same amount of time when passing through the filter" — this is quantified by the group delay.

Definition

$$\tau_g(\omega) = -\frac{d\angle H(e^{j\omega})}{d\omega}$$

Group delay is the negative derivative of the phase response with respect to $\omega$, measured in samples. It tells you how much the envelope of a signal near frequency $\omega$ is delayed.

Why does it matter?

Linear phase: $\tau_g$ is constant → all frequencies delayed equally → signal shape preserved
Nonlinear phase: $\tau_g$ varies with frequency → different frequencies delayed differently → waveform distortion (even with ideal magnitude response)
Audio and measurement applications are particularly sensitive to phase distortion — for example, ECG needs to preserve QRS waveforms

FIR vs IIR Comparison

Type	Group Delay	Waveform Fidelity
Linear-phase FIR	Constant $(N-1)/2$	Perfect (pure delay)
Butterworth IIR	Non-constant, peaks at transition band	Distorted
Chebyshev I IIR	Severely varying in transition band	Severely distorted
Bessel IIR	Nearly constant within passband	Good (optimized for phase)

Practical applications:

Audio processing: choose FIR or Bessel; avoid the phase distortion of Chebyshev/Elliptic
ECG/EEG: must use linear-phase FIR (to preserve QRS/spike waveforms)
Communications receivers: can use nonlinear-phase IIR (compensated later by an equalizer)
Real-time forward-only: must be causal → IIR cannot achieve perfect linear phase
Offline processing: can use filtfilt (forward-backward filtering) to achieve zero phase

← 6.3 OLA/OLS 6.5 2D FFT →

6.5 2D FFT & Image Processing

Learning Objectives

Understand the separability of 2D DFT (row FFT + column FFT)
Implement image low-pass/high-pass filtering using frequency-domain masks
Explain the relationship between MRI k-space and 2D FFT

Why does this matter? Because medical imaging (MRI), satellite remote sensing, and computer vision all perform denoising and feature extraction in the frequency domain — 2D FFT is a fundamental skill in image processing.

Previously:6.4 designed one-dimensional filters. Images are two-dimensional signals — 2D FFT lets you analyze and process images in the spatial frequency domain.

Pain Point: "The image has periodic moire patterns — how do I remove them?" "MRI scanning is too slow — can we acquire only part of k-space to speed it up?"

Historical Context: The Cooley-Tukey FFT algorithm of 1965 simultaneously catalyzed the application of 2D FFT in image processing. The MRI k-space concept was proposed by Paul Lauterbur in 1973 (for which he received the 2003 Nobel Prize in Physiology or Medicine). Satellite and astronomical image processing made extensive use of 2D FFT for removing periodic interference in the 1970s-80s.

Principles

Intuition: In 1D, low frequency = slow variation, high frequency = fast variation. The 2D case is entirely analogous: low spatial frequencies = slowly varying regions (smooth areas, overall brightness); high spatial frequencies = rapidly varying regions (edges, textures, noise).

2D DFT formula:

$$F[u,v] = \sum_{m=0}^{M-1}\sum_{n=0}^{N-1} f[m,n]\,e^{-j2\pi(um/M + vn/N)}$$

Separability: 2D DFT can be decomposed into row FFTs first, then column FFTs:

$$F[u,v] = \sum_{m=0}^{M-1}\left(\sum_{n=0}^{N-1} f[m,n]\,e^{-j2\pi vn/N}\right)e^{-j2\pi um/M}$$

Computational cost: 2D FFT of an M×N image = M row FFTs of length N + N column FFTs of length M = O(MN log(MN)).

Meaning of the frequency domain center:

Center (u=0, v=0) = DC component = average image brightness
Near center = low frequency = overall structure and smooth regions
Far from center = high frequency = edges, details, noise
Bright line in a specific direction = periodic structure in that direction in the image

Derivation: Separability of 2D DFT

The kernel of 2D DFT, $e^{-j2\pi(um/M + vn/N)}$, can be factored into the product of two 1D kernels:

$$e^{-j2\pi(um/M + vn/N)} = e^{-j2\pi um/M} \cdot e^{-j2\pi vn/N}$$

Therefore:

$$F[u,v] = \sum_m e^{-j2\pi um/M} \underbrace{\left(\sum_n f[m,n]\,e^{-j2\pi vn/N}\right)}_{G[m,v] = \text{1D DFT of row } m}$$

First perform 1D DFT on each row to obtain $G[m,v]$, then perform 1D DFT on each column of $G[m,v]$ to obtain $F[u,v]$. The order can be swapped (columns first, then rows).

How to Use

2D Frequency-Domain Image Processing Steps:

Load grayscale image $f[m,n]$ (M×N pixels).
(Optional) Multiply by $(-1)^{m+n}$ to center the spectrum (fftshift).
Compute 2D FFT → $F[u,v]$.
Design frequency-domain mask $H[u,v]$ (low-pass/high-pass/bandpass/notch).
Multiply $G[u,v] = F[u,v] \cdot H[u,v]$.
2D IFFT → processed image $g[m,n]$.

Concrete Example: Removing Power Line Interference from Satellite Imagery

// Satellite image 512×512 pixels, with horizontal stripe interference caused by 60Hz power lines
// Step 1: Compute 2D FFT and observe the spectrum
F = fft2(image)
F_shifted = fftshift(F)
magnitude = log(1 + abs(F_shifted))  // log scale display

// Step 2: Find bright spots in the spectrum corresponding to interference
// Horizontal stripes → bright spots on the vertical axis (at (0, ±v₀))
// v₀ corresponds to the spatial frequency of the interference

// Step 3: Design a Notch Filter
// Place a small circular zero-gain region at each bright spot, radius r=5 pixels
H = ones(512, 512)
for each notch_point (u0, v0):
    for u, v in circle(u0, v0, r=5):
        H[u, v] = 0  // notch

// Step 4: Frequency-domain filtering + IFFT
G = F_shifted * H
result = real(ifft2(ifftshift(G)))
// Result: stripe interference removed, image details preserved

MRI Connection — k-space Is the 2D Frequency Domain

The MRI scanner's RF coils directly acquire data in k-space (= 2D Fourier space). Each scan trajectory fills one line of k-space. Image reconstruction = 2D IFFT.

Scanning all of k-space → complete image, but long scan time (several minutes).
Scanning only the center of k-space (low frequencies) → blurry image but fast (used for localizer scans).
Scanning only the outer regions of k-space (high frequencies) → only edge information remains.
Compressed Sensing MRI: Randomly sample part of k-space (~25%), use sparse reconstruction algorithms to recover the full image. Scan time reduced by 4x.

Application Scenarios

MRI image reconstruction: 256×256 k-space data → 2D IFFT → anatomical image. Compressed sensing can achieve 4-8x acceleration.
Astronomical image processing: Telescope images are blurred by atmospheric turbulence (PSF blur); Wiener deconvolution in the frequency domain recovers detail.
Industrial X-ray inspection: Periodic structures in PCB X-ray images (via arrays) produce bright spots in the frequency domain; notch filtering can highlight defects.

Pitfalls and Limitations:

Boundary effects: 2D FFT assumes the image is periodically extended. If the image boundaries are discontinuous, a cross-shaped spectral leakage occurs. Solutions: mirror padding or edge tapering.
Ringing: Sharp frequency-domain masks (e.g., ideal low-pass) cause ringing in the spatial domain. Use Gaussian or Butterworth-type masks to mitigate.
Phase matters: The structural information of an image is primarily in the phase (not the magnitude). Do not destroy phase during processing.

When Not to Use 2D FFT:

Structures in the image are non-periodic → frequency-domain methods perform poorly; use spatial-domain methods (e.g., bilateral filter, non-local means).
Local processing is needed (different filtering for different regions) → 2D STFT or wavelet is needed.
Alternative: Wavelet transforms are more powerful than 2D FFT for multi-scale analysis, and are mainstream for modern image compression (JPEG 2000) and denoising.

Interactive: 2D FFT Image Filtering

Select a test pattern, observe its 2D spectrum, and apply different frequency-domain filters.

Pattern: Filter: Cutoff radius D₀: 20

Original Image

2D FFT Spectrum

Filtered Image

← 6.4 Filter Design 6.6 OFDM →

6.6 OFDM (Orthogonal Frequency Division Multiplexing)

Learning Objectives

Describe the complete OFDM symbol transmit/receive flow (IFFT → +CP → channel → -CP → FFT)
Derive how CP converts linear convolution into circular convolution
Perform frequency-domain channel estimation using LS/MMSE

Why does this matter? Because your phone, Wi-Fi, and digital TV all use OFDM — it is the absolute core of modern wireless communications.

Previously:6.5 demonstrated FFT applications in imaging. Now we look at communications — OFDM uses IFFT/FFT to drive every connection on your phone.

Pain Point: Wireless channels exhibit multipath effects — after reflections, multiple copies of the same symbol arrive at the receiver at different times, causing Inter-Symbol Interference (ISI). Traditional single-carrier systems require complex equalizers to counteract ISI.

Historical Context:

Robert Chang, 1966: Proposed the basic concept of OFDM at Bell Labs.
Weinstein & Ebert, 1971: First implemented OFDM modulation/demodulation using DFT/IDFT, making OFDM practically feasible.
Peled & Ruiz, 1980: Introduced Cyclic Prefix (CP) to solve the ISI problem.
Widespread adoption from the 1990s: DVB-T (digital TV), 802.11a (Wi-Fi), ADSL.
4G LTE (2009), 5G NR (2018), Wi-Fi 6/7 are all OFDM-based.

Principles

Intuition: Instead of using a single high-speed carrier to transmit large amounts of data (each symbol is very short → easily corrupted by ISI), spread the data across N low-speed subcarriers for parallel transmission. Each subcarrier's symbol period = N times the original → much longer than the channel delay spread → ISI becomes negligible.

Transmitter:

N QAM (Quadrature Amplitude Modulation) symbols $D[0], D[1], \ldots, D[N-1]$
N-point IFFT: $d[n] = \text{IFFT}\{D[k]\}$ → time-domain OFDM symbol
Add CP (Cyclic Prefix): copy the last $N_{CP}$ points of $d[n]$ to the front
Pass through DAC and RF front-end for transmission

Receiver:

Remove CP
N-point FFT: $Y[k] = \text{FFT}\{y[n]\}$
Equalize each subcarrier independently: $\hat{D}[k] = Y[k] / \hat{H}[k]$ (only one complex division!)

Why does CP work?

CP converts linear convolution into circular convolution — which is exactly what FFT multiplication assumes. As long as the CP length ≥ channel delay spread, after FFT at the receiver each subcarrier sees $Y[k] = H[k] \cdot D[k] + W[k]$, a perfect frequency-domain multiplication relationship.

Full Derivation: How CP Eliminates ISI

Channel impulse response $h[n]$, length L (delay spread = L-1 samples).

Transmitted time-domain signal $d[n]$ of length N, with CP the total length is $N + N_{CP}$.

Received signal: $r[n] = \sum_{l=0}^{L-1} h[l]\,d_{\text{CP}}[n-l] + w[n]$

After removing the CP (taking n = N_CP to N_CP+N-1), if $N_{CP} \geq L-1$:

The "tail" of the previous OFDM symbol falls entirely within the CP interval → discarded → no ISI.
The convolution of the current symbol "ramps up" within the CP interval; after CP removal, it is equivalent to circular convolution of $d[n]$.

Circular convolution in the frequency domain = element-wise multiplication:

Y[k] = H[k] \cdot D[k] + W[k]

Therefore each subcarrier can be equalized independently: $\hat{D}[k] = Y[k] / H[k]$. In contrast, single-carrier systems require matrix inversion (O(N³) or equalizer approximation).

How to Use

5G NR Specific Parameters:

Numerology (μ)	Subcarrier Spacing (SCS)	FFT Size (max)	CP Length	Symbol Duration	Use Case
0	15 kHz	4096	288 / 352 samples	71.4 μs	<3GHz wide-area coverage
1	30 kHz	4096	288 / 352 samples	35.7 μs	3.5GHz mainstream deployment
2	60 kHz	4096	288 / 352 samples	17.8 μs	mmWave
3	120 kHz	4096	288 / 352 samples	8.9 μs	mmWave high-speed

Concrete Example: 5G NR μ=1 (30kHz SCS)

// FFT size: 4096
// Sample rate: 4096 × 30kHz = 122.88 MHz
// Active subcarriers: 3276 (100MHz bandwidth)
// OFDM symbol duration: 1/30kHz + CP = 33.33μs + 2.34μs = 35.67μs
// Per slot (14 symbols): 0.5 ms
// OFDM symbols per second: 14 × 2000 = 28,000
// FFTs per second (TX+RX): 28,000 × 2 = 56,000 4096-point FFTs
// Complex multiplications per second: 56,000 × 4096 × log₂(4096)/2 = 56,000 × 24,576 ≈ 1.38 × 10⁹

Channel Estimation

Pilots (Reference Signals): Insert known symbols at known subcarrier positions.
LS Estimation (Least Squares): $\hat{H}[k_p] = Y[k_p] / D_{\text{pilot}}[k_p]$ (only at pilot positions)
Interpolation: Interpolate from LS-estimated pilot points to obtain $\hat{H}[k]$ for all subcarriers.
MMSE Estimation: Exploits the statistical properties of the channel delay profile for optimal interpolation, outperforming LS by 3-5dB.
Frequency-domain equalization: Zero-Forcing: $\hat{D}[k] = Y[k]/\hat{H}[k]$. MMSE: $\hat{D}[k] = \frac{\hat{H}^*[k]}{|\hat{H}[k]|^2 + \sigma_w^2/\sigma_D^2} Y[k]$

Application Scenarios

5G NR: Both downlink and uplink use CP-OFDM (uplink can also use DFT-s-OFDM to reduce PAPR). Over 2 million base stations deployed globally (2025).
Wi-Fi 6/7 (802.11ax/be): OFDMA (multi-user OFDM), 2048-point FFT, supporting 160/320MHz bandwidth.
DVB-T2 digital television: 32K FFT (32768 points), handling long delay spreads (mountain reflections).

Pitfalls and Limitations:

Carrier Frequency Offset (CFO): Transmitter and receiver oscillator frequencies are not perfectly matched → subcarriers are no longer orthogonal → Inter-Carrier Interference (ICI). For example, 5G @28GHz with 0.1ppm oscillator accuracy = 2.8kHz offset, which is 9.3% of 30kHz SCS — frequency synchronization is mandatory.
PAPR (Peak-to-Average Power Ratio): When the phases of N subcarriers align, peak power can reach N times the average power. PAPR ~10-12dB → power amplifiers need a large linear range (expensive and power-hungry). PAPR reduction methods: clipping, DFT-s-OFDM, tone reservation.
CP is overhead: CP carries no new information and occupies ~7% of time and spectral resources.

When Not to Use OFDM:

The channel has almost no multipath (e.g., satellite communications) → single carrier is sufficient, and there is no PAPR problem.
Ultra-low latency requirements (CP adds latency) → FBMC (Filter Bank Multi-Carrier) does not need CP.
Alternative: SC-FDMA (Single-Carrier FDMA) is used for LTE uplink with 2-3dB lower PAPR than OFDM. FBMC and UFMC are candidate technologies for beyond-5G.

✅ Quick Check

Q1: What happens if the OFDM cyclic prefix (CP) is too short?

Show answer

Channel delay exceeds CP → adjacent OFDM symbols overlap → ISI. Also, the effective convolution within the FFT interval is not circular → subcarrier orthogonality is broken → ICI.

Q2: 5G NR subcarrier spacing 30kHz, 4096 FFT — what is the approximate bandwidth?

Show answer

30kHz × 4096 ≈ 122.88 MHz.

Interactive: OFDM Symbol Generation (IFFT)

64 subcarriers, each carrying a QPSK symbol. IFFT converts frequency-domain data into a time-domain OFDM symbol.

Adjust parameters above to see the plot

Interactive: Complete OFDM Transceiver Simulation

Full simulation of the OFDM transceiver chain: IFFT generates symbol → add CP → multipath channel → add noise → remove CP → FFT → LS channel estimation → equalization → demodulation.

FFT size:

Modulation:

Channel delay (samples): 5

Channel attenuation: 0.60

SNR (dB): 25

CP length: 16

BER: ---

← 6.5 2D FFT 6.7 Radar →

6.7 Radar Signal Processing

Learning Objectives

Implement pulse compression using matched filtering
Explain the dual-FFT architecture of Range-Doppler processing
Determine a waveform's range-velocity resolution capability from its ambiguity function

Why does this matter? Because autonomous driving, weather forecasting, and military detection all rely on radar, and the core of radar signal processing is FFT.

Previously:6.6 covered FFT in communications. Radar is also a heavy user of FFT — range uses fast-time FFT, velocity uses slow-time FFT.

Pain Point: Radar must simultaneously measure a target's range and velocity. Range relies on pulse delay → fast-time FFT; velocity relies on Doppler shift → slow-time FFT. Both dimensions require FFT.

Historical Context:

Pulse Compression: P.M. Woodward, 1953. Used matched filtering to improve range resolution without increasing peak power.
Pulse-Doppler Processing: 1960s, led by the US Air Force, using FFT to extract Doppler frequencies from multiple pulse echoes.
Synthetic Aperture Radar (SAR): Developed from the 1950s, synthesizing a large antenna aperture from the flight path to achieve high-resolution imagery.

Principles

Intuition: Transmit a known waveform $s(t)$; the target-reflected signal is a delayed + frequency-shifted version. To find the delay (= range) and frequency shift (= velocity), the best approach is matched filtering — correlating the known waveform with the echo. FFT enables this correlation to be computed efficiently.

Matched Filter

The optimal filter that maximizes SNR = the time-reversed conjugate of the transmitted signal:

$$h_{\text{MF}}(t) = s^*(-t)$$

Frequency-domain implementation: $Y(\omega) = S^*(\omega) \cdot X(\omega)$ → IFFT → compressed pulse

Range-Doppler Processing

Received data is arranged in a 2D matrix: [fast-time × slow-time]

Fast-time FFT (each row): Range compression — compresses chirp pulses into narrow peaks.
Slow-time FFT (each column): Doppler processing — extracts velocity from phase changes across multiple pulses.
Result: Range-Doppler Map — each bright spot = a target, horizontal position = range, vertical position = velocity.

Resolution:

\Delta R = \frac{c}{2B} \quad (\text{range resolution, B = bandwidth})

$$\Delta v = \frac{\lambda}{2T_{\text{CPI}}} \quad (\text{velocity resolution, }T_{\text{CPI}} = \text{coherent processing interval})$$

Ambiguity Function

Definition:

$$\chi(\tau, f_d) = \int_{-\infty}^{\infty} s(t)\,s^*(t-\tau)\,e^{j2\pi f_d t}\,dt$$

Intuitive interpretation: The ambiguity function describes a waveform's resolution capability on the "range ($\tau$) — Doppler ($f_d$)" plane. $|\chi(0,0)|$ = peak (perfect match); $|\chi(\tau, f_d)|$ at other locations = sidelobes (degree of ambiguity).

Woodward's Theorem: The total volume of the ambiguity function is conserved — suppressing sidelobes in one area raises them elsewhere. Waveform design is about "sculpting" sidelobe shapes on the range-Doppler plane.

LFM Chirp: Oblique ridge-shaped ambiguity function (range and Doppler are coupled, but the main lobe is narrow).
Phase-Coded Pulse: Thumbtack shape (low sidelobes but Doppler-sensitive).

Derivation: Range Resolution of LFM Chirp

LFM (Linear Frequency Modulated) chirp: $s(t) = \text{rect}(t/T)\,e^{j\pi \mu t^2}$, where $\mu = B/T$ (chirp rate).

Matched filter output (at zero Doppler):

\chi(\tau, 0) = \int s(t)\,s^*(t-\tau)\,dt

After derivation (expanding and simplifying the quadratic phase terms), the result is approximately:

|\chi(\tau, 0)| \approx T\,\text{sinc}(B\tau)

The first null of the sinc function is at $\tau = 1/B$, corresponding to range:

\Delta R = \frac{c\tau}{2} = \frac{c}{2B}

Key observation: range resolution depends only on bandwidth B, independent of pulse duration T. This is the power of pulse compression — long pulse (high energy) + wide bandwidth (high resolution) can be achieved simultaneously.

How to Use

Concrete Example: Automotive 77GHz FMCW Radar

// System parameters
f_c = 77 GHz           // carrier frequency
B = 1 GHz              // Chirp bandwidth
T_chirp = 50 μs        // single Chirp duration
N_chirps = 128         // number of Chirps per CPI
λ = c / f_c = 3.896 mm // wavelength

// Range resolution
ΔR = c / (2B) = 3×10⁸ / (2×10⁹) = 0.15 m = 15 cm

// Maximum unambiguous range
R_max = c × T_chirp / 2 = 3×10⁸ × 50×10⁻⁶ / 2 = 7500 m
// (in practice limited by ADC sample rate and SNR, typically ~200m)

// Velocity resolution
Δv = λ / (2 × N_chirps × T_chirp)
   = 3.896×10⁻³ / (2 × 128 × 50×10⁻⁶) = 0.304 m/s ≈ 1.1 km/h

// Maximum unambiguous velocity
v_max = λ / (4 × T_chirp) = 3.896×10⁻³ / (4 × 50×10⁻⁶) = 19.5 m/s ≈ 70 km/h

// FFT processing
// Fast-time: 256-point FFT → 256 range bins, each bin = R_max/256 ≈ 29 m
//            (actual ADC samples, e.g., 256 @10MHz → covers 3840m)
// Slow-time: 128-point FFT → 128 velocity bins
// Range-Doppler Map: 256 × 128 matrix, each bright spot = one target

Adjust parameters above to see the plot

Application Scenarios

Automotive FMCW radar: 77GHz, B=1-4GHz, range resolution 3.75-15cm. 3-5 radars per vehicle, global annual production exceeding 500 million units (2025).
Weather radar: S-band (2.7-3.0GHz), using Doppler FFT to measure radial velocity of precipitation particles (wind field). WSR-88D Doppler radar uses 1024-point FFT, velocity resolution ~0.5 m/s.
SAR satellite imagery: Azimuth focusing uses FFT, achieving ~1m spatial resolution. Sentinel-1 satellite processes TB-scale data per orbit.

Pitfalls and Limitations:

LFM range-Doppler coupling: A moving target's Doppler shift is misinterpreted as a range offset. Doppler compensation or dual-slope chirp is needed.
Sidelobe masking: Sidelobes of strong targets can obscure weak targets. Windowing to reduce sidelobes + CFAR (Constant False Alarm Rate) adaptive threshold detection is needed.
Blind speed: When a target's velocity is exactly an integer multiple of $v_{\text{max}}$, the Doppler shift wraps around to zero → undetectable. Solved by staggered PRF (Pulse Repetition Frequency).

When Not to Use FFT-Based Radar Processing:

Very few and known targets → parametric estimation methods (e.g., MUSIC, ESPRIT) offer higher resolution than FFT.
Nonlinear frequency modulation (NLFM) waveforms → matched filtering still applies, but standard FFT-based processing needs modification.
Alternative: Compressed Sensing radar can reconstruct the Range-Doppler map from fewer samples (suitable for sparse scenes). MIMO radar uses multiple transmit/receive antennas to increase virtual aperture.

📝 Worked Example

77GHz FMCW automotive radar: bandwidth B=1GHz, chirp duration 50μs, 128 chirps. (a) Range resolution? (b) Maximum unambiguous range (256-point FFT)? (c) Velocity resolution?

Show solution

(a) ΔR = c/(2B) = 3×10⁸/(2×10⁹) = 0.15m = 15cm

(b) R_max = N·ΔR = 256×0.15 = 38.4m

(c) λ = c/f = 3.9mm, Δv = λ/(2·128·50μs) = 3.9×10⁻³/(2×128×50×10⁻⁶) = 0.30 m/s

✅ Quick Check

Q1: For a radar range resolution of 15cm, what bandwidth is needed?

Show answer

ΔR = c/(2B) → B = c/(2·0.15) = 3×10⁸/(0.3) = 1 GHz.

Q2: Why is the LFM chirp's ambiguity function ridge-shaped?

Show answer

Because LFM frequency increases linearly with time, creating a coupling between range and Doppler — the range estimate of a stationary target shifts due to Doppler offset.

Interactive: Ambiguity Function

The ambiguity function describes a radar waveform's resolution capability in two dimensions: range (delay τ) and velocity (Doppler f_d). Its shape determines the waveform's performance.

Waveform:

BT product: 50

Waveform description will be displayed here

Interactive: Matched Filter & Pulse Compression

Radar transmits a long chirp pulse, which the matched filter at the receiver compresses into a sharp peak. Long pulse = high energy (long-range detection); after compression = high range resolution (resolving close targets).

Chirp bandwidth B: 100 MHz

Pulse duration T: 10 μs

Number of targets:

Target spacing: 10.0 m

    Compression ratio and resolution info will be displayed here
  

Interactive: Radar Target Placement & Range-Doppler

Set the range, velocity, and amplitude of 3 targets and observe the response on the Range-Doppler Map.

Target 1: Range=80m Velocity=10m/s Amplitude=1.00

Target 2: Range=150m Velocity=-20m/s Amplitude=0.70

Target 3: Range=50m Velocity=30m/s Amplitude=0.50

Bandwidth B (MHz): 500

Number of chirps: 64

    Resolution info will be displayed here
  

← 6.6 OFDM 6.8 Array Processing →

6.8 Array Signal Processing

Learning Objectives

Write the ULA steering vector and explain the spatial Nyquist condition
Compare three beamforming methods: Delay-and-Sum, Capon, and MUSIC
Compute the angular resolution of an array from the beamwidth formula

Why does this matter? Because 5G massive MIMO, phased array radar, and sonar positioning all rely on antenna arrays — spatial filtering is FFT in the spatial dimension.

Previously:6.7 used FFT for range-velocity estimation in radar. Antenna arrays extend the same concept to space — using "spatial FFT" to estimate signal direction of arrival.

Pain Point: A base station simultaneously receives signals from multiple users — how do you distinguish signals coming from different directions? How does radar know the target's angle? A single antenna cannot do this — an antenna array is needed.

Historical Context:

Phased Array: Already in use during WWII radar in the 1940s.
Capon (MVDR) Beamforming: Jack Capon, 1969. Minimum Variance Distortionless Response.
MUSIC Algorithm: Ralph Schmidt, 1986. Exploits the orthogonality between signal subspace and noise subspace for high-resolution DOA (Direction of Arrival) estimation.
Massive MIMO: Thomas Marzetta, 2010. Proposed using large numbers of antennas (64-256) to simultaneously serve multiple users. Became a core 5G technology.

Principles

Intuition: A set of equally spaced antennas (Uniform Linear Array, ULA) receives the same plane wave, but each antenna has a slightly different reception time (depending on the wave's angle of incidence). This time difference = phase difference. Analyzing these phase differences reveals the signal's direction of arrival. This is perfectly analogous to time-domain sampling: antenna spacing = spatial sampling interval, angle of incidence = spatial frequency.

ULA Model

M antennas equally spaced by d. A plane wave arrives from angle $\theta$, and the phase difference between adjacent antennas is:

\Delta\phi = \frac{2\pi d \sin\theta}{\lambda}

Steering Vector:

$$\mathbf{a}(\theta) = \begin{bmatrix} 1 \\ e^{j\frac{2\pi d\sin\theta}{\lambda}} \\ e^{j\frac{2\pi \cdot 2d\sin\theta}{\lambda}} \\ \vdots \\ e^{j\frac{2\pi(M-1)d\sin\theta}{\lambda}} \end{bmatrix}$$

Beamforming = Spatial Filtering

y = \mathbf{w}^H \mathbf{x}

where $\mathbf{x}$ is the received vector from M antennas and $\mathbf{w}$ is the weight vector.

Conventional Beamforming (Delay-and-Sum, DAS): $\mathbf{w} = \mathbf{a}(\theta_0)/M$. This is essentially a spatial DFT — scanning all $\theta$ is equivalent to performing DFT on spatial samples.

Spatial Nyquist Theorem

d ≤ λ/2, otherwise grating lobes appear (spatial aliasing! Perfectly analogous to aliasing in the time-domain Nyquist theorem).

When d > λ/2, signals from different directions produce identical phase differences on the antenna array → indistinguishable → spatial aliasing.

Derivation: Spatial DFT and Angular Resolution

Spatial power spectrum of the DAS beamformer:

$$P_{\text{DAS}}(\theta) = \frac{1}{M^2}\left|\sum_{m=0}^{M-1} e^{j\frac{2\pi md}{\lambda}(\sin\theta - \sin\theta_0)}\right|^2$$

Let the spatial frequency be $u = d\sin\theta/\lambda$ — this is a discrete Fourier sum!

Main beam width (3dB beamwidth):

$$\Delta\theta_{3\text{dB}} \approx \frac{0.886\lambda}{Md\cos\theta_0}$$

At broadside ($\theta_0 = 0$), this simplifies to $\Delta\theta \approx 0.886\lambda/(Md)$.

More antennas (larger M) and wider spacing (larger d) → narrower beam → higher angular resolution. But d > λ/2 produces grating lobes.

How to Use

Concrete Example: 5G Massive MIMO Base Station

// Parameters
M = 64 antennas (8×8 planar array)
f_c = 28 GHz (mmWave)
λ = c / f_c = 3×10⁸ / 28×10⁹ = 10.71 mm
d = λ/2 = 5.36 mm (antenna spacing)

// Beamwidth
Δθ ≈ 0.886 × λ / (M_row × d) = 0.886 × 10.71 / (8 × 5.36) = 0.221 rad ≈ 12.7°
// (8×8 array has 8 antennas in horizontal and vertical)
// Using all 64 antennas for 2D beamforming:
// Effective aperture = 8 × 5.36mm = 42.9mm
// Beamwidth ≈ 12.7° × 12.7° (both dimensions)

// Spatial multiplexing capability
// 64 antennas can form multiple independent beams simultaneously
// Max ~M/2 = 32 users (theoretical upper limit)
// In practice 8-16 parallel users (limited by channel correlation)

// Spectral efficiency improvement
// Single user: ~5 bps/Hz
// 16-user MU-MIMO: ~80 bps/Hz (ideal case)

Advanced Methods

Capon (MVDR) Beamformer:

$$\mathbf{w}_{\text{Capon}} = \frac{\mathbf{R}^{-1}\mathbf{a}(\theta_0)}{\mathbf{a}^H(\theta_0)\mathbf{R}^{-1}\mathbf{a}(\theta_0)}$$

Minimizes output power subject to unity gain in the desired direction. Result: a much narrower beam than DAS, effectively suppressing interference.

MUSIC DOA Estimation:

Completely analogous to the frequency-domain MUSIC (see Section 3.4), just replacing "frequency" with "angle":

$$P_{\text{MUSIC}}(\theta) = \frac{1}{\mathbf{a}^H(\theta)\,\mathbf{E}_n\mathbf{E}_n^H\,\mathbf{a}(\theta)}$$

where $\mathbf{E}_n$ is the noise subspace. Peak positions = signal directions of arrival. Can resolve sources separated by much less than the beamwidth.

Number of antennas M: 16

Antenna spacing d/λ: 0.5

Adjust parameters above to see the plot

Application Scenarios

5G Massive MIMO: 64-256 antennas, mmWave beam tracking, updating beam direction every millisecond.
Radar electronic scanning (AESA): Fighter jet radars use thousands of antenna elements for electronic scanning, switching beam direction in milliseconds (mechanical rotation takes seconds).
Hearing aid beamforming: 2-4 microphone arrays for speech enhancement. In noisy environments, the beam is steered toward the speaker (SNR improvement of 5-10dB).

Pitfalls and Limitations:

Calibration errors: Inconsistencies in antenna position, gain, and phase severely degrade performance. Massive MIMO requires periodic Over-the-Air Calibration.
Capon/MUSIC requires sufficient snapshots: Estimating the covariance matrix $\mathbf{R}$ requires ≥2M snapshots for stability. Rapidly changing environments may not allow enough accumulation.
Wideband signals: The steering vector $\mathbf{a}(\theta)$ depends on frequency. Wideband signals require space-time processing (wideband beamforming), such as DFT beamforming + sub-band processing.
Near-field effects: When target distance < $2D^2/\lambda$ (D = array aperture), the plane wave assumption breaks down, and near-field beamforming is needed.

When Not to Use Array Processing:

Only one signal source and direction is not needed → a single antenna suffices.
Signal sources are confined to a narrow angular range (e.g., satellite communications) → a fixed-pointing antenna is cheaper than an array.
Alternative: If you need to separate co-frequency signals but do not care about direction, use CDMA (Code Division Multiple Access) or NOMA (Non-Orthogonal Multiple Access).

✅ Quick Check

Q1: What happens when ULA antenna spacing d > λ/2?

Show answer

Grating lobes — which are spatial aliasing. Analogous to aliasing caused by insufficient sampling rate in the time domain.

Q2: 5G massive MIMO with 64 antennas @28GHz (λ≈10.7mm) — approximately what is the beamwidth?

Show answer

≈ λ/(M·d) ≈ 10.7/(64×5.35) ≈ 0.03 rad ≈ 1.8°(assuming d=λ/2=5.35mm).

← 6.7 Radar 6.9 Biomedical →

6.9 Biomedical Signal Analysis

Learning Objectives

List the five major EEG frequency bands and explain their physiological significance
Estimate EEG band power ratios using the Welch method
Describe the standard workflow for HRV frequency-domain analysis (R-R intervals → resampling → PSD → LF/HF)

Why does this matter? Because clinical interpretation of EEG/ECG increasingly relies on frequency-domain quantitative metrics — this is an essential skill for entering biomedical engineering.

Previously:6.8 dealt with man-made signals. Biomedical signals (brainwaves, ECG) are also important application scenarios for frequency-domain analysis.

Pain Point: Clinicians need objective quantitative metrics to diagnose epilepsy (abnormal brainwaves), sleep disorders (band power changes), and autonomic dysfunction (HRV abnormalities). Visually inspecting time-domain waveforms is too subjective and too difficult — frequency-domain tools are needed.

Historical Context:

Hans Berger, 1929: First recorded human EEG (Electroencephalography), discovering the 8-13Hz α wave (Alpha rhythm) that appears when eyes are closed.
HRV frequency-domain analysis standardization: The European Society of Cardiology (ESC) and the North American Society of Pacing and Electrophysiology (NASPE) published the Task Force report in 1996, defining the LF/HF frequency bands and analysis methods for HRV.

Principles

Five Major EEG Frequency Bands

Band	Frequency Range	Physiological Significance	Clinical Application
δ (Delta)	0.5 – 4 Hz	Deep sleep (N3 stage)	Sleep depth assessment, brain injury detection
θ (Theta)	4 – 8 Hz	Light sleep, meditation, memory encoding	Sleep staging (N1/N2), attention index
α (Alpha)	8 – 13 Hz	Relaxed wakefulness, eyes closed	Alertness assessment, BCI control
β (Beta)	13 – 30 Hz	Focus, alertness, active thinking	Attention monitoring, anxiety assessment
γ (Gamma)	30 – 100 Hz	Higher cognition, cross-regional integration	Epileptic high-frequency oscillation detection

Analysis methods:

Welch PSD (see Section 3.2) → integrate power in each band → power ratios
For example, α/θ ratio = attention index (higher = more focused)
Relative power: $P_\alpha^{\text{rel}} = P_\alpha / P_{\text{total}}$

HRV Frequency-Domain Analysis

Heart Rate Variability (HRV) reflects the activity of the Autonomic Nervous System.

Band	Frequency Range	Primary Regulation
VLF (Very Low Frequency)	0.003 – 0.04 Hz	Thermoregulation, renin-angiotensin system
LF (Low Frequency)	0.04 – 0.15 Hz	Sympathetic + parasympathetic (not purely sympathetic!)
HF (High Frequency)	0.15 – 0.4 Hz	Parasympathetic (respiratory sinus arrhythmia)

Analysis workflow: R-R interval series (Tachogram) → resample (non-uniform → uniform, typically cubic spline @4Hz) → PSD → band power integration.

Derivation: Why Does HRV Require Resampling?

The R-R interval series is inherently non-uniformly sampled (because heartbeat intervals are not fixed). Fourier analysis requires uniform sampling.

Solution:

Place each R-R interval value at its corresponding time point (non-uniform sequence).
Use cubic spline interpolation to generate a uniform sequence (typically 4Hz = one point every 0.25 seconds).
The Nyquist frequency at 4Hz sample rate = 2Hz, sufficient to cover the HF band (0.4Hz).

Alternative: The Lomb-Scargle periodogram can directly handle non-uniform data, but Welch PSD is more widely used in clinical practice.

How to Use

EEG Analysis Steps

// Step 1: Signal acquisition
// Sample rate: 250-1000 Hz (clinical standard: 256 or 512 Hz)
// Electrodes: International 10-20 system (19-64 channels)
// ADC resolution: 24-bit (EEG amplitude is only ~10-100 μV)

// Step 2: Preprocessing
band_pass_filter(0.5, 100)  // Remove DC drift and high-frequency noise
notch_filter(60)            // Remove power line interference (60Hz)
ICA_artifact_removal()      // Independent component analysis to remove eye/muscle artifacts

// Step 3: Spectral analysis (Welch method)
segment_length = 2 * fs     // 2-second segments (512 points @256Hz)
overlap = 0.5               // 50% overlap
window = 'hann'
PSD = welch(eeg_channel, segment_length, overlap, window)

// Step 4: Band power computation
P_delta = integrate(PSD, 0.5, 4)    // δ power
P_theta = integrate(PSD, 4, 8)      // θ power
P_alpha = integrate(PSD, 8, 13)     // α power
P_beta  = integrate(PSD, 13, 30)    // β power
P_gamma = integrate(PSD, 30, 100)   // γ power
P_total = P_delta + P_theta + P_alpha + P_beta + P_gamma

// Step 5: Compute metrics
attention_index = P_alpha / P_theta  // α/θ ratio
relative_alpha = P_alpha / P_total   // relative α power

// Sleep staging example:
// Awake (eyes closed): α dominant (relative_alpha > 0.4)
// N1 light sleep:      θ increases, α decreases
// N2:                  Sleep spindles (12-14Hz bursts) + K-complex
// N3 deep sleep:       δ dominant (relative_delta > 0.5, amplitude > 75μV)
// REM:                 Low-amplitude mixed frequency (similar to awake, but with rapid eye movements)

HRV Analysis Steps

// Step 1: R-wave detection (Pan-Tompkins Algorithm)
ecg_filtered = bandpass(ecg, 5, 15)  // Focus on QRS complex
ecg_diff = differentiate(ecg_filtered)
ecg_squared = ecg_diff ** 2
ecg_integrated = moving_average(ecg_squared, 150ms)
R_peaks = adaptive_threshold(ecg_integrated)

// Step 2: Compute R-R intervals (Tachogram)
RR_intervals = diff(R_peaks) / fs  // units: seconds
RR_times = cumsum(RR_intervals)    // corresponding time points

// Step 3: Remove ectopic beats
for i in range(len(RR)):
    if abs(RR[i] - RR[i-1]) / RR[i-1] > 0.20:
        RR[i] = interpolate(neighbors)  // interpolate to replace

// Step 4: Resample to uniform spacing
fs_resample = 4  // 4 Hz
RR_resampled = cubic_spline_interpolate(RR_times, RR_intervals, fs_resample)
RR_resampled -= mean(RR_resampled)  // remove DC

// Step 5: Welch PSD
segment_length = 256  // 256 points @4Hz = 64 seconds
overlap = 0.5
PSD_hrv = welch(RR_resampled, segment_length, overlap, 'hann')

// Step 6: Band power
LF_power = integrate(PSD_hrv, 0.04, 0.15)  // ms²
HF_power = integrate(PSD_hrv, 0.15, 0.40)  // ms²
LF_HF_ratio = LF_power / HF_power

// Normal reference values:
// Healthy young adults: LF/HF ≈ 1.0-2.0
// Stress/anxiety:       LF/HF ↑ (sympathetic activation)
// Athletes at rest:     HF ↑, LF/HF ↓ (parasympathetic dominant)
// Heart failure:        Total power ↓↓, LF/HF may be abnormally high or low

Adjust parameters above to see the plot

Application Scenarios

Brain-Computer Interface (BCI): Detects μ (8-12Hz) / β (18-26Hz) rhythm changes caused by motor imagery, controlling wheelchairs or robotic arms. Classification accuracy 70-90%.
Sleep monitoring wearables: Using single-channel frontal EEG (e.g., Muse headband), computing δ/θ/α power for automatic sleep staging. Sample rate 256Hz, Welch PSD every 30-second epoch.
Cardiac autonomic assessment: 5-minute short-term HRV analysis for stress assessment, athletic training monitoring, and diabetic autonomic neuropathy screening. Apple Watch/Garmin watches have this feature built in.

Pitfalls and Limitations:

EEG artifacts dominate the spectrum: Eye movements (EOG) produce large-amplitude 0-4Hz interference → misidentified as δ activity. Muscle activity (EMG) contaminates >20Hz. 50/60Hz power line interference. Artifacts must be removed first.
LF ≠ sympathetic activity: This is the most common misconception! The LF band is modulated by both sympathetic and parasympathetic nervous systems (baroreflex mechanism). Only HF more purely reflects parasympathetic activity. The LF/HF ratio is only a rough indicator of "sympathovagal balance."
Short-segment HRV is unreliable: VLF requires at least 5 minutes of data (otherwise insufficient frequency resolution). LF requires at least 2 minutes. 24-hour long-term analysis is more stable.
Breathing rate affects HF: If the subject breathes very slowly (<0.15Hz, e.g., during meditation), the respiratory component falls in the LF band instead of HF → LF/HF ratio is distorted. Breathing rate must be simultaneously recorded.

When Not to Use Frequency-Domain Analysis:

Transient events in EEG (e.g., epileptic spikes, lasting <200ms) → frequency-domain methods cannot localize in time. Use time-frequency analysis (STFT, wavelet).
HRV during cardiac arrhythmia (e.g., atrial fibrillation) → RR intervals are completely irregular, frequency-domain analysis is meaningless. Use nonlinear methods (approximate entropy, Poincare plot).
Alternative: Multiscale Entropy analysis, Recurrence Plot, deep learning automatic feature extraction.

✅ Quick Check

Q1: Which EEG frequency band is enhanced when the eyes are closed?

Show answer

Alpha waves (8-13 Hz) — this is the Berger effect.

Q2: What does the LF/HF ratio in HRV frequency-domain analysis represent?

Show answer

Often interpreted as a sympathetic/parasympathetic balance indicator, but this is actually an oversimplification — LF is modulated by both sympathetic and parasympathetic activity.

Interactive: EEG Sleep Staging Exercise

The system generates a simulated EEG power spectrum. Determine the sleep stage based on the power distribution across frequency bands.

Awake — strong α(8-13Hz) + β(13-30Hz) N1 Light Sleep — θ(4-8Hz) increases, α decreases N2 — Sleep spindles (12-14Hz), K-complex N3 Deep Sleep — dominant δ(0.5-4Hz), very little α/β REM — low-amplitude mixed frequency, similar to awake but more θ

← 6.8 Array Processing 6.10 Vibration →

6.10 Complete Vibration Analysis Workflow

Learning Objectives

Describe the complete 6-step workflow from sensor to fault diagnosis in vibration analysis
Compute BPFO/BPFI characteristic frequencies for given bearing parameters
Distinguish spectral signatures of imbalance (1X), misalignment (2X), looseness (multi-harmonic), and bearing faults (BPFO)

Why does this matter? Because Predictive Maintenance saves industry billions of dollars annually, and its core is vibration spectrum analysis.

Previously:6.9 analyzed human body signals. This final chapter ties all tools together, demonstrating the complete end-to-end vibration analysis workflow from sensor to fault diagnosis.

Pain Point: A factory has 200 rotating machines (pumps, motors, compressors) — how do you know which one is about to fail? Disassembling for inspection is too expensive; waiting until it breaks down is even more expensive (a single unplanned shutdown can cost hundreds of thousands to millions of dollars). → Vibration spectrum analysis = a non-invasive "stethoscope."

Historical Context:

ISO 10816 / ISO 20816: Defines vibration severity levels (A: Good → D: Dangerous), the international standard for industrial vibration monitoring.
Robert Randall and Jérôme Antoni: Systematized the Envelope Spectrum Analysis method, particularly the band selection strategy (Spectral Kurtosis) for bearing fault diagnosis.

Complete Vibration Analysis Workflow

Step 1: Signal Acquisition

Sensor: Accelerometer, typically IEPE/ICP type (built-in amplifier, single coaxial cable output).
Sample rate: At least 2.56× the highest frequency of interest (ISO recommends including anti-aliasing filter roll-off).
- General rotating machinery: f_s = 25.6 kHz (covering up to 10 kHz)
- Bearing diagnostics: f_s = 51.2 kHz (high-frequency resonance band needed)
- Gearbox analysis: f_s > 50 kHz (mesh frequency can be very high)
Anti-Aliasing Filter: Analog low-pass filter before the ADC, cutoff frequency set at 0.4×f_s.
Mounting: Stud mount > magnet > adhesive > handheld probe. Mounting quality directly affects high-frequency response.

Step 2: Preprocessing

// DC offset removal
signal -= mean(signal)

// High-pass filter: remove < 5Hz low-frequency interference
// (inertial base vibration, loose sensor mounting, temperature drift)
signal = highpass(signal, fc=5, order=4)

// Bandpass filter when needed
// (focus on frequency band of interest, e.g., around gear mesh frequency)
signal_bp = bandpass(signal, f_low, f_high, order=6)

Step 3: Basic Spectrum Analysis

// Windowed FFT + Welch averaging
nfft = 8192          // Frequency resolution = fs/nfft = 25600/8192 = 3.125 Hz
window = 'hann'
n_averages = 16      // 16-segment averaging, reduces random variance
overlap = 0.5        // 50% overlap
PSD = welch(signal, nfft, overlap, window, n_averages)

Identifying Characteristic Frequencies:

Fault Type	Characteristic Frequency	Spectral Signature
Imbalance	1X = shaft speed	High 1X amplitude, primarily radial
Misalignment	2X = 2×shaft speed	High 2X amplitude, may include axial vibration
Mechanical Looseness	Multiple harmonics (1X, 2X, 3X...)	Multiple shaft speed harmonics, possibly 0.5X sub-harmonic
Gear Fault	Mesh frequency = tooth count×speed	Sidebands ±1X around mesh frequency
Blade/Vane Issues	Blade pass frequency = blade count×speed	Fluid pulsation in pumps/fans
Bearing Fault	BPFO/BPFI/BSF/FTF	Requires envelope spectrum analysis (see Step 4)

Step 4: Envelope Spectrum Analysis (Bearing Fault Diagnosis)

Intuition: Each time a bearing defect passes through the load zone, it produces a brief impact that excites the high-frequency structural resonance of the bearing housing (2-10 kHz). These impacts repeat at the bearing fault frequency. Directly viewing the spectrum only shows high-frequency resonance, not the periodicity of the fault frequency. Envelope analysis = bandpass extract the resonance band → Hilbert envelope → FFT of the envelope → look for fault frequencies in the envelope spectrum.

// Step 4a: Bandpass filter (focus on high-frequency resonance band)
// Band selection: use Spectral Kurtosis (SK) to automatically find optimal band
// or empirically select 2-10 kHz
signal_bp = bandpass(signal, 2000, 10000, order=6)

// Step 4b: Hilbert envelope
analytic = hilbert(signal_bp)
envelope = abs(analytic)

// Step 4c: Envelope FFT
envelope -= mean(envelope)  // remove DC
envelope_spectrum = fft(envelope * hann_window)

// Step 4d: Search for bearing characteristic frequencies in envelope spectrum
// Peaks at BPFO and its harmonics (2×BPFO, 3×BPFO...) → outer race fault
// Peaks at BPFI and its harmonics, modulated by 1X → inner race fault
// Peaks at BSF and its harmonics → rolling element fault
// Peaks at FTF and its harmonics → cage fault

Step 5: Time-Frequency Analysis (Variable Speed Conditions)

If the rotational speed is not constant (run-up, coast-down, load changes), characteristic frequencies change over time. STFT or Order Tracking is needed.

STFT: Observe frequency changes over time (colorful waterfall plot).
Order Tracking: Synchronous sampling using a tachometer, converting the time axis to "angle" → characteristic frequencies become fixed "orders."

Step 6: Trend Monitoring

// Acquire vibration data daily (or weekly)
// Track trends of key metrics over time:
// - Overall RMS vibration level (mm/s)
// - RMS in specific frequency bands (e.g., 1X, BPFO band)
// - Peak and Crest Factor

// Alarm threshold settings:
// Method 1: ISO 20816 absolute thresholds
//   Zone A (< 2.8 mm/s): Good
//   Zone B (2.8-7.1 mm/s): Acceptable
//   Zone C (7.1-18 mm/s): Requires attention
//   Zone D (> 18 mm/s): Dangerous, immediate shutdown
// Method 2: Baseline relative thresholds
//   Warning: Baseline + 6 dB (2x)
//   Alarm:   Baseline + 12 dB (4x)

// Sudden trend increase → schedule maintenance (planned shutdown)
// 10-100x cheaper than unplanned downtime

Bearing Characteristic Frequencies

\text{BPFO} = \frac{n}{2} f_r \left(1 - \frac{d}{D}\cos\alpha\right) \quad \text{(outer race fault frequency)}

\text{BPFI} = \frac{n}{2} f_r \left(1 + \frac{d}{D}\cos\alpha\right) \quad \text{(inner race fault frequency)}

\text{BSF} = \frac{D}{2d} f_r \left[1 - \left(\frac{d}{D}\cos\alpha\right)^2\right] \quad \text{(ball spin frequency)}

\text{FTF} = \frac{1}{2} f_r \left(1 - \frac{d}{D}\cos\alpha\right) \quad \text{(cage fault frequency)}

where: n = number of rolling elements, f_r = shaft speed (Hz), d = rolling element diameter, D = pitch diameter, α = contact angle.

Concrete Example: SKF 6205 Bearing

// SKF 6205 deep groove ball bearing parameters
n = 9           // number of balls
d = 7.94 mm     // ball diameter
D = 38.5 mm     // pitch diameter
α = 0°          // contact angle (deep groove ball bearing)
cos(α) = 1

// Shaft speed: 1800 RPM = 30 Hz
f_r = 30 Hz

// BPFO (outer race fault frequency)
BPFO = (9/2) × 30 × (1 - 7.94/38.5) = 4.5 × 30 × 0.7938
     = 4.5 × 30 × 0.7938 = 107.2 Hz

// BPFI (inner race fault frequency)
BPFI = (9/2) × 30 × (1 + 7.94/38.5) = 4.5 × 30 × 1.2062
     = 4.5 × 30 × 1.2062 = 162.8 Hz

// BSF (ball spin frequency)
BSF = (38.5 / (2×7.94)) × 30 × [1 - (7.94/38.5)²]
    = 2.424 × 30 × [1 - 0.04253]
    = 2.424 × 30 × 0.9575 = 69.6 Hz

// FTF (cage fault frequency)
FTF = (1/2) × 30 × (1 - 7.94/38.5) = 15 × 0.7938
    = 11.9 Hz

// Diagnostic logic:
// Envelope spectrum peaks at 107.2, 214.4, 321.6 Hz → outer race fault
// Envelope spectrum peaks at 162.8, 325.6 Hz with ±30Hz sidebands → inner race fault
// Envelope spectrum peaks at 69.6, 139.2 Hz → ball fault
// Envelope spectrum peaks at 11.9, 23.8 Hz → cage fault

Application Scenarios

Petrochemical plant pump monitoring: Centrifugal pumps are the most numerous rotating machines in petrochemical plants. Vibration monitoring provides early warning of imbalance, misalignment, and bearing wear. Each pump has 2-3 accelerometers installed (horizontal, vertical, axial), with online monitoring systems acquiring data every second.
Wind turbine main bearing: A single main bearing can cost millions of dollars, and replacement requires a large crane (even more costly). Vibration + AE (Acoustic Emission) monitoring detects early damage, providing 3-6 months advance warning.
CNC spindle: Bearing faults in high-speed spindles (30,000-60,000 RPM) directly affect machining precision. Vibration monitoring is used for cutting force monitoring, tool wear detection, and spindle health management.

Pitfalls and Limitations:

Poor sensor mounting: Handheld probes have high-frequency response only to 1-2 kHz, magnets to 3-5 kHz, stud mounts to 10-20 kHz. Envelope analysis requires high frequencies → mounting quality is critical.
Unstable speed: Speed variations cause spectral peaks to smear, reducing frequency resolution. Variable frequency drive motors are especially problematic. Solution: speed-synchronized sampling (order tracking).
Looking only at overall RMS without the spectrum: Overall RMS indicates "severity" but cannot distinguish fault types. High 1X may be imbalance, high 2X may be misalignment — the corrective actions are completely different.
False positive: seeing BPFO does not necessarily mean bearing failure: Confirm whether modulation is present (envelope spectrum shows BPFO harmonics + modulated by shaft speed). Some structural resonance frequencies may coincidentally be near BPFO, causing misdiagnosis.

When Not to Use FFT Vibration Analysis:

Very low-speed machines (<10 RPM) → acceleration signal is too weak. Use proximity probes or AE instead.
Non-rotating machines (e.g., pressure vessels, piping) → no clear rotational frequency. Use AE or guided wave.
Transient events (e.g., gear tooth breakage) → steady-state FFT cannot capture these. Use time-frequency analysis or statistical indicators (kurtosis, crest factor).
Alternative: Machine learning (ML) automatic feature extraction is replacing some traditional rule-based diagnostics. However, ML model training still requires FFT features as input — the two are complementary, not replacements.

📝 Worked Example

Pump speed 1500RPM, gear tooth count Z=23. (a) 1X frequency? (b) Mesh frequency? (c) 2-second measurement, f_s=25.6kHz, FFT with 4096-point Hann window, Δf? (d) Can you resolve the ±f_r modulation sidebands near 1X?

Show solution

(a) f_r = 1500/60 = 25Hz

(b) GMF = 23×25 = 575Hz

(c) Δf = 25600/4096 = 6.25Hz

(d) Sidebands at 575±25Hz = 550Hz and 600Hz, spacing 25Hz > Δf=6.25Hz → yes, resolvable

✅ Quick Check

Q1: What is the BPFO of a SKF 6205 bearing (9 balls, d=7.94mm, D=38.5mm) at 1800 RPM?

Show answer

fr=30Hz, BPFO = (9/2)×30×(1-7.94/38.5) ≈ 107.1 Hz.

Q2: Why can't vibration analysis rely solely on overall RMS?

Show answer

Because RMS only reflects total energy and cannot distinguish fault types. For example, imbalance (1X) and bearing defects (BPFO) may have similar RMS values, but their spectra are completely different.

Interactive: Vibration Fault Diagnosis Exercise

The system randomly generates a vibration spectrum. Determine the fault type based on spectral characteristics.

Imbalance (1X dominant) Misalignment (2X dominant) Mechanical Looseness (multiple harmonics + sub-harmonics) Outer Race Bearing Fault (BPFO)

← 6.9 Biomedical Laboratory →

2b.1 Discrete-Time Signals

Stepping from the continuous world into the discrete world — the starting point of digital signal processing

One-Sentence Summary: A discrete-time signal is a sequence $x[n]$ defined only at integer time indices $n$; these sequences are the fundamental "atoms" on which all DSP operations are built.

Learning Objectives

Identify the five fundamental sequences: unit impulse $\delta[n]$, unit step $u[n]$, exponential sequence, sinusoidal sequence, and complex exponential sequence
Classify signals as Energy Signals or Power Signals
Perform sequence operations: time shift, folding, and amplitude scaling
Understand the periodicity condition for discrete sinusoids: $f_0/f_s$ must be rational
Decompose any sequence into its even and odd components (Even/Odd Decomposition)

Why Learn This: Every DSP algorithm — filters, FFT, modulation/demodulation — ultimately operates on discrete-time sequences. If you do not understand the sifting property of $\delta[n]$, or do not realize that a discrete sinusoid is "not necessarily periodic," you will hit roadblocks when studying the DFT and Z-transform later. This chapter is your "alphabet."

Previously: In the previous section (2a.2 CTFT) we dealt with continuous-time signals $x(t)$ and their spectra $X(f)$. Now we "sample" — keeping only the values at integer time instants — and enter the discrete-time world. The continuous-time Dirac delta $\delta(t)$ becomes the Kronecker delta $\delta[n]$, and integrals become summations.

Pain Point: What Exactly Is Different Between Continuous and Discrete?

When jumping from continuous time to discrete time, many "intuitions" break down:

$\cos(\omega_0 n)$ is not necessarily periodic! The continuous version $\cos(\omega_0 t)$ always has period $2\pi/\omega_0$, but the discrete version is periodic only when $\omega_0/(2\pi)$ is rational.
Frequency has an upper limit: The frequency of a discrete signal is meaningful only in $[0, \pi]$ (or $[0, f_s/2]$); $\omega = \pi$ is the Nyquist frequency.
Exponential sequences can blow up: $\alpha^n u[n]$ diverges when $|\alpha|>1$, giving infinite energy — the continuous world has an analog, but numerical issues arise more easily in the discrete version.

Historical Context: The formalization of discrete-time signals began in the 1940s–50s. Claude Shannon's (1916–2001) sampling theorem (1949) built the bridge between continuous and discrete worlds, and the Z-transform introduced by Ragazzini and Zadeh (1952) brought discrete system analysis to maturity. What truly popularized DSP was the 1965 Cooley–Tukey FFT algorithm, which moved frequency-domain analysis of discrete sequences from theory to practice.

Core Concepts: The Five Fundamental Sequences

Intuition First: Just as chemistry has the periodic table, DSP has its "elements" — these five fundamental sequences. Any discrete signal can be expressed as a linear combination of $\delta[n]$ (this is the foundation of convolution).

Sequence	Definition	Properties
Unit Impulse $\delta[n]$	$\delta[n] = \begin{cases}1, & n=0\\0, & n\neq 0\end{cases}$	Sifting property: $x[n]\cdot\delta[n-k] = x[k]\cdot\delta[n-k]$
Unit Step $u[n]$	$u[n] = \begin{cases}1, & n\geq 0\\0, & n < 0\end{cases}$	$u[n] = \sum_{k=0}^{\infty}\delta[n-k]$
Exponential Sequence	$x[n] = \alpha^n u[n]$	$\|\alpha\|<1$: decaying; $\|\alpha\|>1$: growing
Sinusoidal Sequence	$x[n] = A\cos(\omega_0 n + \phi)$	Period $N$ exists $\iff \omega_0/(2\pi) \in \mathbb{Q}$
Complex Exponential Sequence	$x[n] = e^{j\omega_0 n}$	$e^{j(\omega_0+2\pi)n} = e^{j\omega_0 n}$ ($2\pi$ periodicity)

Key Relationship

Decomposition of any sequence via $\delta[n]$:

$$x[n] = \sum_{k=-\infty}^{\infty} x[k]\,\delta[n-k]$$

This is the prototype of convolution $x[n] * \delta[n] = x[n]$, and the starting point for LTI system theory in the next chapter.

Energy Signal vs. Power Signal

Energy:

$$E = \sum_{n=-\infty}^{\infty} |x[n]|^2$$

Average Power:

$$P = \lim_{N\to\infty} \frac{1}{2N+1}\sum_{n=-N}^{N} |x[n]|^2$$

Type	Condition	Example
Energy Signal	$0 < E < \infty$, $P = 0$	$\alpha^n u[n]$ ($\|\alpha\|<1$), $\delta[n]$
Power Signal	$E = \infty$, $0 < P < \infty$	$\cos(\omega_0 n)$, $u[n]$
Neither	$E = \infty$, $P = \infty$	$2^n u[n]$ (exponential growth)

Expand: Energy computation for $\alpha^n u[n]$

Let $x[n] = \alpha^n u[n]$ with $|\alpha| < 1$:

$$E = \sum_{n=0}^{\infty} |\alpha^n|^2 = \sum_{n=0}^{\infty} |\alpha|^{2n} = \frac{1}{1-|\alpha|^2}$$

Since $|\alpha|^2 < 1$, the geometric series converges. For example, $\alpha = 0.8$: $E = 1/(1-0.64) = 2.778$. $\;\blacksquare$

Sequence Operations: Time Shift, Folding, and Scaling

Time Shift: $y[n] = x[n-n_0]$

$n_0 > 0$: delay (shift right) by $n_0$ samples
$n_0 < 0$: advance (shift left) by $|n_0|$ samples

Folding (Time Reversal): $y[n] = x[-n]$, mirror-reflected about $n=0$

Amplitude Scaling: $y[n] = c \cdot x[n]$

Caution: In discrete time there is no simple analog of "time compression $x[2n]$"! $x[2n]$ means downsampling, which causes information loss (aliasing). This is fundamentally different from $x(2t)$ in continuous time.

Periodicity of Discrete Sinusoids

$x[n] = \cos(\omega_0 n)$ has period $N$ ($x[n+N]=x[n]$) if and only if:

$$\omega_0 N = 2\pi m \quad\Rightarrow\quad \frac{\omega_0}{2\pi} = \frac{m}{N} \in \mathbb{Q}$$

That is, the normalized frequency $\omega_0/(2\pi)$ must be a rational number. The minimum period is $N = m/\gcd(m,N_0)$ where $\omega_0 = 2\pi N_0/m$.

Examples:

$\cos(0.3\pi n)$: $0.3\pi/(2\pi) = 0.15 = 3/20$ is rational → period $N=20$
$\cos(n)$: $1/(2\pi) \approx 0.1592...$ is irrational → aperiodic
$\cos(\pi n) = (-1)^n$: $\pi/(2\pi) = 1/2$ → period $N=2$

Even/Odd Decomposition

Any real-valued sequence $x[n]$ can be uniquely decomposed as:

$$x[n] = x_e[n] + x_o[n]$$ $$x_e[n] = \frac{x[n]+x[-n]}{2} \quad(\text{Even Part})$$ $$x_o[n] = \frac{x[n]-x[-n]}{2} \quad(\text{Odd Part})$$

Properties: $x_e[-n] = x_e[n]$ (even symmetry), $x_o[-n] = -x_o[n]$ (odd symmetry), and $x_o[0] = 0$.

How to Use: Worked Examples

Worked Example: Determine whether $x[n] = (0.9)^n u[n]$ is an energy signal or a power signal

Compute the energy: $E = \sum_{n=0}^{\infty} (0.9)^{2n} = \sum_{n=0}^{\infty} (0.81)^n = \frac{1}{1-0.81} = 5.263$
$E < \infty$, so it is an energy signal
Average power $P = 0$ (the power of an energy signal is always zero)

Worked Example: Determine the period of $\cos(0.4\pi n)$

Normalized frequency $f_0 = 0.4\pi/(2\pi) = 0.2 = 1/5$
$1/5$ is rational, so the sequence is periodic
Minimum period $N = 5$ ($\cos(0.4\pi(n+5)) = \cos(0.4\pi n + 2\pi) = \cos(0.4\pi n)$)

Applications

Audio Coding: A CD sampling rate of $f_s=44.1$ kHz produces 44,100 discrete samples per second. Understanding the energy characteristics of sequences determines the bit-allocation strategy for quantization.
Radar Pulse Compression: The transmitter generates a complex-exponential chirp sequence; the receiver performs matched filtering (essentially convolution) to detect targets.
Communication Synchronization: The sifting property of $\delta[n]$ is used to design pilot sequences, and cross-correlation is used to estimate timing offsets.

Pitfalls and Limitations

Discrete-time $\neq$ digital: The amplitude of $x[n]$ is still a continuous value (real number). A truly digital signal also requires quantization, which introduces quantization noise.
$x[2n]$ is not "speed-up playback": In discrete time, $x[2n]$ means downsampling, which destroys spectral content (aliasing). Do not draw an analogy with the continuous-time $x(2t)$.
$\cos(n)$ has no period: Many beginners assume all sinusoids are periodic. A discrete sinusoid is periodic only when $\omega_0/(2\pi)$ is rational.

Quick Check

Q1: Is $x[n] = 5\cos(0.6\pi n)$ a periodic sequence? If so, what is the minimum period?

Show answer

$\omega_0/(2\pi) = 0.6\pi/(2\pi) = 0.3 = 3/10$, which is rational, so it is a periodic sequence. We need $0.6\pi \cdot N = 2\pi m$, i.e., $N = 10m/3$. The smallest positive integer $N$: set $m=3$, giving $N=10$. The minimum period is $\mathbf{10}$.

Q2: Is $x[n] = (-1)^n$ an energy signal or a power signal?

Show answer

$|x[n]|^2 = 1$ for all $n$, so $E = \sum_{-\infty}^{\infty} 1 = \infty$. $P = \lim_{N\to\infty}\frac{1}{2N+1}\sum_{-N}^{N} 1 = 1$. Since $0 < P < \infty$, it is a power signal. (Note that $(-1)^n = \cos(\pi n)$, a sinusoidal sequence.)

Interactive: Fundamental Discrete-Time Sequences

Select a signal type and adjust the parameters to observe the stem plot of the discrete-time sequence.

Signal Type

α = 0.85

ω₀ = 0.30π

References: [1] Oppenheim & Schafer, Discrete-Time Signal Processing, 3rd ed., Ch.2. [2] Proakis & Manolakis, Digital Signal Processing, 4th ed., Ch.2. [3] Haykin & Van Veen, Signals and Systems, Ch.6.

Interactive: Discrete-Time Signal Stem Plot

Pick a basic discrete sequence. The stem plot (vertical lines + dots) is the standard visualization for discrete-time signals.

Sequence:

← 2a.2 CTFT 2b.2 LTI Systems & Convolution →

2b.2 LTI Systems & Convolution

A single impulse response fully describes an entire system — the most elegant result in DSP

One-Sentence Summary: If a system is Linear and Time-Invariant (LTI), then knowing only its response to $\delta[n]$ — the impulse response $h[n]$ — is enough to compute the output for any input via convolution $y[n] = x[n] * h[n]$.

Learning Objectives

Define Linearity and Time-Invariance, and determine whether a given system satisfies them
Derive the Convolution Sum $y[n] = \sum_k x[k]\,h[n-k]$
Understand how the impulse response $h[n]$ completely characterizes an LTI system
Apply the commutative, associative, and distributive properties of convolution
Establish criteria for Causality and BIBO Stability

Why Learn This: Convolution is the most central operation in DSP. Every time you use an FIR filter, create a reverb effect, or compute cross-correlations for radar target detection, convolution is at work. Understanding LTI theory tells you: Why is a single impulse-response measurement sufficient? Why is the frequency response the DTFT of $h[n]$? Why does cascading two filters equal convolving their impulse responses?

Previously: In the previous section we learned to represent any signal as $x[n] = \sum_k x[k]\,\delta[n-k]$ — a weighted sum of delayed impulses. The key question now is: if we know the system's response to $\delta[n]$, can we compute its response to $x[n]$? The answer is yes, provided the system is LTI.

Pain Point: Why Can't We Test Every Input Individually?

Imagine you have designed a filter and want to know how it behaves for every possible input:

The input space is infinite-dimensional — you cannot exhaustively test all $x[n]$
If the system is nonlinear (e.g., $y[n] = x[n]^2$), knowing the response to $\delta[n]$ is entirely insufficient
The LTI assumption is the "master key": one test $\to$ complete characterization

Historical Context: The concept of convolution traces back to the integral operations of Euler and Laplace in the 18th century. In the 1930s, Norbert Wiener used convolution extensively in the theory of stochastic processes. Discrete convolution became a standard signal processing tool in the 1960s with the rise of digital computers. The invention of the FFT in 1965 made "fast convolution" possible — replacing $O(N^2)$ direct computation with $O(N\log N)$ FFT-based convolution.

Core Concepts: What Is an LTI System?

Intuition First: Think of a black box $\mathcal{T}\{\cdot\}$ that takes in a sequence and produces a sequence.

Linearity = Homogeneity + Additivity (Superposition):

$$\mathcal{T}\{a\,x_1[n] + b\,x_2[n]\} = a\,\mathcal{T}\{x_1[n]\} + b\,\mathcal{T}\{x_2[n]\}$$

Time-Invariance:

$$\text{If } \mathcal{T}\{x[n]\} = y[n], \text{ then } \mathcal{T}\{x[n-n_0]\} = y[n-n_0]$$

The system's response to a delayed input equals the delayed output.

How to Determine: Common Examples

System	Linear?	Time-Invariant?	LTI?
$y[n] = 3x[n] + 2x[n-1]$	Yes	Yes	Yes
$y[n] = x[n]^2$	No	Yes	No
$y[n] = n \cdot x[n]$	Yes	No	No
$y[n] = x[n] + 1$	No	Yes	No
$y[n] = x[-n]$	Yes	No	No

Derivation of the Convolution Sum

Intuition: The input $x[n]$ is a weighted sum of delayed impulses. Linearity + time-invariance of an LTI system means each impulse's response is also weighted, delayed, and superposed.

Expand: Full derivation of the convolution sum

Step 1: Decompose any signal $x[n]$ as a weighted sum of delayed impulses:

$$x[n] = \sum_{k=-\infty}^{\infty} x[k]\,\delta[n-k]$$

Step 2: Define the impulse response $h[n] = \mathcal{T}\{\delta[n]\}$. By time-invariance:

$$\mathcal{T}\{\delta[n-k]\} = h[n-k]$$

Step 3: By linearity (superposition):

$$y[n] = \mathcal{T}\{x[n]\} = \mathcal{T}\left\{\sum_{k} x[k]\,\delta[n-k]\right\} = \sum_{k} x[k]\,\mathcal{T}\{\delta[n-k]\} = \sum_{k} x[k]\,h[n-k]$$

Conclusion:

$$\boxed{y[n] = \sum_{k=-\infty}^{\infty} x[k]\,h[n-k] \;\equiv\; x[n] * h[n]}$$

$\;\blacksquare$

Convolution Sum

$$y[n] = x[n] * h[n] = \sum_{k=-\infty}^{\infty} x[k]\,h[n-k]$$

"Flip, slide, multiply, sum" — these four steps are the recipe for hand-computing convolution.

Properties of Convolution

Property	Formula	Engineering Significance
Commutative	$x * h = h * x$	Input and impulse response roles are interchangeable
Associative	$(x * h_1) * h_2 = x * (h_1 * h_2)$	Cascaded filters = convolution of impulse responses
Distributive	$x * (h_1 + h_2) = xh_1 + xh_2$	Parallel filters = sum of impulse responses
Identity Element	$x * \delta = x$	$\delta[n]$ is the "1" of convolution

Causality and BIBO Stability

Causal System: The output depends only on the present and past inputs.

$$\text{LTI Causal} \iff h[n] = 0 \text{ for } n < 0$$

BIBO Stability (Bounded-Input Bounded-Output): Bounded input → bounded output.

$$\text{LTI BIBO Stable} \iff \sum_{n=-\infty}^{\infty} |h[n]| < \infty$$

The impulse response must be absolutely summable.

Expand: Proof of BIBO stability

Sufficiency ($\Leftarrow$): If $\sum|h[n]| < \infty$ and $|x[n]| \leq B_x$, then:

$$|y[n]| = \left|\sum_k x[k]\,h[n-k]\right| \leq \sum_k |x[k]|\,|h[n-k]| \leq B_x \sum_k |h[k]| < \infty$$

Necessity ($\Rightarrow$): If $\sum|h[n]| = \infty$, construct the bounded input $x[n] = \text{sgn}(h[-n])$. Then $|x[n]| \leq 1$ but $y[0] = \sum_k |h[k]| = \infty$, so the output is unbounded. $\;\blacksquare$

How to Use: Hand-Computing Convolution

Compute the convolution of $x[n] = \{1, 2, 3\}$ ($n=0,1,2$) and $h[n] = \{1, 1, 1\}/3$ (3-point moving average)

Flip $h[k]$: $h[-k] = \{1, 1, 1\}/3$ (symmetric, so flipping has no effect)
Slide $h[n-k]$ and compute the product-sum for each $n$:

n=0: y[0] = 1·(1/3) = 1/3 n=1: y[1] = 1·(1/3) + 2·(1/3) = 1.0 n=2: y[2] = 1·(1/3) + 2·(1/3) + 3·(1/3) = 2.0 n=3: y[3] = 2·(1/3) + 3·(1/3) = 5/3 n=4: y[4] = 3·(1/3) = 1.0

Result: $y[n] = \{1/3, 1, 2, 5/3, 1\}$, length = $3 + 3 - 1 = 5$.

Applications

FIR Digital Filters: Every FIR filter is essentially "convolving the input with the filter coefficients." The filter coefficients are the impulse response $h[n]$.
Audio Reverb: Record a room's response $h[n]$ to a hand clap (approximating $\delta[n]$), then convolve it with any dry signal to simulate playing in that room.
Communication Channel Modeling: A wireless channel can be modeled by a multipath impulse response $h[n]$; the received signal = transmitted signal $*$ channel impulse response + noise.

Pitfalls and Limitations

Convolution applies only to LTI systems: If the system is nonlinear (e.g., a compressor) or time-varying (e.g., LFO modulation), convolution does not apply.
Direct convolution costs $O(NM)$: For long sequences, always use FFT-based fast convolution ($O(N\log N)$).
Linear convolution vs. circular convolution: The DFT computes circular convolution! To perform linear convolution via the DFT, you must zero-pad. This is the most common mistake beginners make.

Quick Check

Q1: Is the system $y[n] = x[n] \cdot x[n-1]$ LTI? Why or why not?

Show answer

No. It is time-invariant (delayed input → delayed output), but nonlinear. Verification: let $x_1[n]=1, x_2[n]=1$; then $\mathcal{T}\{x_1+x_2\} = 2\cdot 2 = 4$, but $\mathcal{T}\{x_1\}+\mathcal{T}\{x_2\} = 1+1 = 2 \neq 4$, violating superposition.

Q2: If $h[n] = (0.5)^n u[n]$, is the system BIBO stable?

Show answer

$\sum_{n=0}^{\infty}|h[n]| = \sum_{n=0}^{\infty}(0.5)^n = \frac{1}{1-0.5} = 2 < \infty$. The impulse response is absolutely summable, so the system is BIBO stable.

Interactive: Convolution Sliding Animation

Select the input $x[n]$ and impulse response $h[n]$, then drag the slider to watch $h[n-k]$ slide across $x[k]$. The product area (green) is the value of $y[n]$ at the current $n$.

Input x[n]

Filter h[n]

n = 0

← 2b.1 Discrete-Time Signals 2b.3 Difference Equations & System Functions →

2b.3 Difference Equations & System Function $H(z)$

From time-domain recursion to Z-domain algebra — a unified view of FIR and IIR

One-Sentence Summary: The Linear Constant-Coefficient Difference Equation (LCCDE) is the time-domain language for describing LTI systems; applying the M2B.4 Z-Transform converts it into the algebraic expression $H(z) = B(z)/A(z)$, where system stability and frequency response are entirely encoded in the poles and zeros.

Learning Objectives

Write out the general LCCDE and understand the roles of $a_k$ (feedback coefficients) and $b_k$ (feedforward coefficients)
Apply the Z-transform to the LCCDE to derive the system function $H(z) = B(z)/A(z)$
Distinguish FIR (all $a_k=0$, zeros only) from IIR (has poles, requires stability analysis)
Read stability from the pole-zero plot (causal system: all poles inside the unit circle)
Compute the frequency response $H(e^{j\omega})$ from $H(z)$

Why Learn This: The difference equation is the "blueprint" for DSP hardware and software implementations — each $b_k$ is a multiply-add operation, and each $a_k$ is a feedback loop. The system function $H(z)$ lets you see at a glance whether a system is stable (poles inside the unit circle) or about to blow up (poles outside the circle). Filter design tools (MATLAB fdatool, Python scipy.signal) all operate on the $b_k, a_k$ coefficients.

Previously: The previous section established the convolution relation $y[n] = x[n] * h[n]$ for LTI systems. But convolution is an infinite summation — how does a real system implement it with finite memory? The answer: describe the behavior of $h[n]$ with a difference equation, using feedback to replace an infinitely long impulse response. The Z-transform then converts this recursive relation into an algebraically manipulable polynomial ratio.

Pain Point: Convolution Is Too Slow, Impulse Response Is Too Long

A first-order IIR low-pass filter with $h[n] = a^n u[n]$ has an infinitely long impulse response; direct convolution requires infinite computation
But the difference equation $y[n] = x[n] + a\,y[n-1]$ needs only 1 multiplication + 1 addition per sample via recursion
The catch: feedback introduces a stability risk — if $|a| \geq 1$, the output will blow up
We need a tool to quickly assess stability → pole-zero analysis of $H(z)$

Historical Context: The history of difference equations dates back to de Moivre (1718) solving linear recurrence relations. The Z-transform was introduced to control theory by Witold Hurewicz in 1947, and Ragazzini and Zadeh (1952) systematically applied it to sampled-data systems. After digital filter theory matured in the 1960s, pole-zero analysis of $H(z)$ became an everyday tool for DSP engineers. E. Christian and E. Eisenmann were among the first to convert analog circuit filters into digital difference equation implementations.

Core Concepts: The General LCCDE

General Form

$$y[n] + \sum_{k=1}^{N} a_k\,y[n-k] = \sum_{k=0}^{M} b_k\,x[n-k]$$

$b_k$: Feedforward coefficients | $a_k$: Feedback coefficients | Order $= \max(N, M)$

Intuition: The left side contains $y[n-k]$ (past outputs) → this is "feedback" → creating a recursive structure. The right side contains only $x[n-k]$ (past inputs) → this is "feedforward" → a non-recursive structure.

Deriving $H(z)$ via the Z-Transform

Using the time-shift property of the Z-transform: $\mathcal{Z}\{x[n-k]\} = z^{-k}X(z)$.

Expand: Derivation of $H(z)$

Apply the Z-transform to both sides of the LCCDE:

$$Y(z) + \sum_{k=1}^{N} a_k\,z^{-k}\,Y(z) = \sum_{k=0}^{M} b_k\,z^{-k}\,X(z)$$

Factor out $Y(z)$ and $X(z)$:

$$Y(z)\left(1 + \sum_{k=1}^{N} a_k\,z^{-k}\right) = X(z)\left(\sum_{k=0}^{M} b_k\,z^{-k}\right)$$

Define the system function:

$$H(z) = \frac{Y(z)}{X(z)} = \frac{\sum_{k=0}^{M} b_k\,z^{-k}}{1 + \sum_{k=1}^{N} a_k\,z^{-k}} = \frac{B(z)}{A(z)}$$

$\;\blacksquare$

System Function (Transfer Function)

$$H(z) = \frac{B(z)}{A(z)} = \frac{b_0 + b_1 z^{-1} + b_2 z^{-2} + \cdots + b_M z^{-M}}{1 + a_1 z^{-1} + a_2 z^{-2} + \cdots + a_N z^{-N}}$$

Factoring the numerator and denominator:

$$H(z) = b_0 \cdot \frac{\prod_{i=1}^{M}(1 - q_i z^{-1})}{\prod_{i=1}^{N}(1 - p_i z^{-1})}$$

$q_i$: Zeros, $H(q_i)=0$ | $p_i$: Poles, $H(p_i) \to \infty$

FIR vs. IIR: A Comparison

Characteristic	FIR (Finite Impulse Response)	IIR (Infinite Impulse Response)
Difference Equation	$y[n] = \sum_{k=0}^{M} b_k\,x[n-k]$	$y[n] + \sum a_k\,y[n-k] = \sum b_k\,x[n-k]$
$H(z)$	Polynomial ($B(z)$ only)	Rational function $B(z)/A(z)$
Poles	Only at $z=0$ (always stable)	At $z=p_i$; requires $\|p_i\|<1$
Stability	Unconditionally stable	Depends on pole locations
Impulse Response Length	Finite ($M+1$ points)	Theoretically infinite
Computation	Requires more coefficients for steep roll-off	Fewer coefficients achieve narrow-band filtering

Stability Criterion: Poles and the Unit Circle

A causal LTI system is BIBO stable $\iff$ all poles of $H(z)$ satisfy $|p_i| < 1$

That is, all poles lie strictly inside the unit circle in the Z-plane.

Intuition: The natural mode associated with pole $p_i$ is $p_i^n$. If $|p_i|<1$, then $p_i^n \to 0$ (decaying); if $|p_i|>1$, then $p_i^n \to \infty$ (blowing up); if $|p_i|=1$, oscillation persists without decay (marginally stable).

Frequency Response: $H(z)$ Evaluated on the Unit Circle

$$H(e^{j\omega}) = H(z)\Big|_{z=e^{j\omega}} = \frac{\sum_{k=0}^{M} b_k\,e^{-j\omega k}}{1 + \sum_{k=1}^{N} a_k\,e^{-j\omega k}}$$

$|H(e^{j\omega})|$: Magnitude Response | $\angle H(e^{j\omega})$: Phase Response

Geometric relationship between poles/zeros and frequency response: At frequency $\omega$, the magnitude $|H(e^{j\omega})|$ is proportional to "the product of distances from $e^{j\omega}$ to each zero / the product of distances from $e^{j\omega}$ to each pole." Near zeros the magnitude is small (notches); near poles the magnitude is large (peaks).

How to Use: Complete First-Order IIR Example

Problem: Analyze the system $y[n] = x[n] + 0.8\,y[n-1]$.

Identify coefficients: $b_0 = 1$, $a_1 = -0.8$ (note the sign convention $y[n] + a_1 y[n-1]$ in the difference equation)
System function:$$H(z) = \frac{1}{1 - 0.8\,z^{-1}} = \frac{z}{z - 0.8}$$
Poles and zeros: Zero at $z=0$, pole at $z=0.8$ (inside the unit circle → stable)
Impulse response:$$h[n] = (0.8)^n\,u[n]$$ (exponentially decaying, infinitely long → IIR)
Frequency response:$$|H(e^{j\omega})| = \frac{1}{|1 - 0.8e^{-j\omega}|}$$ At $\omega=0$: $|H|=1/0.2=5$ (high gain at low frequencies); at $\omega=\pi$: $|H|=1/1.8\approx 0.56$ (attenuation at high frequencies) → low-pass filter

Applications

Audio Equalizer: Constructed by cascading several second-order IIR filters (biquads), each with 2 poles + 2 zeros, corresponding to the difference equation coefficients $b_0, b_1, b_2, a_1, a_2$.
PID Control Systems: A digital PID controller can be expressed as a difference equation. Z-transform analysis lets you determine closed-loop stability directly from the pole-zero plot.
Communication Channel Equalizer: The receiver designs an IIR equalizer $H_{eq}(z) \approx 1/H_{ch}(z)$ to remove channel frequency distortion. One must ensure the zeros of $H_{ch}(z)$ are not outside the unit circle (otherwise the equalizer's poles move outside → unstable).

Pitfalls and Limitations

Quantization Effects: In fixed-point IIR implementations, coefficient quantization can shift poles outside the unit circle, turning a previously stable filter unstable. Lower-order IIR filters are safer than higher-order ones.
Poles on the unit circle $\neq$ stable: $|p_i|=1$ represents marginal instability (an oscillator). BIBO stability requires the magnitude to be strictly less than 1.
Non-Minimum Phase Zeros: If zeros lie outside the unit circle, the causal inverse (equalizer) is unstable. Special treatment is needed (e.g., allpass decomposition).

Quick Check

Q1: Why is an FIR filter "unconditionally stable"? Explain from the $H(z)$ perspective.

Show answer

The denominator of an FIR $H(z)$ is 1 (no feedback $a_k$), so $H(z)=B(z)$ is a polynomial. The only poles are at $z=0$ ($N$-fold pole), and $|0|<1$ is always inside the unit circle. Therefore, regardless of the values of $b_k$, the system is always BIBO stable.

Q2: If $H(z) = \frac{1-z^{-1}}{1-0.95z^{-1}}$, where are the zeros and poles? Is the system low-pass or high-pass?

Show answer

Zero: $1-z^{-1}=0 \Rightarrow z=1$ (on the unit circle, at $\omega=0$). Pole: $1-0.95z^{-1}=0 \Rightarrow z=0.95$ (inside the unit circle → stable). At $\omega=0$: $|H(e^{j0})|=|1-1|/|1-0.95|=0$ (zero gain). At $\omega=\pi$: $|H(e^{j\pi})|=|1+1|/|1+0.95|=2/1.95\approx 1.03$. The gain is zero at DC → this is a high-pass filter.

Interactive: Pole-Zero Plot, Frequency Response, and Impulse Response

Adjust the $b_k$ and $a_k$ coefficients and observe in real time the pole-zero locations on the Z-plane, the frequency response $|H(e^{j\omega})|$, and the impulse response $h[n]$.

$b_0$

$b_1$

$b_2$

$a_1$

$a_2$

References: [1] Oppenheim & Schafer, Discrete-Time Signal Processing, 3rd ed., Ch.3, 5, 6. [2] Mitra, Digital Signal Processing: A Computer-Based Approach, 4th ed., Ch.4. [3] Proakis & Manolakis, Digital Signal Processing, 4th ed., Ch.3.

← 2b.2 LTI Systems & Convolution 2b.4 Z-Transform →

4B.1 IIR Filter Design

From analog prototypes to digital implementation — a complete comparison of four classic IIR designs

Learning Objectives

Understand the advantages and trade-offs of IIR filters relative to M4A FIR Design
Compare the frequency-response characteristics of Butterworth / Chebyshev I / Chebyshev II / Elliptic designs
Master the derivation of the Bilinear Transform and frequency pre-warping
Complete the full design flow from specifications to $H(z)$

One-Sentence Summary

IIR filters use a recursive (feedback) structure to approximate ideal frequency responses with very low order; the four classic designs represent different trade-offs between passband flatness and transition-band steepness.

Why Learn This?

In the era of analog circuits, all filters were inherently IIR — RLC networks composed of capacitors, inductors, and resistors are recursive systems. Stephen Butterworth (1930) proposed the "maximally flat" criterion in an unassuming British radio engineering paper, laying the foundation for Butterworth filters.

Pafnuty Chebyshev (19th-century Russian mathematician) developed an equiripple approximation theory that was applied to filter design half a century later, giving rise to Chebyshev Type I and Type II filters. Wilhelm Cauer (1931) used elliptic function theory to derive the most efficient elliptic filters.

With the advent of the digital era, these analog prototypes were "ported" to the $z$-domain via the Bilinear Transform — allowing us to inherit decades of analog design wisdom while enjoying the precision and flexibility of digital implementation.

Previously...

In FIR design (Module 4A), we learned the advantages of FIR filters: linear phase, unconditional stability, and intuitive design. However, FIR has a fundamental limitation —

To achieve a steep transition band, FIR requires very high order (hundreds or even thousands)
High order = high computation = high latency (group delay $\approx (N-1)/2$ samples)
In real-time systems (audio, control), such latency is unacceptable

IIR filters use feedback (recursion) to "remember" past outputs, achieving the same or even better frequency selectivity with far fewer coefficients. The price: nonlinear phase, potential instability, and more mathematical complexity in the design process.

Pain Point: FIR Is Too "Expensive"

Suppose you need a low-pass filter: passband up to 1 kHz, stopband starting at 1.2 kHz, stopband attenuation 60 dB, sampling rate 8 kHz.

FIR approach: Using the Kaiser window method, the estimated order is $N \approx \frac{A_s - 7.95}{2.285 \cdot \Delta\omega} \approx \frac{60 - 7.95}{2.285 \times 0.05\pi} \approx 145$. Each output sample requires 145 multiply-accumulate operations.
IIR approach: A 4th-order Elliptic filter meets the same specification. Each output sample requires only 8 multiply-accumulates — a 18x reduction in computation.

In embedded systems, real-time audio processing, and high-speed communications, this difference is decisive.

Origin: From the $s$-Domain to the $z$-Domain

The design approach for IIR digital filters is not to design $H(z)$ directly, but rather:

Use well-established analog filter design theory to obtain an analog prototype $H_a(s)$
Apply an $s \to z$ mapping to "translate" the analog filter into a digital filter $H(z)$

Why not design $H(z)$ directly? Because analog filter design has a century of accumulated knowledge — formula tables, charts, closed-form solutions — borrowing from this body of work is far more economical than deriving everything from scratch.

Core Concepts: The Four Classic IIR Filters

Intuition: The four designs differ in how they allocate "approximation error" — you can make the passband as flat as possible (Butterworth), distribute the passband error uniformly to gain a steeper transition band (Chebyshev I), place the error in the stopband (Chebyshev II), or allow equiripple on both sides for the steepest possible transition band (Elliptic).

1. Butterworth (Maximally Flat)

$$|H_a(j\Omega)|^2 = \frac{1}{1 + \left(\Omega/\Omega_c\right)^{2N}}$$

As flat as possible in the passband (the first $2N-1$ derivatives are zero at $\Omega=0$), but the transition-band roll-off is the slowest.

2. Chebyshev Type I (Passband Equiripple)

$$|H_a(j\Omega)|^2 = \frac{1}{1 + \varepsilon^2 \, T_N^2(\Omega/\Omega_c)}$$

$T_N$ is the $N$th-order Chebyshev polynomial; $\varepsilon$ controls the passband ripple magnitude. The passband has ripple, but the roll-off is steeper than Butterworth.

3. Chebyshev Type II (Stopband Equiripple)

$$|H_a(j\Omega)|^2 = \frac{1}{1 + \left[\varepsilon^2 \, T_N^2(\Omega_c/\Omega)\right]^{-1}}$$

The passband is flat; ripple appears in the stopband. Stopband zeros provide better stopband attenuation.

4. Elliptic / Cauer (Equiripple on Both Sides)

$$|H_a(j\Omega)|^2 = \frac{1}{1 + \varepsilon^2 \, R_N^2(\Omega/\Omega_c)}$$

$R_N$ is a rational Chebyshev function (ratio of Jacobi elliptic functions). Both the passband and stopband have equiripple, but the transition band is the steepest for a given order — this is the theoretically optimal solution.

Comparison of the Four Designs

Characteristic	Butterworth	Chebyshev I	Chebyshev II	Elliptic
Passband	Maximally flat	Equiripple	Flat	Equiripple
Stopband	Monotonic decay	Monotonic decay	Equiripple	Equiripple
Transition Steepness	Gentlest (needs high order)	Moderate	Moderate	Steepest (fewest order)
Phase Linearity	Best	Worse	Worse	Worst
Design Parameters	$N, \Omega_c$	$N, \Omega_c, \varepsilon$	$N, \Omega_s, \varepsilon$	$N, \Omega_c, \varepsilon_p, \varepsilon_s$
Typical Use	Anti-aliasing, general	Frequency selection	Flat passband needed	Stringent specs

Expand: Derivation of Butterworth pole locations

The poles of a Butterworth filter lie on a left-half semicircle in the $s$-plane. Starting from $|H_a(j\Omega)|^2 = 1/(1+(\Omega/\Omega_c)^{2N})$:

Let $s = j\Omega$, then $H_a(s)H_a(-s) = 1/(1+(-s^2/\Omega_c^2)^N)$.

Poles occur where $(-s^2/\Omega_c^2)^N = -1 = e^{j(2k+1)\pi}$, giving:

$$s_k = \Omega_c \, e^{j\pi(2k+N+1)/(2N)}, \quad k = 0, 1, \ldots, 2N-1$$

There are $2N$ poles uniformly distributed on a circle of radius $\Omega_c$. Selecting the $N$ poles in the left half-plane gives the stable $H_a(s)$:

$$H_a(s) = \frac{\Omega_c^N}{\prod_{k=0}^{N-1}(s - s_k)}, \quad \text{Re}(s_k) < 0$$

For example, when $N=2$, the poles are at $\pm 135°$ and $\pm 225°$ (select the left-half-plane poles at $135°$ and $225°$):

$$s_{0,1} = \Omega_c\,e^{j3\pi/4},\; \Omega_c\,e^{j5\pi/4} = \Omega_c\left(-\frac{1}{\sqrt{2}} \pm j\frac{1}{\sqrt{2}}\right)$$ $$H_a(s) = \frac{\Omega_c^2}{s^2 + \sqrt{2}\,\Omega_c\,s + \Omega_c^2} \quad\blacksquare$$

Bilinear Transform

Intuition: We need to map the $s$-plane to the $z$-plane while ensuring that analog stability (left half-plane) maps to digital stability (inside the unit circle). The bilinear transform is exactly such a perfect mapping.

Bilinear Transform Formula

$$s = \frac{2}{T}\,\frac{z-1}{z+1} \quad \Longleftrightarrow \quad z = \frac{1 + (T/2)s}{1 - (T/2)s}$$

Frequency Mapping (Frequency Warping):

Let $s = j\Omega$, $z = e^{j\omega}$, and substitute to get:

$$\Omega = \frac{2}{T}\tan\!\left(\frac{\omega}{2}\right)$$

The analog frequency $\Omega \in [0, \infty)$ is compressed into the digital frequency $\omega \in [0, \pi)$. At low frequencies the mapping is approximately linear ($\Omega \approx \omega/T$); at high frequencies severe warping occurs.

Frequency Pre-warping: During design, first convert the desired digital cutoff frequency $\omega_c$ back to the analog frequency $\Omega_c = (2/T)\tan(\omega_c/2)$. Use $\Omega_c$ to design the analog prototype so that, after the transform, the digital filter's cutoff falls precisely at $\omega_c$.

Expand: Derivation of the bilinear transform (trapezoidal integration)

The time-domain representation of an analog system $H_a(s) = Y(s)/X(s)$ is a differential equation. The simplest digitization approach is to approximate derivatives with numerical integration.

The Trapezoidal Rule approximates integration as:

$$y[n] = y[n-1] + \frac{T}{2}\big(x[n] + x[n-1]\big)$$

Taking the $z$-transform: $Y(z) = z^{-1}Y(z) + \frac{T}{2}(1+z^{-1})X(z)$

$$\frac{Y(z)}{X(z)} = \frac{T/2 \cdot (1+z^{-1})}{1 - z^{-1}} = \frac{T}{2}\,\frac{z+1}{z-1}$$

This is the digital approximation of $1/s$, hence $s$ corresponds to $\frac{2}{T}\frac{z-1}{z+1}$.

Proof that stability is preserved: Let $z = re^{j\theta}$, substitute into $s = \frac{2}{T}\frac{re^{j\theta}-1}{re^{j\theta}+1}$, and verify:

$|z| < 1$ (inside unit circle) $\Rightarrow$ $\text{Re}(s) < 0$ (left half-plane)
$|z| = 1$ (on unit circle) $\Rightarrow$ $\text{Re}(s) = 0$ (imaginary axis)
$|z| > 1$ (outside unit circle) $\Rightarrow$ $\text{Re}(s) > 0$ (right half-plane)

Therefore, a stable analog system is guaranteed to remain stable after the bilinear transform. $\;\blacksquare$

How to Use: Complete Design Example

Goal: Design a 4th-order Butterworth low-pass filter with $f_c = 1\,\text{kHz}$ and $f_s = 8\,\text{kHz}$.

Step 1: Compute the digital cutoff frequency

$\omega_c = 2\pi f_c / f_s = 2\pi \times 1000 / 8000 = \pi/4 \;\text{rad}$

Step 2: Pre-warp to the analog frequency

Set $T = 1$ (for simplicity): $\Omega_c = \frac{2}{T}\tan\!\left(\frac{\omega_c}{2}\right) = 2\tan\!\left(\frac{\pi}{8}\right) \approx 2 \times 0.4142 = 0.8284$

Step 3: Design the analog Butterworth prototype

The 4th-order Butterworth poles lie on a circle of radius $|\Omega_c|$, at angles $\theta_k = \pi(2k+5)/8$, $k=0,1,2,3$:

$s_{0,3} = 0.8284\,e^{j5\pi/8},\; 0.8284\,e^{j7\pi/8}$ and their conjugates

Split into two second-order sections:

$H_a(s) = \frac{\Omega_c^2}{s^2 + 2\cos(\pi/8)\,\Omega_c\,s + \Omega_c^2} \cdot \frac{\Omega_c^2}{s^2 + 2\cos(3\pi/8)\,\Omega_c\,s + \Omega_c^2}$

Step 4: Apply the bilinear transform to each section

Substitute $s = 2(z-1)/(z+1)$ and simplify to get two digital biquad sections $H_1(z)$ and $H_2(z)$.

Step 5: Cascade to obtain the final filter

$H(z) = H_1(z) \cdot H_2(z)$, implemented in SOS (Second-Order Sections) form.

# Python: Design 4th-order Butterworth low-pass filter import scipy.signal as sig import numpy as np fs = 8000 # Sample rate 8kHz fc = 1000 # Cutoff 1kHz order = 4 # Design (auto pre-warping + bilinear transform) b, a = sig.butter(order, fc/(fs/2), btype='low') # Apply to signal y = sig.lfilter(b, a, x) # Frequency response w, h = sig.freqz(b, a, fs=fs)

Applications

Application	Recommended Type	Rationale
Anti-aliasing Filter	Butterworth	Flat passband avoids signal distortion
Audio Equalizer	Butterworth / Chebyshev II	Passband flatness is critical
Communication Channel Selection	Elliptic	Steeper transition band is better; phase distortion can be compensated with an equalizer
Biomedical Signals (ECG Filtering)	Butterworth	No passband ripple allowed; better phase characteristics needed
Radar/Sonar Receiver	Chebyshev I / Elliptic	Strict frequency selectivity requirements

Pitfalls and Limitations

1. Nonlinear Phase: The phase response of an IIR filter is not linear, meaning different frequency components experience different delays after passing through the filter. In applications requiring waveform fidelity (e.g., ECG, seismic wave analysis), this can cause waveform distortion. Solution: use zero-phase filtering (forward-backward filtering, i.e., MATLAB's filtfilt), though this works only offline.

2. Stability Risk: IIR poles must lie inside the unit circle. In high-order direct-form implementations, coefficient quantization can push poles outside the circle, causing instability. Solution: use cascade second-order sections (SOS) implementation.

3. High-Frequency Warping: The bilinear transform has severe frequency compression at high frequencies. Specifications near the Nyquist frequency (e.g., stopband edge above $0.45\pi$) will be inaccurate due to warping. Always use pre-warping.

Rule of Thumb: If your application does not require linear phase and the transition band is narrow, try IIR first (Butterworth is usually sufficient). If linear phase is needed, use FIR.

Quick Check

Q1: For the same filter order $N$, which design produces the steepest transition band? Why?

Answer

The Elliptic filter. Because it allows equiripple in both the passband and the stopband, distributing the approximation error "evenly" across both bands. According to Chebyshev approximation theory, this is the optimal strategy for minimizing the maximum deviation at a given order. Butterworth concentrates all approximation precision near $\Omega=0$ (maximally flat), so the approximation deteriorates away from $\Omega_c$ and requires a higher order to achieve the same stopband attenuation.

Q2: What happens if you forget to apply frequency pre-warping in the bilinear transform?

Answer

Because the bilinear transform's frequency mapping $\Omega = (2/T)\tan(\omega/2)$ is nonlinear, omitting pre-warping causes the actual cutoff frequency of the digital filter to be lower than intended (since $\tan(\omega/2) < \omega/2$ for $\omega > 0$, i.e., the analog frequency is compressed). The higher the frequency, the greater the deviation; near Nyquist, the distortion is extreme. Pre-warping reverses this by first converting the desired digital cutoff frequency to the corresponding analog frequency, so the cutoff falls precisely at the correct position after the transform.

Interactive: Comparing the Four IIR Filter Designs

Select a filter type and parameters to compare the magnitude responses of the four classic designs in real time.

Highlight type: Order N: 4 Cutoff frequency: 0.25π Passband ripple (dB): 1.0

Observe: Butterworth is smooth but rolls off slowly; Elliptic is the steepest but has ripple. Adjust the order to see convergence speed differences.

All-Pass Filters and Minimum-Phase Decomposition

Any stable causal system can be decomposed as "minimum-phase $\times$ all-pass" — this is a key theoretical result for IIR design and phase equalization.

All-Pass Filter

Definition: $|H_{ap}(e^{j\omega})| = 1$ for all $\omega$ (the magnitude is the same at every frequency)

Typical form:

$$H_{ap}(z) = \frac{z^{-1} - a^*}{1 - az^{-1}}, \quad |a| < 1$$

Each pole $a$ is paired with a mirror zero $1/a^*$ (outside the unit circle). The combination of pole plus mirror zero makes the magnitude response identically equal to 1, while the phase response is non-zero — this is why all-pass filters adjust phase without changing magnitude.

Minimum-Phase System

Definition: all zeros and poles lie inside the unit circle ($|z|<1$)

Key properties:

Given a magnitude response $|H(e^{j\omega})|$, the minimum-phase realization has the smallest group delay among all causal realizations
It is causally invertible ($1/H_{min}(z)$ is also stable and causal)
Energy is concentrated near the beginning of the signal (peak arrives earliest)

Minimum-Phase + All-Pass Decomposition Theorem

Any stable causal system $H(z)$ can be decomposed as:

$$H(z) = H_{min}(z) \cdot H_{ap}(z)$$

where $H_{min}$ is minimum-phase and $H_{ap}$ is all-pass.

Derivation: Why does this decomposition work?

For every zero $z_0$ of $H(z)$ that lies outside the unit circle ($|z_0|>1$), "reflect" it to $1/z_0^*$ inside the unit circle, and add an all-pass factor to compensate for the difference:

$$\frac{1 - z_0 z^{-1}}{1} = \underbrace{\frac{1 - z_0^{-1*}z^{-1}}{1}}_{\text{minimum phase}} \cdot \underbrace{\frac{z^{-1} - z_0^{*}}{1 - z_0^{-1*}z^{-1}}}_{\text{all-pass}}$$

Expanding verifies: the product of the two factors on the right equals the original left-hand side. Thus the original "outside zero" becomes "an inside zero plus an all-pass rotation." $\blacksquare$

Applications:

Filter inversion: to compute $H^{-1}(z)$, one must first separate the non-minimum-phase part
Phase equalization: cascading an all-pass filter after the original filter changes the phase response without affecting the magnitude
Spectral shaping: all systems with the same magnitude but different phases share the same $|H|$; they differ only in the all-pass component
System identification: given $|H|$, the minimum-phase realization is the "simplest" causal implementation

References

[1] A. V. Oppenheim & R. W. Schafer, Discrete-Time Signal Processing, 3rd ed., Ch. 7.
[2] S. K. Mitra, Digital Signal Processing: A Computer-Based Approach, Ch. 8.
[3] L. B. Jackson, Digital Filters and Signal Processing, Ch. 6.
[4] S. Butterworth, "On the Theory of Filter Amplifiers," Wireless Engineer, 1930.

← Prev: FIR Filter Design Next: Filter Realization Structures →

4C.1 Filter Realization Structures

Same transfer function, different structures = different numerical fates

Learning Objectives

Understand that a single $H(z)$ can be implemented with multiple equivalent structures
Compare Direct Form I / II, Cascade (SOS), Parallel, and Lattice structures
Understand why Cascade/SOS is the industry standard under finite-precision arithmetic
Gain intuition through float32 vs. float64 experiments on how structure affects numerical stability

One-Sentence Summary

The same $H(z)$ implemented in different structures is mathematically identical under infinite precision — but under finite precision (fixed-point/floating-point), the choice of structure determines whether the filter works accurately or breaks down entirely.

Why Learn This?

In the late 1960s, digital filters began to be used in military radar and space missions. Engineers implemented high-order IIR filters using the theoretically correct Direct Form, only to find that the filters completely failed in hardware — outputs exploded, self-oscillation occurred, and frequency responses were unrecognizable.

Research by James Kaiser and Clifford Weinstein (MIT Lincoln Lab, ~1969) revealed the cause: finite word-length effects. When an IIR filter's poles are very close to the unit circle (common in narrowband filters), Direct Form coefficients are extremely sensitive to quantization — tiny rounding errors can push poles outside the circle.

The solution was to decompose high-order filters into a cascade of Second-Order Sections (SOS) — each section has only two poles, dramatically reducing coefficient sensitivity. This lesson remains a core DSP engineering principle to this day.

Previously...

In Module 4B, we learned how to design the transfer function $H(z) = B(z)/A(z)$ of an IIR filter. But a transfer function only describes the input-output relationship — it does not specify how the internals are wired. The same $H(z)$ can be realized with entirely different arrangements of delays, adders, and multipliers; each arrangement is a "structure."

Pain Point: "Theoretically Correct" Does Not Mean "Practically Usable"

Consider an 8th-order narrowband bandpass IIR filter (center frequency $\omega_0 = 0.1\pi$, bandwidth $0.01\pi$):

Implemented in Direct Form II: the denominator polynomial $A(z) = 1 + a_1 z^{-1} + \cdots + a_8 z^{-8}$, where some $a_k$ have absolute values in the hundreds
Under 32-bit floating point, rounding of $a_k$ shifts pole positions — a pole's radius changes from 0.998 to 1.003
Result: the system is unstable. Output grows exponentially until overflow
Switch to Cascade/SOS (4 second-order sections in series): each section's coefficient magnitudes are $\leq 2$, stable even with 16-bit fixed point

Origin: Why Are There So Many Structures?

Mathematically, an $N$th-order IIR transfer function:

$$H(z) = \frac{b_0 + b_1 z^{-1} + \cdots + b_M z^{-M}}{1 + a_1 z^{-1} + \cdots + a_N z^{-N}}$$

can be computed using any equivalent set of difference equations. Different organizations correspond to different Signal Flow Graphs (SFGs), and each is a "structure." They are fully equivalent under infinite precision, but their behavior diverges drastically under finite precision — this is why studying structures matters.

Core Concepts: Five Major Structures

1. Direct Form I

The most intuitive: compute the numerator (FIR part) first, then the denominator (feedback).

$$y[n] = \sum_{k=0}^{M} b_k\,x[n-k] - \sum_{k=1}^{N} a_k\,y[n-k]$$

Requires $M+N$ delay elements and $M+N+1$ multiplications. Two independent delay lines.

2. Direct Form II (Canonical Form)

Swap the order of numerator and denominator (allowed for LTI systems) and merge the delay lines.

$$w[n] = x[n] - \sum_{k=1}^{N} a_k\,w[n-k]$$ $$y[n] = \sum_{k=0}^{M} b_k\,w[n-k]$$

Requires only $\max(M,N)$ delay elements (minimum delays = "canonical"), but the dynamic range of internal node $w[n]$ can be very large.

3. Cascade / SOS (Cascaded Second-Order Sections)

$$H(z) = K \prod_{i=1}^{L} \frac{b_{0i} + b_{1i}z^{-1} + b_{2i}z^{-2}}{1 + a_{1i}z^{-1} + a_{2i}z^{-2}}$$

Decompose $H(z)$ into a product of $L = \lceil N/2 \rceil$ biquad sections. Each section handles only its own pair of conjugate poles and pair of zeros; the coefficient range is small and insensitive to quantization.

Industry Standard: Virtually all DSP chip filter libraries use SOS form. MATLAB's sosfilt and Python scipy's sosfilt are both SOS implementations.

4. Parallel Form

$$H(z) = K + \sum_{i=1}^{L} \frac{c_{0i} + c_{1i}z^{-1}}{1 + a_{1i}z^{-1} + a_{2i}z^{-2}}$$

Second-order sections in parallel, obtained via partial fraction expansion. Each section operates independently and can be parallelized.

5. Lattice Structure

Parameterized by reflection coefficients $\kappa_i$. For all-pole filters, $|\kappa_i| < 1$ guarantees stability — this is a structural guarantee independent of precision.

$$f_m[n] = f_{m-1}[n] + \kappa_m\,g_{m-1}[n-1]$$ $$g_m[n] = \kappa_m\,f_{m-1}[n] + g_{m-1}[n-1]$$

Commonly used in speech coding (LPC) and adaptive filters.

Structure Comparison Table

Structure	Delays	Multiplications	Quantization Sensitivity	Overflow Risk	Notes
Direct Form I	$M+N$	$M+N+1$	High	Medium	Most intuitive
Direct Form II	$\max(M,N)$	$M+N+1$	High	High (internal nodes)	Fewest delays
Cascade/SOS	$2L$	$5L+1$	Low	Low	Industry preferred
Parallel	$2L$	$5L+1$	Low	Low	Parallelizable
Lattice	$N$	$2N$	Lowest	Lowest	Stability guaranteed (all-pole)

Expand: Why does SOS have low coefficient sensitivity?

Consider the denominator polynomial of an $N$th-order filter $A(z) = \prod_{k=1}^{N}(1-p_k z^{-1})$. In direct form, the coefficients $a_k$ are elementary symmetric functions of the poles.

Pole sensitivity with respect to coefficients:

$$\frac{\partial p_i}{\partial a_k} = \frac{-p_i^{N-k}}{\prod_{j \neq i}(p_i - p_j)}$$

When poles cluster together (narrowband filters), the denominator $\prod_{j \neq i}(p_i - p_j)$ approaches zero, and the sensitivity approaches infinity.

In SOS, each section has only 2 poles $p_i, p_i^*$, and the sensitivity is:

$$\frac{\partial p_i}{\partial a_{1i}} = \frac{-p_i}{p_i - p_i^*} = \frac{-p_i}{2j\,\text{Im}(p_i)}$$

This value is bounded (as long as the poles are not on the real axis) and does not worsen as the filter order increases. $\;\blacksquare$

How to Use: Structure Selection Guide

FIR filters: Direct Form (or Transposed) is usually sufficient, since there is no feedback and no stability issues.
Low-order IIR ($\leq 2$nd order): Direct Form II is fine. A single biquad is one SOS section.
High-order IIR ($\geq 4$th order): Always use Cascade/SOS. Do not use Direct Form.
Parallel computation needed (FPGA/GPU): Consider Parallel Form.
Speech/adaptive (all-pole): Lattice structure, leveraging the stability guarantee of reflection coefficients.

Golden Rule: Never implement a high-order IIR filter using Direct Form. Even MATLAB warns in its filter documentation: "For high-order filters, use sosfilt."

Applications

Embedded DSP (16/32-bit fixed-point): SOS is the only reliable way to implement IIR. Filter libraries for TI C5000/C6000 series are all SOS-based.
Audio Processing (Parametric EQ): One biquad per frequency band, cascaded together. The EQ in DAWs like ProTools and Logic Pro are all biquad cascades.
Control Systems: A PID controller can be implemented as a single biquad. Higher-order controllers use SOS.
Speech Coding (LPC): The 10th-order all-pole model uses a Lattice structure; reflection coefficients can be directly compressed and transmitted.

Pitfalls and Limitations

1. SOS section ordering: The cascade order affects the dynamic range of intermediate nodes. General rule: place the sections with poles closest to the unit circle last (so the highest-gain section is processed last), or use the zp2sos function to automatically pair poles and zeros.

2. Gain distribution: The total gain $K$ should be distributed across sections, not concentrated in one — otherwise that section may overflow.

3. Transposed Form pitfall: Direct Form II Transposed is good for FIR (low accumulated error), but the internal dynamic range problem still exists for IIR.

Quick Check

Q1: How many delay elements does a 6th-order IIR filter need in Direct Form II? In Cascade/SOS?

Answer

Direct Form II: $\max(M,N) = 6$ delay elements (canonical form). Cascade/SOS: Split into $L = 3$ second-order sections, each with 2 delays, totaling $2 \times 3 = 6$. The number of delays is the same, but SOS has far lower coefficient sensitivity than Direct Form.

Q2: Why do fixed-point DSP chips almost exclusively use SOS for IIR?

Answer

Fixed-point arithmetic has very low precision (typically 16-bit, i.e., only 15 bits for the fractional part), so coefficient quantization errors are relatively large. In Direct Form, the coefficients of a high-order polynomial are symmetric functions of all the poles; tiny quantization can drastically alter pole positions and even cause instability. In SOS, each section manages only two poles, the coefficient range is small ($|a_{1i}| \leq 2$, $|a_{2i}| \leq 1$), and the impact of quantization error on poles is confined to a controllable range.

Interactive: Direct Form vs. Cascade/SOS Precision Comparison

The same 4th-order IIR filter implemented in both Direct Form II and Cascade/SOS. Compare the frequency response differences under float32 vs. float64.

Structure:

Red dashed = float32 Direct Form (note high-frequency deviation), blue solid = float64 reference, green = float32 SOS (nearly coincides with reference).

References

[1] A. V. Oppenheim & R. W. Schafer, Discrete-Time Signal Processing, 3rd ed., Ch. 6.
[2] P. S. R. Diniz, E. A. B. da Silva, S. L. Netto, Digital Signal Processing: System Analysis and Design, Ch. 9.
[3] L. B. Jackson, "Roundoff-Noise Analysis for Fixed-Point Digital Filters Realized in Cascade or Parallel Form," IEEE Trans. Audio Electroacoustics, 1970.

← Prev: IIR Filter Design Next: Numerical Precision →

4E.1 Adaptive Filters

When the optimal filter changes over time — let the filter learn by itself

Learning Objectives

Understand the motivation for adaptive filters and the M8B Wiener Filter optimal solution
Derive the LMS (Least Mean Squares) algorithm and its convergence conditions
Compare the performance and computational complexity of LMS / NLMS / RLS
Build intuition through an interactive noise cancellation experiment

One-Sentence Summary

An adaptive filter is a "learn while doing" system — it does not need to know the signal's statistical properties in advance; instead, it continuously adjusts its coefficients based on the error signal during operation, converging toward the optimal solution.

Why Learn This?

Bernard Widrow and his doctoral student Marcian E. (Ted) Hoff (yes, the same Hoff who later co-invented the Intel 4004 microprocessor) proposed the LMS algorithm at Stanford University in 1960. It is a striking coincidence — an algorithm that transformed adaptive signal processing, co-invented by someone who would go on to transform the computer industry.

The breakthrough of LMS lies in its extreme simplicity: only three lines of computation (compute error, update weights, shift window), yet it can work in unknown and time-varying environments. Its simplicity enabled implementation on the most primitive hardware — Widrow built ADALINE (Adaptive Linear Element) using analog circuits in the 1960s.

Today, adaptive filters are everywhere: your phone cancels echo during calls (AEC), noise-canceling headphones suppress ambient noise (ANC), WiFi modems equalize channel distortion — all powered by LMS or its variants.

Previously...

So far, all the filters we have designed (FIR, IIR) are fixed — once the coefficients are designed, they never change. This is sufficient for static scenarios (fixed low-pass/high-pass requirements), but many real-world scenarios are dynamic:

Acoustic Echo Cancellation (AEC): The speaker moves around the room, and the echo path continuously changes
Channel Equalization: Wireless channel fading varies randomly over time
Active Noise Control (ANC): The noise source's statistical properties change (engine RPM changes, wind speed varies)

We need a filter that can automatically track environmental changes.

Pain Point: You Don't Know What "Optimal" Is

In Wiener filter theory, the optimal FIR filter solution is:

$$\mathbf{w}_{\text{opt}} = \mathbf{R}_{xx}^{-1}\,\mathbf{r}_{xd}$$

where $\mathbf{R}_{xx}$ is the input signal's autocorrelation matrix, and $\mathbf{r}_{xd}$ is the cross-correlation vector between the input and the desired output.

The problem is:

You usually do not know $\mathbf{R}_{xx}$ and $\mathbf{r}_{xd}$ (they are statistical quantities requiring large amounts of data to estimate)
Even if estimated, solving $\mathbf{R}_{xx}^{-1}$ is an $O(M^3)$ operation ($M$ = filter length)
The environment is changing — last second's statistics are already outdated

The adaptive filter strategy is: do not solve the equation; instead, approach the solution one step at a time — with each new sample, take a small step toward "better."

Origin: From Wiener to LMS

Evolution of ideas:

Wiener (1949): Optimal solution $\mathbf{w}_{\text{opt}} = \mathbf{R}_{xx}^{-1}\mathbf{r}_{xd}$ — perfect but impractical (requires knowing the statistics)
Steepest Descent: $\mathbf{w}[n+1] = \mathbf{w}[n] - \mu\,\nabla J[n]$, where $J = E[|e[n]|^2]$ is the mean squared error (MSE). Iteratively approaches $\mathbf{w}_{\text{opt}}$, but still requires computing the expected value of the gradient.
LMS (Widrow & Hoff, 1960): Replace the true gradient with an instantaneous gradient estimate: $\hat{\nabla}J[n] = -2\,e[n]\,\mathbf{x}[n]$. This estimate is noisy, but its expected value points in the correct direction — the ancestor of Stochastic Gradient Descent (SGD).

Core Concepts: The LMS Algorithm

Intuition: Imagine you are blindfolded, standing in a bowl-shaped valley (the MSE surface), trying to reach the bottom (the optimal solution). You cannot see the global terrain, but you can feel the slope under your feet (the instantaneous gradient). LMS takes a small step downhill at each iteration — although the direction is not perfectly accurate each time (since it is an estimate), on average you converge toward the bottom.

LMS Algorithm (Three Core Lines)

1. Compute output and error:

$$\hat{d}[n] = \mathbf{w}^T[n]\,\mathbf{x}[n] = \sum_{k=0}^{M-1}w_k[n]\,x[n-k]$$ $$e[n] = d[n] - \hat{d}[n]$$

2. Update weights:

$$\mathbf{w}[n+1] = \mathbf{w}[n] + \mu\,e[n]\,\mathbf{x}[n]$$

$d[n]$: desired signal, $\mathbf{x}[n] = [x[n], x[n-1], \ldots, x[n-M+1]]^T$: input vector, $\mu$: step size

Choosing the Step Size $\mu$

$\mu$ is the most critical hyperparameter of LMS:

Too small: Convergence is extremely slow, cannot track environmental changes
Too large: Divergence (weights explode)
Stability condition:

$$0 < \mu < \frac{2}{M \cdot \sigma_x^2} = \frac{2}{\text{tr}(\mathbf{R}_{xx})} \approx \frac{2}{\lambda_{\max}}$$

where $M$ = filter length, $\sigma_x^2$ = input power, $\lambda_{\max}$ = largest eigenvalue of $\mathbf{R}_{xx}$.

Practical Rule: Set $\mu \approx \frac{1}{10 \cdot M \cdot \hat{\sigma}_x^2}$ (1/10 of the stability bound), balancing stability and convergence speed.

NLMS: Normalized LMS

The LMS step size is sensitive to input power. NLMS normalizes by the input vector's energy at each step:

$$\mathbf{w}[n+1] = \mathbf{w}[n] + \frac{\tilde{\mu}}{\|\mathbf{x}[n]\|^2 + \delta}\,e[n]\,\mathbf{x}[n]$$

$\delta > 0$ is a small constant to prevent division by zero. $\tilde{\mu} \in (0, 2)$ no longer depends on input power.

RLS: Recursive Least Squares

Instead of gradient descent, directly solve the weighted least squares problem recursively.

$$\mathbf{w}[n] = \arg\min_{\mathbf{w}} \sum_{k=0}^{n} \lambda^{n-k}\,|d[k] - \mathbf{w}^T\mathbf{x}[k]|^2$$

$\lambda \in (0.95, 1)$ is the forgetting factor. RLS converges much faster (unaffected by eigenvalue spread), but requires $O(M^2)$ operations per step (recursive update of the inverse correlation matrix).

Comparison of the Three Algorithms

Characteristic	LMS	NLMS	RLS
Computation per step	$O(M)$	$O(M)$	$O(M^2)$
Convergence speed	Slow (depends on $\lambda_{\max}/\lambda_{\min}$)	Moderate	Fast
Tracking ability	Moderate	Moderate	Good
Stability	Requires careful $\mu$ selection	More robust	May be numerically unstable
Typical applications	AEC, channel equalization	General-purpose (preferred)	Fast convergence needs

Expand: LMS convergence analysis

Define the weight error vector $\tilde{\mathbf{w}}[n] = \mathbf{w}[n] - \mathbf{w}_{\text{opt}}$ and substitute into the LMS update equation:

$$\tilde{\mathbf{w}}[n+1] = (\mathbf{I} - \mu\,\mathbf{x}[n]\mathbf{x}^T[n])\,\tilde{\mathbf{w}}[n] + \mu\,e_o[n]\,\mathbf{x}[n]$$

where $e_o[n] = d[n] - \mathbf{w}_{\text{opt}}^T\mathbf{x}[n]$ is the optimal error (irreducible noise).

Taking expectations (using the independence assumption that $\mathbf{x}[n]$ and $\tilde{\mathbf{w}}[n]$ are independent):

$$E[\tilde{\mathbf{w}}[n+1]] = (\mathbf{I} - \mu\,\mathbf{R}_{xx})\,E[\tilde{\mathbf{w}}[n]]$$

Let $\mathbf{R}_{xx} = \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^T$ (eigendecomposition), and define the rotated error $\mathbf{v}[n] = \mathbf{Q}^T\tilde{\mathbf{w}}[n]$:

$$E[v_i[n+1]] = (1 - \mu\lambda_i)\,E[v_i[n]]$$

Convergence condition: $|1 - \mu\lambda_i| < 1$ for all $i$, i.e., $0 < \mu < 2/\lambda_{\max}$.

The convergence speed is determined by the slowest mode: $\tau_i = -1/\ln(1-\mu\lambda_i) \approx 1/(\mu\lambda_i)$. The slowest mode corresponds to $\lambda_{\min}$, so the larger the eigenvalue spread $\chi = \lambda_{\max}/\lambda_{\min}$, the slower the convergence.

Steady-state excess MSE:

$$J_{\text{excess}} \approx \mu \cdot M \cdot \sigma_{e_o}^2 / 2 = \text{misadjustment} \times J_{\min}$$

Misadjustment $\mathcal{M} \approx \mu\,M\,\sigma_x^2 / 2 = \mu\,\text{tr}(\mathbf{R}_{xx})/2$. $\;\blacksquare$

# Python: LMS adaptive noise cancellation import numpy as np def lms(x, d, mu=0.01, M=32): """ x: reference signal (correlated noise) d: desired (signal + noise) mu: step size M: filter length Returns: e[n] = clean signal estimate """ N = len(x) w = np.zeros(M) e = np.zeros(N) for n in range(M, N): x_vec = x[n-M:n][::-1] y = w @ x_vec e[n] = d[n] - y w += mu * e[n] * x_vec return e, w

How to Use: Noise Cancellation System Design

Scenario: The microphone captures a signal $d[n] = s[n] + v[n]$ (speech + noise), and a reference microphone captures a correlated copy of the noise $x[n]$.

Step 1: Initialization

$\mathbf{w}[0] = \mathbf{0}$, choose $M = 32$ (based on the impulse response length of the noise path), $\mu = 0.01$

Step 2: For each time step $n$

(a) Assemble the input vector $\mathbf{x}[n] = [x[n], x[n-1], \ldots, x[n-31]]^T$

(b) Compute the noise estimate $\hat{v}[n] = \mathbf{w}^T[n]\,\mathbf{x}[n]$

(d) Update $\mathbf{w}[n+1] = \mathbf{w}[n] + \mu\,e[n]\,\mathbf{x}[n]$

Step 3: Monitor convergence

Observe the sliding average of $|e[n]|^2$ (learning curve) and verify that the MSE decreases to steady state. If it does not decrease or oscillates, reduce $\mu$.

Applications

Application	$d[n]$	$x[n]$	Algorithm
Acoustic Echo Cancellation (AEC)	Near-end mic (speech + echo)	Far-end speech	NLMS / PBFDAF
Active Noise Control (ANC)	Error microphone	Reference microphone	FxLMS
Channel Equalization	Received signal	Training sequence	LMS / RLS
System Identification	System output	System input	LMS / NLMS
Beamforming	Array microphone signals	Desired direction reference	LMS / MVDR

Pitfalls and Limitations

1. Reference signal must be correlated with noise: If $x[n]$ is uncorrelated with the noise component in $d[n]$, LMS cannot learn any useful mapping. In ANC, the reference microphone must be physically close to the noise source.

2. Tracking lag for non-stationary signals: LMS needs time to track environmental changes (approximately $10/(\mu\lambda_{\min})$ samples). If the environment changes too fast, the filter may never catch up.

3. Eigenvalue spread problem: When the condition number $\chi = \lambda_{\max}/\lambda_{\min}$ of $\mathbf{R}_{xx}$ is large (e.g., for speech signals, $\chi$ can exceed 100), the convergence speeds of different LMS modes vary dramatically. NLMS partially addresses this; RLS fully solves it but at higher computational cost.

Practical Advice: Start with NLMS ($\tilde{\mu} = 0.5$, $\delta = 10^{-6}$). If convergence is too slow, switch to frequency-domain adaptive filtering (FBLMS) or RLS.

Quick Check

Q1: What happens if the LMS step size $\mu$ is increased 10x?

Answer

Convergence speed increases roughly 10x (within the stable range), but the steady-state excess MSE also increases 10x ($\mathcal{M} \propto \mu$). If $\mu$ exceeds the stability bound $2/(M\sigma_x^2)$, the algorithm diverges — weights grow exponentially and the output explodes. Therefore, the choice of $\mu$ is a trade-off between convergence speed vs. steady-state accuracy.

Q2: What is the key improvement of NLMS over LMS? What is the cost?

Answer

NLMS normalizes the step size at each step by the energy of the input vector $\|\mathbf{x}[n]\|^2$, so that the effective step size automatically adapts to input power — shrinking when power is high (preventing divergence) and growing when power is low (accelerating convergence). The cost is minimal: one extra inner product per step ($O(M)$, same order as LMS), plus a small regularization constant $\delta$ to prevent division by zero. NLMS is far more practical than LMS and is the preferred choice for industrial applications.

Interactive: LMS Noise Cancellation Experiment

A speech signal contaminated by noise; the LMS filter attempts to cancel the noise. Adjust the step size $\mu$ and filter length $M$ to observe convergence speed and noise reduction performance.

Step size μ: 0.010 Filter length M: 16

Top: gray = noisy signal $d[n]$, blue = LMS output $e[n]$ (after noise cancellation). Bottom: learning curve (MSE vs. iterations). Try making $\mu$ too large to see divergence!

References

[1] S. Haykin, Adaptive Filter Theory, 5th ed., Prentice Hall, 2014.
[2] B. Widrow & M. E. Hoff, "Adaptive switching circuits," IRE WESCON Conv. Rec., 1960.
[3] B. Widrow & S. D. Stearns, Adaptive Signal Processing, Prentice Hall, 1985.
[4] A. H. Sayed, Adaptive Filters, Wiley, 2008.

← Prev: Numerical Precision Next: Module 5 →

5.1 Decimation & Interpolation

Two fundamental tools for changing the sampling rate — but carelessness creates ghosts

Learning Objectives

Understand how decimation causes spectral compression and aliasing in the frequency domain
Understand why interpolation produces spectral images in the frequency domain
Master the design principles for anti-aliasing and anti-imaging low-pass filters
Implement arbitrary sample-rate conversion using rational ratio $L/M$

One-Sentence Summary

Decimation keeps only every $M$th sample, compressing the spectrum $M$-fold and producing $M$ overlapping copies; interpolation inserts $L-1$ zeros between each pair of samples, narrowing the spectrum to $\pi/L$ but creating $L-1$ spectral images. Both require a low-pass filter for cleanup.

Why Learn This?

Ronald Crochiere & Lawrence Rabiner (1981) systematized the theoretical framework of multirate processing in their classic book Multirate Digital Signal Processing. Their motivation was practical: digital telephone systems used different sampling rates for different standards (8 kHz, 16 kHz, 44.1 kHz, 48 kHz), and devices needed seamless conversion. Before this, converting sampling rates almost always required D/A followed by A/D — poor quality and high cost. Multirate theory demonstrated that sample-rate conversion can be performed entirely in the digital domain, fundamentally transforming the communications and audio industries.

Previously...

Before entering multirate processing, review these key concepts:

Sampling Theorem: The sampling rate $f_s$ must be greater than twice the signal's maximum frequency $f_{\max}$; otherwise aliasing occurs. The spectrum repeats with period $f_s$
DTFT (Discrete-Time Fourier Transform): The spectrum of a discrete signal is a $2\pi$-periodic function, with $\omega = \pi$ corresponding to $f_s/2$ (Nyquist frequency)
Low-Pass Filter (LPF): Passes only components below the cutoff frequency $\omega_c$ and suppresses high-frequency content

Pain Point: Why Can't You Simply Discard or Insert Samples?

You might think lowering the sampling rate is simple — just keep every few samples, right? Raising the sampling rate — just insert zeros in between, right?

Directly discarding samples → aliasing: Naively taking every 5th sample of CD-quality music (44.1 kHz) to get 8.82 kHz causes all frequency components above 4.41 kHz to fold back into low frequencies, producing harsh distortion
Directly inserting zeros → imaging: After inserting zeros between samples, multiple "mirror" copies of the original spectrum appear, sounding like metallic high-frequency noise
Non-integer ratios → more complex: Converting from CD (44.1 kHz) to DAT (48 kHz) has a ratio of $160/147$ — neither integer upsampling nor integer downsampling

Core Lesson: Changing the sampling rate = changing the "scaling" and "periodicity" of the spectrum. Without filters to manage these changes, artifacts are inevitable.

Origin

The need for multirate processing arose from the development of digital telephone switching systems in the 1970s. ITU-T standards specified 8 kHz for telephone speech, 16 kHz for wideband speech, and 44.1 or 48 kHz for music. The initial solution was: digital → analog → resample → digital, which was expensive and accumulated noise.

Research by Crochiere and Rabiner at Bell Labs demonstrated that through the combination of integer upsampling + low-pass filtering + integer downsampling, any rational ratio $L/M$ sample-rate conversion can be performed entirely in the digital domain, and is theoretically lossless.

Core Concepts

Decimation by $M$: Intuition

Imagine you shot a 120fps slow-motion video and now want to convert it to 30fps. You keep every 4th frame. If the original video contains fast hand-waving (high frequency), 30fps may not represent it correctly — hand positions will jump or overlap (aliasing). So before discarding frames, you need to apply "motion blur" (low-pass filtering).

Time-domain operation:

Decimation

$$x_d[n] = x[nM]$$

Keep every $M$th sample; the sampling rate drops from $f_s$ to $f_s/M$

Frequency-domain effect:

$$X_d(e^{j\omega}) = \frac{1}{M}\sum_{k=0}^{M-1} X\!\left(e^{j(\omega-2\pi k)/M}\right)$$

The spectrum is compressed $M$-fold, with $M$ shifted copies superimposed → aliasing occurs if the spectrum extends beyond $[-\pi/M, \pi/M]$

Anti-aliasing strategy: Before decimation, apply a low-pass filter with cutoff $\omega_c = \pi/M$ to remove components above $f_s/(2M)$. This prevents spectral overlap after compression.

Interpolation by $L$: Intuition

Imagine you have a 100x100 low-resolution image and want to enlarge it to 400x400. The crudest method is to insert zeros (black dots) between pixels, then smooth with a blur filter — this is exactly the principle of upsampling.

Time-domain operation:

Upsampling (Zero Insertion)

$$x_u[n] = \begin{cases} x[n/L], & n = 0, \pm L, \pm 2L, \ldots \\ 0, & \text{otherwise} \end{cases}$$

Insert $L-1$ zeros between each pair of original samples; the sampling rate increases from $f_s$ to $Lf_s$

Frequency-domain effect:

$$X_u(e^{j\omega}) = X(e^{j\omega L})$$

The spectrum is "compressed" to $[0, \pi/L]$, but $L$ repeated images appear over $[0, 2\pi]$

Anti-imaging strategy: After upsampling, apply a low-pass filter with cutoff $\omega_c = \pi/L$ and gain $L$ to remove images while restoring the interpolated amplitude to the correct level.

Rational-Ratio Sample-Rate Conversion $L/M$

Sample-Rate Conversion Flow

$$x[n] \xrightarrow{\uparrow L} \xrightarrow{H(\omega_c = \pi/\max(L,M))} \xrightarrow{\downarrow M} y[n]$$

First upsample by $L$ → low-pass filter (cutoff = smaller of $\pi/L$ and $\pi/M$) → downsample by $M$

Expand: Derivation of the decimation frequency-domain formula

Let $x_d[n] = x[nM]$; its DTFT is:

$$X_d(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x[nM] e^{-j\omega n}$$

Using the identity: for any sequence $x[n]$, we can write

$$\sum_{n} x[nM] e^{-j\omega n} = \frac{1}{M}\sum_{k=0}^{M-1}\sum_{m} x[m] e^{-j(\omega - 2\pi k)m/M}$$

This is because "keeping every $M$th sample" is equivalent to first multiplying by a comb function with period $M$:

$$c[n] = \frac{1}{M}\sum_{k=0}^{M-1} e^{j2\pi kn/M} = \begin{cases}1, & n \equiv 0 \pmod{M}\\0, & \text{otherwise}\end{cases}$$

Therefore $x[n] \cdot c[n]$ retains the samples at $n = 0, M, 2M, \ldots$, and its DTFT is:

$$\frac{1}{M}\sum_{k=0}^{M-1} X\!\left(e^{j(\omega - 2\pi k)/M}\right)$$

This clearly shows: the spectrum after decimation is the sum of $M$ shifted copies of the original spectrum scaled by $M$. When these copies overlap, aliasing occurs.

How to Use: Sample-Rate Conversion in Practice

Example: CD 44.1 kHz → Telephone 8 kHz

Compute the ratio: $8000/44100 = 80/441$. So $L = 80$, $M = 441$ (already in lowest terms)
Upsample by 80: Insert 79 zeros between each pair of samples → intermediate rate = $44100 \times 80 = 3{,}528{,}000$ Hz
Low-pass filter: Cutoff $\omega_c = \pi/\max(80, 441) = \pi/441$, corresponding to $f_c = 3{,}528{,}000/(2 \times 441) = 4{,}000$ Hz
Downsample by 441: Keep every 441st sample → final rate = $3{,}528{,}000/441 = 8{,}000$ Hz

Practical Note: The intermediate rate of 3.528 MHz is excessively high; in practice, Polyphase structures (next section) or multistage cascades are used to avoid this problem.

# Python: Decimation and interpolation import scipy.signal as sig # Decimate by M=4 (auto anti-aliasing filter) y_down = sig.decimate(x, 4, ftype='fir') # Interpolate by L=3 y_up = sig.resample_poly(x, up=3, down=1) # Arbitrary ratio (CD 44100 → phone 8000) y_resampled = sig.resample_poly(x, up=80, down=441)

Interactive: Frequency-Domain Effects of Upsampling and Downsampling

Select decimation or interpolation mode, adjust the factor, toggle the anti-aliasing/anti-imaging filter, and observe changes in the time-domain waveform and spectrum.

Mode:

Factor = 4

Enable low-pass filter

Applications

Audio Sample-Rate Conversion: CD (44.1 kHz) ↔ DVD/Blu-ray (48 kHz) ↔ high-resolution audio (96/192 kHz). All DAW software (Pro Tools, Ableton) includes built-in multirate converters
Digital Communications: Baseband signals use different rates at different processing stages — modulator (high-rate), equalizer (mid-rate), speech encoder (low-rate). Software-defined radio (SDR) makes extensive use of multistage decimation
Image Scaling: Image magnification/reduction is essentially 2D upsampling/downsampling. Photoshop's "Bicubic" interpolation is a 2D low-pass interpolation filter
Biomedical Signals: EEG is commonly sampled at 512 Hz or 1024 Hz, but many analyses focus only on 0-40 Hz brainwaves → decimation to 128 Hz greatly reduces computation

Pitfalls and Limitations

Forgetting to filter before downsampling: This is the most common mistake. Once aliasing occurs, it is irreversible — the folded high-frequency and low-frequency components are permanently mixed together
Upsampling does not "add information": Upsampling 8 kHz speech to 44.1 kHz will not make it sound clearer. You cannot recover frequency components above 4 kHz — that information was lost during the original sampling
Non-integer ratio precision issues: The ratio 44100 → 48000 = $160/147$ requires upsampling by 160 then downsampling by 147, with an intermediate rate of 7.056 MHz. Multistage or Polyphase structures are essential in practice
Group Delay: The FIR anti-aliasing filter (M4B Filter Design) introduces a delay of $(N-1)/2$ samples. In real-time applications, this delay may be unacceptable

Quick Check

Q1: A speech signal sampled at 16 kHz is decimated by a factor of 4. Without an anti-aliasing filter, at what frequency will the original 3 kHz component appear after decimation?

Show answer

The post-decimation sampling rate is $16/4 = 4$ kHz, with a Nyquist frequency of 2 kHz. The original 3 kHz component exceeds Nyquist and folds: $3 \text{ kHz}$ maps to $4 - 3 = 1$ kHz. Therefore, a spurious component appears at 1 kHz in the decimated signal — this is aliasing.

Q2: Why does a sample-rate conversion system use "upsample first, then downsample" ($\uparrow L \to \text{LPF} \to \downarrow M$) rather than "downsample first, then upsample"?

Show answer

If you downsample first, high-frequency components are irreversibly lost due to aliasing. Upsampling first only inserts zeros (producing removable images) without losing any information. After upsampling, a low-pass filter simultaneously removes the images and the components that would alias during the subsequent downsampling. This guarantees mathematically lossless conversion (assuming an ideal low-pass filter).

References: [1] Crochiere, R.E. & Rabiner, L.R., Multirate Digital Signal Processing, Prentice-Hall, 1983. [2] Oppenheim & Schafer, Discrete-Time Signal Processing, 3rd ed., Ch.4. [3] Vaidyanathan, P.P., Multirate Systems and Filter Banks, Prentice-Hall, 1993. [4] Lyons, R., Understanding Digital Signal Processing, 3rd ed., Ch.10.

← Previous 5.2 Polyphase Decomposition →

5.2 Polyphase Decomposition

Don't compute then discard — compute only what you need

Learning Objectives

Understand the computational waste in direct decimation filtering
Master the mathematical derivation of polyphase decomposition and the Noble Identity
Compute the savings that the polyphase structure provides
Understand the dual application of polyphase for interpolation

One-Sentence Summary

The polyphase structure splits a long filter into $M$ short sub-filters, each operating at the lower rate, avoiding the waste of "compute then discard" and achieving a direct $M$-fold reduction in computation.

Why Learn This?

Bellanger, Bonnerot & Coudreuse (1976) first systematically proposed the polyphase filter structure for multirate processing in an IEEE paper. The core insight of this idea transformed the entire communications industry: in a digital receiver, if you need to low-pass filter before decimation, the conventional approach executes all multiply-accumulate operations at the high rate, then discards most of the results. Polyphase rearranges this process, computing only the output samples you actually need. In an era when DSP chip resources were limited (and still are), this $M$-fold speedup is decisive.

Previously...

Decimation flow (previous section): First filter with a LPF with cutoff $\pi/M$, then keep every $M$th sample
FIR convolution: $y[n] = \sum_{k=0}^{N-1} h[k]\, x[n-k]$, requiring $N$ multiply-accumulates per output
Z-transform: $H(z) = \sum_{k} h[k] z^{-k}$, used to analyze the algebraic structure of the filter

Pain Point: Massive Computational Waste

Consider a decimation-by-$M = 4$ system:

First apply FIR filtering to every input sample ($N$ multiply-accumulates per sample)
Then keep only 1 out of every 4 outputs, discarding the other 3
In other words, 75% of the computed results are directly thrown away!

If the filter has $N = 128$ taps and the input rate is 1 MHz:

Direct method: $128 \times 10^6 = 1.28 \times 10^8$ multiplications/second
Of which $3/4$ are wasted → only $3.2 \times 10^7$ multiplications/second are useful

Question: Can we rearrange the computation order so the filter operates only at the low rate ($f_s/M$), fundamentally eliminating the waste?

Origin

The name "polyphase" comes from "multiple phases" — grouping filter coefficients by their phase (delay offset). Conceptually, it is similar to a polyphase power system, where three phases are offset by 120 degrees and combine to deliver stable power.

The key mathematical breakthrough is the Noble Identity (named after B. Noble): it proves that a downsampler can "pass through" a filter, provided that the appropriate variable substitution is made in the filter's Z-transform. This allows us to downsample first, then filter at the lower rate.

Core Concepts

Intuition: Imagine you are a quality inspector on a factory assembly line, checking only 1 out of every 4 products. The traditional approach is "inspect every product fully, then discard 3 out of 4 reports." The polyphase approach is "only inspect the 4th product, but incorporate key data from the previous 3" — the number of inspections drops by 4x, yet the results are identical.

Polyphase Decomposition

Split the $N$-tap filter's coefficients into $M$ groups by residue:

Type-I Polyphase Decomposition

$$H(z) = \sum_{k=0}^{M-1} z^{-k}\, E_k(z^M)$$

where $E_k(z) = \sum_{n} h[nM+k]\, z^{-n}$, $k = 0, 1, \ldots, M-1$

Each $E_k$ contains only every $M$th coefficient of the original filter:

$E_0$: $h[0], h[M], h[2M], \ldots$ (phase 0)
$E_1$: $h[1], h[M+1], h[2M+1], \ldots$ (phase 1)
$E_{M-1}$: $h[M-1], h[2M-1], h[3M-1], \ldots$ (phase $M-1$)

Noble Identity

$$\boxed{H(z^M) \;\text{followed by}\; \downarrow M \;=\; \downarrow M \;\text{followed by}\; H(z)}$$

The downsampler can "pass through" the filter, provided $z$ is replaced by $z^M$

Applying the Noble Identity to a decimation system:

Original: $x[n] \to H(z) \to \downarrow M \to y[n]$ (filtering at the high rate)
Decompose: $H(z) = \sum z^{-k} E_k(z^M)$
Apply the Noble Identity to each branch: downsample first, then filter with $E_k(z)$
Result: all $M$ sub-filters operate at the low rate!

Computation Analysis

	Direct Method	Polyphase
Multiplications per output	$N$	$N$
Output rate	$f_s$ (produced at high rate, then discarded)	$f_s/M$ (produce only what's needed)
Multiplications per second	$N \cdot f_s$	$N \cdot f_s / M$
Speedup	$M$x

Expand: Full derivation of polyphase decomposition

Starting from the Z-transform:

$$H(z) = \sum_{n=0}^{N-1} h[n] z^{-n}$$

Write the index $n$ as $n = qM + k$, where $q = \lfloor n/M \rfloor$ and $k = n \bmod M$:

$$H(z) = \sum_{k=0}^{M-1} \sum_{q=0}^{\lceil N/M\rceil - 1} h[qM+k]\, z^{-(qM+k)}$$ $$= \sum_{k=0}^{M-1} z^{-k} \underbrace{\sum_{q} h[qM+k]\,(z^M)^{-q}}_{E_k(z^M)}$$

Therefore $H(z) = \sum_{k=0}^{M-1} z^{-k} E_k(z^M)$, where each polyphase sub-filter is:

$$E_k(z) = \sum_{q} h[qM+k]\, z^{-q}, \quad k = 0, 1, \ldots, M-1$$

Proof of the Noble Identity:

Let $v[n] = \sum_m g[m] w[n-m]$, where $g$ has Z-transform $G(z^M)$ (i.e., $g[n] \neq 0$ only when $n$ is a multiple of $M$). After downsampling:

$$v_d[n] = v[nM] = \sum_m g[m] w[nM - m]$$

Since $g[m] = 0$ when $m \not\equiv 0 \pmod{M}$, let $m = lM$:

$$v_d[n] = \sum_l g[lM] w[nM - lM] = \sum_l g[lM] w_d[n-l] \cdot (\text{if }w_d[n]=w[nM])$$

This is exactly $w$ downsampled first, then filtered by $G(z)$. QED.

How to Use: Polyphase Decimation Worked Example

Example: 24-tap FIR, Decimation by $M = 4$

Group: Split $h[0] \ldots h[23]$ into 4 groups:
- $E_0$: $h[0], h[4], h[8], h[12], h[16], h[20]$ (6 coefficients)
- $E_1$: $h[1], h[5], h[9], h[13], h[17], h[21]$ (6 coefficients)
- $E_2$: $h[2], h[6], h[10], h[14], h[18], h[22]$ (6 coefficients)
- $E_3$: $h[3], h[7], h[11], h[15], h[19], h[23]$ (6 coefficients)
Downsample input: Downsample each phase of the original input separately → each $E_k$ input rate = $f_s/4$
Sub-filter: Each $E_k$ filters with 6 coefficients → 6 multiplications
Combine: Sum 4 branches → each output uses $4 \times 6 = 24$ multiplications, but at $f_s/4$ rate
Result:
Direct: 24 mults × 1000 samples/sec = 24,000 mults/sec Polyphase: 24 mults × 250 samples/sec = 6,000 mults/sec Speedup: 4x!

Interactive: Direct Method vs. Polyphase Computation Comparison

Adjust the filter length and decimation factor to observe the computational difference.

Filter length $N$ = 32

Decimation factor $M$ = 4

Applications

Software-Defined Radio (SDR): Receiver front-end ADCs sample at several GHz and require multistage decimation to baseband. Every stage uses polyphase structures; otherwise the FPGA simply cannot keep up
Digital Television (DVB): The channelizer simultaneously extracts multiple narrowband channels from a wideband signal and is essentially a polyphase filter bank
Audio Resampling: Open-source libraries like libsamplerate and SoX internally use polyphase for high-quality sample-rate conversion
Radar Pulse Compression: The matched filter + decimation combination uses polyphase for real-time processing of high-speed ADC data on FPGAs

Pitfalls and Limitations

Only works for integer factors: The polyphase structure requires the decimation factor $M$ to be an integer. For rational ratios $L/M$, upsampling and downsampling polyphase structures must be combined
Not applicable to IIR filters: The Noble Identity strictly holds only for causal FIR filters. The recursive structure of IIR makes polyphase decomposition more complex
Filter length must be divisible by $M$: If not, zero-pad to a multiple of $M$ (does not affect frequency response, just adds a few zero coefficients)
Memory access patterns: In hardware implementations, the non-contiguous memory access pattern of polyphase may reduce cache efficiency; careful data layout is needed

Quick Check

Q1: A 64-tap FIR filter is used in an 8x decimation system. How many multiplications per second does each method require? (Assume input rate 10 kHz)

Show answer

Direct: Filter at 10 kHz → $64 \times 10{,}000 = 640{,}000$ multiplications/sec, then discard 7 out of 8 outputs.

Polyphase: 8 sub-filters with $64/8 = 8$ coefficients each, operating at $10{,}000/8 = 1{,}250$ Hz. Each output requires $8 \times 8 = 64$ multiplications, but at a rate of only 1,250 Hz → $64 \times 1{,}250 = 80{,}000$ multiplications/sec. Speedup: $640{,}000/80{,}000 = 8$x, exactly equal to $M$.

Q2: The Noble Identity says "a downsampler can pass through a filter," but why doesn't this work for arbitrary filters? What is the key condition?

Show answer

The precise form of the Noble Identity is: $H(z^M)$ followed by $\downarrow M$ = $\downarrow M$ followed by $H(z)$. Note that the filter on the left is $H(z^M)$ (not $H(z)$). So not just any $H(z)$ can directly pass through the downsampler — you must first decompose $H(z)$ into polyphase form $\sum z^{-k} E_k(z^M)$, then apply the Noble Identity to each $E_k(z^M)$ term separately.

References: [1] Bellanger, M., Bonnerot, G. & Coudreuse, M., Digital Filtering by Polyphase Network: Application to Sample-Rate Alteration and Filter Banks, IEEE Trans. ASSP, 1976. [2] Vaidyanathan, P.P., Multirate Systems and Filter Banks, Ch.4-5, 1993. [3] harris, f.j., Multirate Signal Processing for Communication Systems, Prentice-Hall, 2004. [4] Crochiere & Rabiner, Multirate Digital Signal Processing, Ch.3, 1983.

← 5.1 Decimation & Interpolation 5.3 Filter Banks →

5.3 Filter Banks

Split a signal into subbands, process each independently, and reassemble — a unified framework from MP3 to wavelets

Learning Objectives

Understand the analysis-synthesis architecture of two-channel QMF (Quadrature Mirror Filter)
Master the mathematical derivation of the Perfect Reconstruction (PR) condition
Understand the Alias Cancellation condition
Establish the connection between filter banks and the Discrete Wavelet Transform (DWT): the Mallat algorithm

One-Sentence Summary

A filter bank splits a signal into multiple subbands, decimates each for independent processing, then upsamples and synthesizes back to the original signal. If designed properly, Perfect Reconstruction (PR) is achievable — the output is exactly a delayed version of the input with no distortion whatsoever.

Why does this matter

Esteban & Galand (1977) proposed the Quadrature Mirror Filter (QMF) for sub-band speech coding at the ICASSP conference, inaugurating the era of subband processing. Their motivation: different frequency bands of speech have different perceptual importance; by processing them separately, more bits can be allocated to important bands and fewer to unimportant ones — the seed of Perceptual Coding. The MDCT (Modified Discrete Cosine Transform) used in MP3 is essentially a perfect reconstruction filter bank, and the DWT used in JPEG 2000 is as well. It is fair to say that half of modern compression technology is built on filter bank theory.

Previously...

Decimation and interpolation (Section 5.1): Frequency-domain effects of decimation by 2 — spectral compression + aliasing
Polyphase structure (Section 5.2): How to efficiently implement the "filter + decimate" combination
Low-pass/high-pass filters: $H_0(z)$ low-pass retains $[0, \pi/2]$, $H_1(z)$ high-pass retains $[\pi/2, \pi]$

Pain Point: The Dilemma of Subband Processing

Many applications require frequency-band-specific processing:

Audio compression: The ear is most sensitive to 1-4 kHz and insensitive to 16+ kHz → encode bands separately
Noise removal: Noise may exist only in certain bands → process only those bands
Equalizer: Independently adjust bass, mid, and treble

But if the bands need to be reassembled afterward, a problem arises:

Core Challenge: Decimation produces aliasing; upsampling produces imaging. During the analysis-synthesis round-trip, will these artifacts accumulate? Is there a way to make them cancel completely?

Origin

The QMF story: Esteban and Galand's original QMF design could only achieve "approximate" perfect reconstruction — aliasing could be completely eliminated, but some amplitude distortion remained.

Smith & Barnwell (1984) and Mintzer (1985) independently discovered true perfect reconstruction two-channel filter banks (PR-QMF), also known as Conjugate Quadrature Filters (CQF).

Stephane Mallat (1989) revealed a profound connection: repeatedly applying two-channel analysis to the low-pass output = the Discrete Wavelet Transform (DWT). This is the Mallat algorithm, which transformed abstract wavelet theory into efficiently computable filter bank operations, directly catalyzing the explosion of wavelets in image compression, denoising, and beyond.

Core Concepts

Intuition: Imagine traffic on a highway (the signal) reaching a toll station where it splits into two lanes (low-frequency/high-frequency). Each lane is tolled independently (processed), then merges back onto the same road. If the splitting and merging mechanisms are well designed, the exiting traffic is identical to the entering traffic (perfect reconstruction). If poorly designed, some cars get lost or duplicated (distortion).

Two-Channel Analysis-Synthesis System

The structure is as follows:

Analysis: x[n] ──┬── H0(z) ── ↓2 ── v0[n] (low-freq subband) └── H1(z) ── ↓2 ── v1[n] (high-freq subband) Synthesis: v0[n] ── ↑2 ── F0(z) ──┬── y[n] v1[n] ── ↑2 ── F1(z) ──┘

Analysis-synthesis equation:

Z-Transform of the Output

$$Y(z) = \underbrace{\tfrac{1}{2}\bigl[F_0(z)H_0(z) + F_1(z)H_1(z)\bigr]}_{\text{Transfer function } T(z)} X(z) + \underbrace{\tfrac{1}{2}\bigl[F_0(z)H_0(-z) + F_1(z)H_1(-z)\bigr]}_{\text{Alias term } A(z)} X(-z)$$

Two design objectives:

Alias Cancellation: Set $A(z) = 0$, i.e., $F_0(z)H_0(-z) + F_1(z)H_1(-z) = 0$.
A simple solution: $F_0(z) = H_1(-z)$, $F_1(z) = -H_0(-z)$
Perfect Reconstruction: Set $T(z) = cz^{-d}$ (pure delay), i.e., $$F_0(z)H_0(z) + F_1(z)H_1(z) = 2cz^{-d}$$

QMF design

The classical QMF choice is $H_1(z) = H_0(-z)$ (mirror relation), i.e. $h_1[n] = (-1)^n h_0[n]$.

Meaning: if $H_0$ is a low-pass, then $H_0(-z)$ flips the frequency axis and automatically becomes a high-pass. This is the origin of the name "Quadrature Mirror"—the high-pass is the mirror image of the low-pass.

Link to the DWT: the Mallat algorithm

Feed the low-pass output $v_0[n]$ of a two-channel analysis back into another $(H_0, H_1)$ + $\downarrow 2$ stage:

Level 1: x[n] → H0 ↓2 → a1[n] (approximation coefficients) → H1 ↓2 → d1[n] (detail coefficients) Level 2: a1[n] → H0 ↓2 → a2[n] → H1 ↓2 → d2[n] Level 3: a2[n] → H0 ↓2 → a3[n] → H1 ↓2 → d3[n] ...

This is exactly the Discrete Wavelet Transform (DWT)! Each level extracts details at a different scale. $H_0/H_1$ are the filters associated with the wavelet's scaling function / mother wavelet.

Show full derivation: the perfect-reconstruction condition

In a two-channel system, the combined effect of $\downarrow 2$ followed by $\uparrow 2$ is:

$$(\uparrow 2 \circ \downarrow 2)\{v\}(z) = \frac{1}{2}\bigl[V(z^{1/2}) + V(-z^{1/2})\bigr] \bigg|_{z \to z^2}$$

More explicitly, let $v_0[n]$ be the result of filtering $x[n]$ with $H_0$ and then $\downarrow 2$:

$$V_0(z) = \frac{1}{2}\bigl[H_0(z^{1/2})X(z^{1/2}) + H_0(-z^{1/2})X(-z^{1/2})\bigr]$$

(using the frequency-domain formula for $\downarrow 2$). Upsample again and pass through $F_0$:

$$U_0(z) = F_0(z) \cdot \frac{1}{2}\bigl[H_0(z)X(z) + H_0(-z)X(-z)\bigr]$$

Similarly $U_1(z) = F_1(z) \cdot \frac{1}{2}[H_1(z)X(z) + H_1(-z)X(-z)]$.

Synthesis output:

$$Y(z) = U_0(z) + U_1(z) = T(z)X(z) + A(z)X(-z)$$

where:

$$T(z) = \frac{1}{2}[F_0(z)H_0(z) + F_1(z)H_1(z)]$$ $$A(z) = \frac{1}{2}[F_0(z)H_0(-z) + F_1(z)H_1(-z)]$$

Perfect reconstruction requires: $A(z) = 0$ (no aliasing) and $T(z) = cz^{-d}$ (pure delay).

If we take $H_1(z) = H_0(-z)$, $F_0(z) = H_0(z)$, $F_1(z) = -H_0(-z)$, then:

$$A(z) = \frac{1}{2}[H_0(z)H_0(-z) - H_0(-z)H_0(z)] = 0 \checkmark$$ $$T(z) = \frac{1}{2}[H_0^2(z) - H_0^2(-z)]$$

PR requires $T(z) = cz^{-d}$, which imposes non-trivial constraints on the coefficients of $H_0$ (power complementarity). Daubechies wavelets are famous solutions that satisfy this condition.

How to Use: designing a two-channel filter bank

Choose a prototype low-pass filter $H_0(z)$: design a half-band FIR low-pass with cutoff at $\pi/2$, e.g. using Daubechies coefficients or a CQF design.
Derive the high-pass from the mirror relation: $H_1(z) = H_0(-z)$, i.e. $h_1[n] = (-1)^n h_0[n]$ (leave the even coefficients of the low-pass unchanged and flip the sign of the odd coefficients).
Set the synthesis filters: $F_0(z) = H_0(z)$, $F_1(z) = -H_1(z)$ (or adjust according to the PR condition).
Verify PR: compute $T(z)$ to confirm it is a pure delay and compute $A(z)$ to confirm it is zero.
(For a DWT) iterate: feed $v_0[n]$ back into the same analysis filters and repeat for the desired number of levels.

Numerical example: Daubechies db2 (4-tap): $h_0 = [0.4830, 0.8365, 0.2241, -0.1294]$. You can verify that $T(z) = z^{-3}$ (3-sample delay) and the reconstruction is perfect.

Interactive: two-channel filter bank analysis and reconstruction

See how the input signal is split into low- and high-frequency subbands, and compare perfect reconstruction against non-perfect reconstruction.

Perfect Reconstruction

Applications

Speech coding (G.722): the ITU-T G.722 standard uses a two-channel QMF to split 7 kHz speech into low and high bands, coded separately with ADPCM. The low band receives more bits (most speech energy) and the high band fewer.
MP3 audio compression: uses a 32-channel MDCT (Modified Discrete Cosine Transform) filter bank, essentially a type of cosine-modulated filter bank. Each subband is allocated a different number of quantization bits based on a psychoacoustic model.
JPEG 2000 image compression: uses CDF 9/7 or Le Gall 5/3 biorthogonal wavelet filter banks for multi-level 2D subband decomposition of images. Better suited than the DCT of JPEG for handling image detail at different resolutions.
Acoustic echo cancellation (AEC): subband adaptive filters converge independently in each band, converging faster and using less computation than full-band algorithms.

Pitfalls and Limitations

Classical QMF cannot achieve true PR: the original Esteban-Galand QMF can only cancel aliasing while leaving residual amplitude distortion. True PR requires CQF/PR-QMF designs.
Linear phase vs. PR conflict: orthogonal filter banks (e.g. Daubechies) achieve PR but not linear phase. Biorthogonal filter banks can achieve both, but analysis and synthesis filters differ.
Leakage between bands: an ideal brick-wall split is impossible with finite-length filters. There is always leakage in the transition band, affecting independent subband processing.
Delay: a PR filter bank introduces a delay $d$ equal to the combined length of analysis + synthesis filters minus one. In real-time communications this delay may exceed the acceptable range.

Quick Check

Q1: Why does $H_1(z) = H_0(-z)$ turn a low-pass into a high-pass? Explain in the frequency domain.

Show answer

$H_0(-z)$ is equivalent to replacing $\omega$ with $\omega + \pi$ in $H_0(e^{j\omega})$: $$H_1(e^{j\omega}) = H_0(e^{j(\omega+\pi)})$$ This shifts the frequency axis by $\pi$: the passband originally at $\omega = 0$ (low frequencies) moves to $\omega = \pi$ (high frequencies), and the stopband originally at $\omega = \pi$ moves to $\omega = 0$. So a low-pass becomes a high-pass, and the two frequency responses are mirror images — that is the meaning of "Quadrature Mirror."

Q2: With each extra level of decomposition in the Mallat algorithm, what happens to the sample rate and the bandwidth of the low-frequency subband?

Show answer

Each level is a $\downarrow 2$, so for every additional level: sample rate halves and bandwidth halves.
Level-$j$ low subband: sample rate = $f_s/2^j$, bandwidth = $[0, f_s/2^{j+1}]$.
For example with $f_s = 8$ kHz and 3 levels of decomposition:
Level 1 low band: $f_s/2 = 4$ kHz, bandwidth $[0, 2\text{ kHz}]$
Level 2 low band: $f_s/4 = 2$ kHz, bandwidth $[0, 1\text{ kHz}]$
Level 3 low band: $f_s/8 = 1$ kHz, bandwidth $[0, 0.5\text{ kHz}]$

References: [1] Esteban, D. & Galand, C., Application of Quadrature Mirror Filters to Split Band Voice Coding Schemes, ICASSP, 1977. [2] Smith, M.J.T. & Barnwell, T.P., Exact Reconstruction Techniques for Tree-Structured Subband Coders, IEEE Trans. ASSP, 1986. [3] Mallat, S., A Theory for Multiresolution Signal Decomposition, IEEE Trans. PAMI, 1989. [4] Vaidyanathan, P.P., Multirate Systems and Filter Banks, Ch.5-6, 1993. [5] Strang, G. & Nguyen, T., Wavelets and Filter Banks, Wellesley-Cambridge, 1996.

← 5.2 Polyphase Decomposition 5.4 Sigma-Delta ADC →

5.4 Sigma-Delta ADC (Oversampling & Noise Shaping)

1-bit quantizer + oversampling + noise shaping = the magic of 24-bit resolution

Learning Objectives

Understand how oversampling spreads quantization noise, improving in-band SNR
Master the feedback principle of noise shaping and the Noise Transfer Function (NTF)
Compute the SNR improvement for different modulator orders
Understand the complete Sigma-Delta ADC system: modulator → digital low-pass → M5A Decimation

One-Sentence Summary

A Sigma-Delta ($\Sigma\Delta$) ADC samples with the crudest quantizer in the world (a 1-bit comparator) at a very high rate, then uses a feedback loop to "push" the quantization noise out of the signal band, and finally recovers a high-resolution (16-24 bit) output with a digital filter and decimation. It trades the "precision" problem for a "speed" problem.

Why does this matter

Inose, Yasuda & Murakami (1962) first proposed Delta-Sigma modulation (originally an improvement to Delta Modulation) at the University of Tokyo. James Candy (1974) further developed the theoretical basis of oversampling ADCs at Bell Labs. However, Sigma-Delta ADCs did not really take off until CMOS VLSI processes matured in the 1990s—because they only need a comparator (1-bit quantizer) and some digital logic, and are implemented almost entirely in digital circuits, perfectly aligned with CMOS scaling trends.

Today, almost every consumer-electronics audio ADC/DAC is a Sigma-Delta architecture—your phone, headphones and sound cards all contain one. Understanding how it achieves 24-bit resolution from 1-bit + oversampling + noise shaping is one of the most elegant examples of trading speed for precision in DSP.

Previously

Quantization noise: a $B$-bit uniform quantizer has SQNR = $6.02B + 1.76$ dB. A 1-bit quantizer gives SQNR $\approx 7.78$ dB (dreadful).
Power spectral density (PSD): under the additive white-noise model, quantization noise is uniformly distributed over $[0, f_s/2]$ with PSD = $\Delta^2/(12 f_s)$.
Decimation + low-pass filtering (section 5.1): first filter out the out-of-band noise, then decimate to the target rate.

The Problem: the cost of high-resolution ADCs

Traditional Nyquist-rate ADCs (SAR, Pipeline) face severe challenges to reach high resolution:

16-bit ADC: the comparator must distinguish $V_{\text{ref}}/65536$ volts. With a 3.3 V reference that is $50\,\mu\text{V}$—smaller than on-chip thermal noise!
Resistor/capacitor matching: pipeline ADCs need 0.001% component matching, leading to low yield and high cost.
Power: high-speed, high-precision ADCs can dissipate several watts, unsuitable for mobile devices.

Fundamental tension: CMOS processes are good at making "fast but crude" digital circuits, not "slow but precise" analog ones. Is there a way to use a "fast but crude" 1-bit quantizer and still get a "slow but precise" result?

Origin

Evolution of Sigma-Delta:

Delta Modulation (1946, Deloraine): a 1-bit quantizer encoding the signal's "difference." Problems: slope overload and granular noise.
Delta-Sigma Modulation (1962, Inose et al.): an integrator placed before Delta Modulation. The integrator causes quantization noise to be differentiated (high-pass shaped), greatly reducing in-band noise.
Higher-order modulators (1980s-90s): cascading multiple integrators; each extra order pushes noise further away. Stability becomes a key challenge.

Naming dispute: the original paper called it "Delta-Sigma" ($\Delta\Sigma$) because the difference (Delta) comes first and then the integration (Sigma). The industry generally calls it "Sigma-Delta" ($\Sigma\Delta$) because in the block diagram the integrator comes first. Both names refer to the same thing.

Core Concepts

Step 1: oversampling

Intuition: imagine sprinkling a fistful of sand (quantization noise) uniformly on a wall. The total amount of sand is fixed, but if you make the wall $R$ times larger (expanded bandwidth), the sand per square centimeter drops by a factor of $R$. If you only look at a small middle portion of the wall (the signal band), there is far less sand (noise).

SNR improvement from oversampling

$$\text{SNR}_{\text{oversampled}} = \text{SQNR} + 10\log_{10}(R) \quad \text{dB}$$

$R = f_s / (2f_b)$ is the Oversampling Ratio (OSR); $f_b$ is the signal bandwidth.

Every doubling of the oversampling ratio reduces in-band noise by 3 dB, an effective gain of only 0.5 bit of resolution. That is very inefficient—256x oversampling only buys 4 bits.

Step 2: noise shaping

Intuition: what if, instead of just spreading the sand more uniformly, you used a broom to sweep the middle sand to the edges? The middle (signal band) is almost clean, and the edges (out-of-band, high frequency) are piled up—and you are going to chop off the edges with a low-pass filter anyway.

Structure of a first-order Sigma-Delta modulator:

┌────────────────────┐ x[n] ──(+)── integrator ── 1-bit quantizer ──┬── y[n] (1-bit output) ↑(-) │ └──── DAC (1-bit) ───────────────────┘

Linearized model (treat the quantizer as adding quantization noise $e[n]$):

First-order Sigma-Delta output

$$Y(z) = \underbrace{z^{-1}}_{\text{STF}} X(z) + \underbrace{(1 - z^{-1})}_{\text{NTF}} E(z)$$

STF = Signal Transfer Function (signal is delayed by one sample); NTF = Noise Transfer Function (noise is high-pass shaped).

Frequency response of the NTF:

$$|NTF(e^{j\omega})| = |1 - e^{-j\omega}| = 2\left|\sin\frac{\omega}{2}\right|$$

Zero at $\omega = 0$ (DC, low frequency), 2 at $\omega = \pi$ (Nyquist) → noise is pushed to high frequencies.

$L$-th order noise shaping

Cascading $L$ integrators (an $L$-th order modulator):

$L$-th order NTF

$$NTF_L(z) = (1 - z^{-1})^L$$

$L$-th order differentiation → noise at low frequencies is suppressed even more strongly.

SNR of $L$-th order + $R$x oversampling

$$\text{SNR} \approx 6.02B + 1.76 + (2L+1) \cdot 10\log_{10}(R) - 10\log_{10}\!\left(\frac{\pi^{2L}}{2L+1}\right) \;\text{dB}$$

For a 1-bit converter ($B=1$), each doubling of OSR gains $(2L+1) \times 3$ dB.

Key comparison:
Plain oversampling: $2\times$ OSR → +3 dB (+0.5 bit)
1st-order shaping: $2\times$ OSR → +9 dB (+1.5 bit)
2nd-order shaping: $2\times$ OSR → +15 dB (+2.5 bit)
4th-order shaping: $2\times$ OSR → +27 dB (+4.5 bit)
4th-order + 256x OSR → $\approx 27 \times 8 = 216$ dB theoretical → roughly 120 dB (20 bit) in practice.

Step 3: digital filtering + decimation

The modulator output is a high-rate 1-bit bitstream; it passes through:

Digital low-pass filter (cutoff $f_b$) → removes the quantization noise that was shaped to high frequencies
Decimation (factor $R$) → returns to the target sample rate $2f_b$
Produces a high-resolution (16-24 bit) output

This "digital filter + decimation" is usually implemented as a multi-stage CIC (Cascaded Integrator-Comb) filter followed by half-band filters — using exactly the techniques from sections 5.1-5.2!

Show full derivation: transfer function of a first-order Sigma-Delta modulator

Let the integrator have transfer function $\frac{z^{-1}}{1-z^{-1}}$ (discrete integrator), and write the quantizer output as $y[n] = u[n] + e[n]$, where $u[n]$ is the quantizer input and $e[n]$ is the quantization error.

Loop equations in the Z-domain:

$$U(z) = \frac{z^{-1}}{1-z^{-1}}\bigl[X(z) - Y(z)\bigr]$$ $$Y(z) = U(z) + E(z)$$

Substituting the first into the second:

$$Y(z) = \frac{z^{-1}}{1-z^{-1}}[X(z) - Y(z)] + E(z)$$ $$Y(z)\left[1 + \frac{z^{-1}}{1-z^{-1}}\right] = \frac{z^{-1}}{1-z^{-1}}X(z) + E(z)$$ $$Y(z) \cdot \frac{1}{1-z^{-1}} = \frac{z^{-1}}{1-z^{-1}}X(z) + E(z)$$ $$Y(z) = z^{-1}X(z) + (1-z^{-1})E(z)$$

Therefore STF = $z^{-1}$ (the signal is delayed by one sample, distortion-free) and NTF = $(1-z^{-1})$ (quantization noise is shaped by a first-order high-pass).

In-band noise power:

$$\sigma_{\text{in-band}}^2 = \frac{\sigma_e^2}{f_s}\int_0^{2\pi f_b} |1-e^{-j2\pi f/f_s}|^2 df \approx \sigma_e^2 \cdot \frac{\pi^2}{3R^3}$$

where $R = f_s/(2f_b)$ is the OSR. Compared with the unshaped $\sigma_e^2/R$, first-order shaping provides an additional $\pi^2/(3R^2)$ reduction.

How to Use: Sigma-Delta ADC design calculation

Example: audio ADC (20 kHz bandwidth, 16-bit target)

Target SNR: 16 bit → $6.02 \times 16 + 1.76 = 98.1$ dB
Architecture choice: 3rd-order modulator + 128x oversampling
- 1-bit SQNR = 7.78 dB
- OSR gain = $(2 \times 3 + 1) \times 10\log_{10}(128) = 7 \times 21.1 = 147.5$ dB
- Noise-shaping correction = $-10\log_{10}(\pi^6/7) \approx -12.7$ dB
- Theoretical SNR $\approx 7.78 + 147.5 - 12.7 = 142.6$ dB (23.4 bit)
- With non-ideal effects (op-amp limits, clock jitter, DAC nonlinearity) attenuated by 20-30 dB → actual roughly 112-122 dB (18-20 bit) ✓
Sample rate: $f_s = 2 \times 20{,}000 \times 128 = 5.12$ MHz. A 1-bit comparator handles this speed easily.
Digital filter: 4th-order CIC ($R = 16$) → 3 stages of half-band FIR ($\downarrow 2$) → FIR compensator → final $128\times$ decimation to 40 kHz.

Concrete numbers at a glance

Configuration	Theoretical SNR	Equivalent bits	Notes
1-bit, no oversampling	7.78 dB	1.0 bit	Bare comparator
1-bit, 256x OSR, no shaping	$7.78 + 24.1 = 31.9$ dB	~5 bit	Best you can do with oversampling alone
1-bit, 256x OSR, 1st-order	~79 dB	~13 bit	Noise shaping starts to pay off
1-bit, 256x OSR, 2nd-order	~127 dB	~21 bit	Very high in theory
1-bit, 256x OSR, 4th-order	~224 dB	~37 bit (theory)	Limited by non-idealities in practice, ~120 dB (20 bit)

Interactive: effects of oversampling and noise shaping

Adjust the oversampling ratio and the noise-shaping order and observe how the quantization-noise power spectral density changes, along with the equivalent SNR and bits.

Oversampling ratio OSR = 64x

Noise-shaping order $L$ = 1

Applications

Audio ADC/DAC: from mobile phones to recording studios, almost every audio converter uses Sigma-Delta. The AKM AK5397 (32-bit/768 kHz) and ESS ES9038PRO are high-end examples.
Precision measurement: 24-bit Sigma-Delta ADCs (e.g. ADS1256) are used for weighing, temperature measurement and strain gauges. Very high resolution + low speed = the sweet spot for Sigma-Delta.
Sensor interfaces: MEMS accelerometers and gyroscopes use Sigma-Delta extensively to convert tiny capacitance changes into digital values.
Digital radio receivers: a wideband Sigma-Delta ADC + digital downconversion can digitize an RF band directly (bandpass $\Sigma\Delta$).
Class-D audio amplifiers: essentially Sigma-Delta DACs—driving a speaker with PWM switches, with noise shaping ensuring very low distortion in the audible range.

Pitfalls and Limitations

Stability of high-order modulators: single-loop modulators of 3rd order or higher can become unstable (the quantizer's nonlinearity breaks the linear-analysis assumption). Remedies: MASH (Multi-stAge noise SHaping) structures or feedforward architectures.
Latency: the digital decimation filters (especially multi-stage CIC + FIR) introduce significant delay, which may be unacceptable for real-time control applications.
Bandwidth limit: high OSR demands a very high sample rate. 20 kHz audio × 256 = 5.12 MHz is fine, but wideband communications (tens of MHz) need ultra-fast comparators.
Tones: low-order Sigma-Delta modulators can produce periodic patterns (idle tones) for DC or low-frequency inputs, which sound like a hum. Dither or higher-order modulators are needed to eliminate them.
DAC nonlinearity (multi-bit Sigma-Delta): nonlinearity in the feedback DAC injects directly into the signal path. Multi-bit internal quantizers improve performance but demand very linear DACs. Remedy: DEM (Dynamic Element Matching).

Quick Check

Q1: A 2nd-order Sigma-Delta modulator uses 64x oversampling. How many dB does SNR improve per doubling of OSR? How many equivalent bits?

Show answer

For $L = 2$, SNR improvement = $(2L+1) \times 3 = 5 \times 3 = 15$ dB per octave of OSR.

Equivalent bits gained = $15/6.02 \approx 2.5$ bit per $2\times$ OSR.

So at 64x OSR ($= 2^6$): an increase of $6 \times 15 = 90$ dB ($\approx 15$ bit) above the baseline SQNR, and combined with the 1-bit baseline and shaping correction, the theoretical figure reaches about 85 dB ($\approx 14$ bit).

Q2: Why is the "digital filter + decimation" stage of a Sigma-Delta ADC exactly the place where the multirate techniques we learned earlier (polyphase, CIC) are used?

Show answer

The modulator outputs a very high-rate (e.g. 5.12 MHz) 1-bit bitstream. Turning it into a low-rate (e.g. 40 kHz) multi-bit output requires 128x decimation.

Running a single FIR at 5.12 MHz and then decimating by 128 wastes 99.2% of the computation. In practice we therefore use:

CIC filter: no multipliers, only adders and delays—naturally suited to the first high-rate decimation stage.
Polyphase half-band filters: for the subsequent $\downarrow 2$ stages, with a polyphase structure avoiding wasted computation.

This is the ideal real-world application of the techniques from sections 5.1-5.2.

References: [1] Inose, H., Yasuda, Y. & Murakami, J., A Telemetering System by Code Modulation: Delta-Sigma Modulation, IRE Trans., 1962. [2] Candy, J.C., A Use of Limit Cycle Oscillations to Obtain Robust Analog-to-Digital Converters, IEEE Trans., 1974. [3] Norsworthy, S.R., Schreier, R. & Temes, G.C., Delta-Sigma Data Converters: Theory, Design, and Simulation, IEEE Press, 1997. [4] Schreier, R. & Temes, G.C., Understanding Delta-Sigma Data Converters, 2nd ed., IEEE/Wiley, 2017. [5] Aziz, P.M., Sorensen, H.V. & van der Spiegel, J., An Overview of Sigma-Delta Converters, IEEE SP Magazine, 1996.

← 5.3 Filter Banks Module 6 →

8A Random Processes & Wide-Sense Stationarity

Real signals always have randomness — the bridge between statistics and spectra

Learning Objectives

Understand the definition of a random process: a collection of random variables indexed by time
Distinguish between Strict-Sense Stationary (SSS) and Wide-Sense Stationary (WSS) definitions and their practicality
Master the properties and physical meaning of the autocorrelation function $R_{xx}[m]$
Derive and apply the Wiener-Khinchin theorem: $S_{xx}(e^{j\omega}) = \text{DTFT}\{R_{xx}[m]\}$
Understand the spectral characteristics of white noise and colored noise

One-Sentence Summary

Random processes combine "randomness" with "time," and the Wiener-Khinchin theorem tells us: the Fourier transform of the autocorrelation function is the Power Spectral Density (PSD)—this bridge lets us analyse random signals with frequency-domain tools.

Why does this matter? Because there is no such thing as a "clean" signal in the real world. Every sensor reading contains noise, interference, and channel fading. The DFT, STFT and wavelets we have learned so far all assume the signal is deterministic—but for random signals you need statistical tools to describe the "average behaviour." This is the necessary foundation before Wiener filtering, Kalman filtering and adaptive filtering.

Previously: 5.7 Synchrosqueezing showed us fine time-frequency structure, but only for deterministic signals. Now we enter random signal analysis—we no longer ask "what does this signal look like?" but instead "what are the statistical properties of signals like these?" That requires a new mathematical framework.

The Problem: deterministic analysis is not enough

Suppose you are measuring vibration from an engine. Every time you start the engine the waveform is different (due to random factors such as ambient noise and initial conditions), yet the "statistical properties" (average power, spectral shape) are stable:

Communications: received signal = original signal + channel noise. You cannot predict each noise sample, but you can describe its statistics (Gaussian white noise, power $\sigma^2$).
Radar: the target echo is buried in clutter and thermal noise. Detection theory requires the noise power spectral density.
Finance: stock prices are unpredictable (random walk) but volatility has statistical regularities.
Biomedical: the alpha-band power of EEG is a statistic averaged over many measurements; a single measurement is unreliable.

Core question: the FFT spectrum of a random signal is "different every time" and does not converge to a fixed function. We need a way to define the frequency-domain properties of random signals through statistical averages.

Origin

Norbert Wiener (1930) proposed in his pioneering work Generalized Harmonic Analysis that for signals with finite power but infinite energy (such as random signals), the classical Fourier transform does not exist, but the Fourier transform of the autocorrelation function (the power spectral density) does make sense.

Alexander Khinchin (1934) independently proved the same result from a probability-theoretic perspective: for a stationary random process, the autocorrelation function and the power spectral density form a Fourier transform pair. This is the Wiener-Khinchin theorem—the central bridge between time-domain statistics (correlation functions) and frequency-domain representation (PSD).

Wiener went further and used PSD to derive the optimal linear filter (Wiener filter, see 8B), ushering in the era of statistical signal processing.

Core Concepts

Intuition: imagine a factory with 100 identical production lines, each with a vibration sensor recording data simultaneously. The waveform on each line is different (because of randomness), but their "average waveform" and "average power" are stable. A random process packages these 100 "possible waveforms" (each called a realization or sample function) into a single mathematical object.

Random process definition

A random process $\{x[n]\}$ is a collection of random variables indexed by time $n$. For each $n$, $x[n]$ is a random variable; for each "experiment" (realization), $x[n]$ is a deterministic time series (sample function).

Ensemble average vs. time average

Averaging method	Definition	When it applies
Ensemble average	$E[x[n]] = \int x \cdot f_X(x;n)\, dx$ Fix $n$, average over all realizations	Theoretical analysis (requires multiple experiments)
Time average	$\langle x[n] \rangle = \lim_{N\to\infty}\frac{1}{2N+1}\sum_{n=-N}^{N} x[n]$ Fix one realization, average over all time	Practice (usually only a single measurement)

Ergodicity: if the random process is ergodic, ensemble average = time average. This is an extremely important assumption in practice—because we usually only have one recording (a single realization) and must use it to estimate statistics.

Stationarity

Strict-Sense Stationary (SSS): all finite-dimensional joint PDFs are invariant under time shifts:

$$f_{x[n_1],\ldots,x[n_k]}(a_1,\ldots,a_k) = f_{x[n_1+m],\ldots,x[n_k+m]}(a_1,\ldots,a_k) \quad \forall m, k$$

This condition is too strong—verifying "all" joint PDFs is practically impossible.

Wide-Sense Stationary (WSS): only requires the first- and second-order statistics to be time-invariant:

Two WSS conditions

$$\text{(1)}\quad E[x[n]] = \mu_x \quad \text{(mean is constant, does not depend on $n$)}$$ $$\text{(2)}\quad R_{xx}[n, n-m] = R_{xx}[m] \quad \text{(autocorrelation depends only on lag $m$, not on absolute time $n$)}$$

WSS is the standard engineering assumption. Vibrations from most steady-state machines, stationary communication channels, and long-time environmental noise can all reasonably be treated as WSS.

Properties of the autocorrelation function

$$R_{xx}[m] = E\big[x[n]\, x^*[n-m]\big]$$

$R_{xx}[0] = E[|x[n]|^2]$ = average power
$R_{xx}[0] \geq |R_{xx}[m]|$ for all $m$ (maximum at the origin)
$R_{xx}[-m] = R_{xx}^*[m]$ (conjugate symmetry)
$R_{xx}[m]$ is a positive semi-definite function (guaranteeing a non-negative PSD)

Cross-correlation function:

$$R_{xy}[m] = E\big[x[n]\, y^*[n-m]\big]$$

The Wiener-Khinchin theorem

This is the most important result of the chapter—the bridge between time-domain statistics and frequency-domain representation:

Wiener-Khinchin Theorem

$$S_{xx}(e^{j\omega}) = \text{DTFT}\{R_{xx}[m]\} = \sum_{m=-\infty}^{\infty} R_{xx}[m]\, e^{-j\omega m}$$

$S_{xx}(e^{j\omega})$ = Power Spectral Density (PSD)

Key properties of the PSD:

$S_{xx}(e^{j\omega}) \geq 0$ (always non-negative and real)—it represents the "frequency distribution of power."
$\frac{1}{2\pi}\int_{-\pi}^{\pi} S_{xx}(e^{j\omega})\, d\omega = R_{xx}[0]$ = average power
$S_{xx}(e^{j\omega}) = S_{xx}(e^{-j\omega})$ (for real signals the PSD is an even function)

Show full derivation: sketch of the Wiener-Khinchin theorem

For finite-power random signals the DTFT does not converge directly (infinite energy). Use the truncated version instead:

$$X_N(e^{j\omega}) = \sum_{n=0}^{N-1} x[n]\, e^{-j\omega n}$$

Define the periodogram:

$$I_N(\omega) = \frac{1}{N}|X_N(e^{j\omega})|^2$$

Take the expectation:

$$E[I_N(\omega)] = \frac{1}{N}\sum_{n=0}^{N-1}\sum_{k=0}^{N-1} E[x[n]x^*[k]]\, e^{-j\omega(n-k)}$$

Let $m = n - k$ and use the WSS condition $E[x[n]x^*[k]] = R_{xx}[n-k]$:

$$= \sum_{m=-(N-1)}^{N-1}\left(1 - \frac{|m|}{N}\right) R_{xx}[m]\, e^{-j\omega m}$$

As $N \to \infty$:

$$\lim_{N\to\infty} E[I_N(\omega)] = \sum_{m=-\infty}^{\infty} R_{xx}[m]\, e^{-j\omega m} = S_{xx}(e^{j\omega}) \quad \blacksquare$$

Note: this shows that the expectation of the periodogram converges to the PSD, but the variance of a single periodogram does not decrease with $N$ (which is why PSD estimation requires Welch's method or multitapers—recall section 3.2).

White noise

$$R_{ww}[m] = \sigma^2\, \delta[m] \quad \Longleftrightarrow \quad S_{ww}(e^{j\omega}) = \sigma^2 \quad \text{(flat spectrum)}$$

The autocorrelation is nonzero only at $m = 0$, so samples at different times are completely uncorrelated. Its DTFT is a constant, meaning the power is the same at all frequencies (the origin of the term "white", by analogy with white light containing all colours).

Coloured noise: white noise after filtering

If white noise $w[n]$ passes through an LTI system $H(z)$ producing output $y[n]$:

$$S_{yy}(e^{j\omega}) = |H(e^{j\omega})|^2 \cdot S_{ww}(e^{j\omega}) = \sigma^2 |H(e^{j\omega})|^2$$

The "flat" white-noise spectrum is shaped by $|H|^2$, producing "coloured" noise with a specific spectral shape. For instance, the AR(1) filter $H(z) = 1/(1 - az^{-1})$ produces low-frequency-dominated pink noise.

How to Use: numerical example

Step 1: define the random process

Consider the AR(1) process $x[n] = 0.9\, x[n-1] + w[n]$, where $w[n]$ is white noise with $\sigma^2 = 1$.

Step 2: compute the theoretical autocorrelation

The AR(1) autocorrelation has a closed form:

$$R_{xx}[m] = \frac{\sigma^2}{1 - a^2}\, a^{|m|} = \frac{1}{1 - 0.81} \cdot 0.9^{|m|} \approx 5.26 \cdot 0.9^{|m|}$$

Step 3: compute the theoretical PSD

$H(z) = 1/(1 - 0.9z^{-1})$, so:

$$S_{xx}(e^{j\omega}) = \frac{\sigma^2}{|1 - 0.9\, e^{-j\omega}|^2} = \frac{1}{1.81 - 1.8\cos\omega}$$

At low frequency $S_{xx}(1) = 1/(1.81 - 1.8) = 100$ (= 20 dB); at high frequency $S_{xx}(-1) = 1/(1.81 + 1.8) \approx 0.277$ (= -5.6 dB). This is a low-pass type PSD.

Step 4: estimate from real data

Generate $N = 1024$ points of AR(1) data, estimate the PSD with Welch's method and compare to the theoretical value. The more averages, the more accurate the estimate.

Practical key point: when estimating PSDs you always trade off frequency resolution against estimator variance. Longer segments give better frequency resolution but fewer segments to average, making the estimate less stable (recall Welch's method in 3.2).

Interactive: estimating statistics of random processes

Observe how, as the number of realizations increases, the estimated autocorrelation and PSD gradually approach the theoretical values.

Noise type:

Number of realizations: 50

Applications

Communication system design: channel noise is modelled as a WSS process, and its PSD determines receiver sensitivity and the choice of modulation. The AWGN (Additive White Gaussian Noise) channel is the most basic model.
Radar signal processing: the PSD shape of clutter determines the design of the MTI (Moving Target Indication) filter. Gaussian-shaped clutter PSD requires multi-pulse cancellers.
Vibration monitoring: vibration of a machine running in steady state can be treated as a WSS process. Peak frequencies in the PSD correspond to rotation speed and fault characteristic frequencies.
Financial time series: the autocorrelation structure of stock-return series (e.g. GARCH models) drives volatility forecasting and risk management strategies.
Speech coding: Linear Predictive Coding (LPC) models speech as an AR process; the Levinson-Durbin recursive solution of the autocorrelation matrix is the core algorithm.

Pitfalls and Limitations

The WSS assumption does not always hold: non-stationary processes (e.g. engine acceleration, speech transitions) violate WSS. You then need short-time analysis (assuming approximate WSS over short windows) or a time-varying model.
Ergodicity cannot be assumed: not every WSS process is ergodic. For example $x[n] = A\cos(\omega_0 n)$ (with $A$ a random variable) is WSS but not ergodic—the time average is always zero, the ensemble average is also zero, but the time-based and ensemble-based autocorrelation estimates may differ.
Estimator bias from finite data: autocorrelation estimates at large lags $|m|$ are unreliable (only $N - |m|$ products are available for averaging). Rule of thumb: only trust estimates with $|m| < N/10$.
The periodogram is a poor PSD estimator: the variance of a single periodogram does not decrease with $N$ (inconsistent estimator). You must use Welch's method, multitapers, or parametric methods (see sections 3.1–3.3).
Cross-correlation is not causation: $R_{xy}[m] \neq 0$ only indicates linear statistical correlation, not that $x$ causes $y$.

Quick Check

Q1: White noise has autocorrelation $R_{ww}[m] = \sigma^2 \delta[m]$. What is its PSD and why is it called "white" noise?

Show answer

$S_{ww}(e^{j\omega}) = \sigma^2$, a constant—the power is equal at every frequency. Just as white light contains every visible frequency, this is why it is called "white" noise. Mathematically, the DTFT of $\delta[m]$ is the constant 1, multiplied by $\sigma^2$.

Q2: A WSS process has a PSD with a large peak at $\omega = 0$ and is near zero around $\omega = \pi$. What does this tell you? How would you describe it in filter language?

Show answer

This is a low-frequency-dominated process (e.g. 1/f noise or an AR(1) process with positive correlation). It is equivalent to the output of a low-pass filter $H(z)$ driven by white noise: $S_{xx} = |H|^2 \sigma^2$. Adjacent samples are positively correlated ($R_{xx}[1] > 0$) and the signal varies smoothly.

References: [1] Wiener, N., Generalized Harmonic Analysis, Acta Math., 55:117-258, 1930. [2] Khinchin, A., Korrelationstheorie der stationaren stochastischen Prozesse, Math. Ann., 109:604-615, 1934. [3] Oppenheim & Schafer, Discrete-Time Signal Processing, 3rd ed., Ch.11. [4] Haykin, S., Adaptive Filter Theory, 5th ed., Ch.2. [5] Papoulis, A. & Pillai, S.U., Probability, Random Variables and Stochastic Processes, 4th ed., Ch.9-12.

← 5.7 Synchrosqueezing 8B Wiener Filter →

8B Wiener Filter

Optimal linear estimation when statistical properties are known — minimum mean square error filtering

Learning Objectives

Understand the Wiener filter problem formulation: optimally estimating a target signal from noisy observations
Derive the Wiener-Hopf equation: $\mathbf{R}_{xx}\, \mathbf{h}_{\text{opt}} = \mathbf{r}_{xd}$
Understand the frequency-domain Wiener filter: $H_{\text{opt}}(e^{j\omega}) = S_{xd}(e^{j\omega}) / S_{xx}(e^{j\omega})$
Analyze the special case of signal plus uncorrelated noise and its intuitive meaning
Understand the relationship between Wiener filtering and M4E Adaptive Filtering (LMS)

One-Sentence Summary

The Wiener filter is the optimal linear filter when the signal and noise statistics are known—it automatically decides whether to "pass or suppress" at each frequency based on the local SNR, letting the signal through in high-SNR bands and attenuating the noise in low-SNR bands.

Why does this matter? The Wiener filter is the starting point of all optimal linear filtering theory. The LMS adaptive filter aims to approximate the Wiener solution; the Kalman filter is its non-stationary generalisation; the classical approaches to speech enhancement, image denoising, and communication equalisation all come from this framework. Understanding Wiener filtering is understanding the cornerstone of statistical signal processing.

Previously: 8A built up the WSS random-process framework—autocorrelation $R_{xx}[m]$, power spectral density $S_{xx}(e^{j\omega})$, and the Wiener-Khinchin theorem. Now we use these tools to answer a central question: how do we "best" recover the original signal from noisy observations?

The Problem: simple filters are not smart enough

You receive a noisy signal $x[n] = d[n] + v[n]$ ($d[n]$ is the desired signal, $v[n]$ is noise). Intuitively you would use a low-pass filter to denoise, but:

How do you choose the cutoff frequency? If the signal and noise bands overlap (which almost always happens), any fixed cutoff will either damage the signal or let noise through.
Speech denoising: the speech band (100 Hz – 4 kHz) overlaps white noise completely. A low-pass filter will also cut the high-frequency consonants (/s/, /f/).
The idea: the ideal filter should be frequency-dependent—pass through in bands where the signal is strong, fully suppress in bands where noise dominates, and strike the best compromise in mixed bands.

Core question: given the signal and noise statistics (PSDs), can we systematically derive the "best" filter so that the power of the estimation error is minimised?

Origin

Norbert Wiener (1942/1949): during WWII Wiener developed optimal linear prediction and filtering theory in order to predict the flight trajectories of enemy aircraft for anti-aircraft gun aiming. His classified report was published after the war under the title Extrapolation, Interpolation, and Smoothing of Stationary Time Series (nicknamed "The Yellow Peril" because of its yellow cover—and because it was so hard to read).

Andrey Kolmogorov (1941): in the Soviet Union Kolmogorov independently derived essentially the same result for the prediction of stationary time series. For this reason it is sometimes called Kolmogorov-Wiener filtering.

The Wiener-Hopf equation is named after a class of integral equations that Wiener and Eberhard Hopf studied together in 1931. The Wiener filtering problem happens to reduce to this class of equations.

Core Concepts

Intuition: imagine listening to a friend in a noisy restaurant. What is your brain doing? In the bands where your friend's voice is clear (e.g. certain vowel formants) you listen almost completely; in the bands dominated by background noise (e.g. low-frequency rumble) you automatically ignore. The Wiener filter does exactly this—but in a mathematically optimal way.

Problem setup

Observation: $x[n]$ (noisy signal)
Target: $d[n]$ (signal to be estimated)
Filter output: $\hat{d}[n] = \sum_{k} h[k]\, x[n-k] = h * x$
Error: $e[n] = d[n] - \hat{d}[n]$
Goal: minimise the mean-square error $J = E[|e[n]|^2]$

Derivation: the Wiener-Hopf equation

Take the gradient of $J$ and set it to zero:

Optimality condition (orthogonality principle)

$$\frac{\partial E[|e[n]|^2]}{\partial h^*[k]} = 0 \quad \forall k$$ $$\Longrightarrow \quad E[e[n]\, x^*[n-k]] = 0 \quad \forall k$$

The optimal error is orthogonal to all observations (Orthogonality Principle)

Show full derivation: from orthogonality principle to Wiener-Hopf equation

Expand $e[n] = d[n] - \hat{d}[n] = d[n] - \sum_l h[l] x[n-l]$ and substitute into the orthogonality condition:

$$E\Big[\Big(d[n] - \sum_l h[l] x[n-l]\Big) x^*[n-k]\Big] = 0$$ $$E[d[n] x^*[n-k]] = \sum_l h[l]\, E[x[n-l] x^*[n-k]]$$

Using the WSS definition, $E[x[n-l] x^*[n-k]] = R_{xx}[k-l]$ and $E[d[n] x^*[n-k]] = R_{dx}[k]$:

$$R_{dx}[k] = \sum_l h_{\text{opt}}[l]\, R_{xx}[k-l] = (h_{\text{opt}} * R_{xx})[k]$$

This is the convolutional form of the Wiener-Hopf equation.

Matrix form (FIR filter, order $M$):

$$\underbrace{\begin{bmatrix} R_{xx}[0] & R_{xx}[1] & \cdots & R_{xx}[M-1] \\ R_{xx}[1] & R_{xx}[0] & \cdots & R_{xx}[M-2] \\ \vdots & & \ddots & \vdots \\ R_{xx}[M-1] & & \cdots & R_{xx}[0] \end{bmatrix}}_{\mathbf{R}_{xx}\ (\text{Toeplitz})} \underbrace{\begin{bmatrix} h[0] \\ h[1] \\ \vdots \\ h[M-1] \end{bmatrix}}_{\mathbf{h}_{\text{opt}}} = \underbrace{\begin{bmatrix} R_{dx}[0] \\ R_{dx}[1] \\ \vdots \\ R_{dx}[M-1] \end{bmatrix}}_{\mathbf{r}_{dx}}$$

$\mathbf{R}_{xx}$ is a Toeplitz matrix, so it can be solved with the Levinson-Durbin algorithm in $O(M^2)$ (instead of $O(M^3)$ for general matrix inversion).

Frequency-domain form: taking the DTFT of the convolution equation:

$$S_{dx}(e^{j\omega}) = H_{\text{opt}}(e^{j\omega}) \cdot S_{xx}(e^{j\omega})$$ $$\boxed{H_{\text{opt}}(e^{j\omega}) = \frac{S_{dx}(e^{j\omega})}{S_{xx}(e^{j\omega})}} \quad \blacksquare$$

Wiener-Hopf equation

$$\text{Matrix form:}\quad \mathbf{R}_{xx}\, \mathbf{h}_{\text{opt}} = \mathbf{r}_{dx}$$ $$\text{Frequency-domain form:}\quad H_{\text{opt}}(e^{j\omega}) = \frac{S_{xd}(e^{j\omega})}{S_{xx}(e^{j\omega})}$$

Special case: signal + uncorrelated noise

If $x[n] = d[n] + v[n]$ where $d$ and $v$ are uncorrelated ($R_{dv}[m] = 0$), then:

$S_{xx} = S_{dd} + S_{vv}$ (powers add)
$S_{xd} = S_{dd}$ (cross-PSD equals the signal PSD)

Wiener denoising formula

$$H_{\text{opt}}(e^{j\omega}) = \frac{S_{dd}(e^{j\omega})}{S_{dd}(e^{j\omega}) + S_{vv}(e^{j\omega})}$$

Intuitive interpretation:

High-SNR bands ($S_{dd} \gg S_{vv}$): $H \approx S_{dd}/S_{dd} = 1$ → pass through
Low-SNR bands ($S_{vv} \gg S_{dd}$): $H \approx S_{dd}/S_{vv} \approx 0$ → strongly suppressed
In between: $H$ transitions smoothly with local SNR → frequency-dependent optimal trade-off

Essence: the Wiener filter is a frequency-dependent SNR gate—it independently asks at each frequency "is there more signal or more noise here?" and then applies the optimal gain. This is why it is far better than a low-pass filter with a fixed cutoff.

Minimum Mean-Square Error (MMSE)

$$J_{\min} = \frac{1}{2\pi}\int_{-\pi}^{\pi} \frac{S_{dd}(e^{j\omega})\, S_{vv}(e^{j\omega})}{S_{dd}(e^{j\omega}) + S_{vv}(e^{j\omega})}\, d\omega$$

In bands with zero SNR, the residual error is $S_{dd}/2$ (the best you can do is cut it in half); in bands with very high SNR, the residual error approaches zero.

Relationship to LMS adaptive filtering

Property	Wiener filter	LMS adaptive filter
Required information	$R_{xx}$ and $r_{xd}$ (offline statistics)	Only the input $x[n]$ and reference $d[n]$
Solution method	Solve the Wiener-Hopf equation (one-shot)	Stochastic approximation via gradient descent (per-sample update)
Convergence	Instantaneous (non-recursive)	Takes time to converge, controlled by step size $\mu$
Tracking ability	None (assumes WSS)	Yes (can track slowly time-varying environments)
Relationship	After convergence, LMS oscillates around the Wiener solution

How to Use: numerical example for speech denoising

Step 1: problem setup

Clean speech $d[n]$ (simulated by a 100 Hz sinusoid) + white noise $v[n]$ ($\sigma^2 = 0.5$), input SNR = 0 dB.

Step 2: estimate the PSDs

Use Welch's method to estimate $S_{dd}$ (from the clean signal) and $S_{vv}$ (from a noise-only segment). In practice the clean signal is not available, so VAD (Voice Activity Detection) is used to estimate $S_{vv}$ during silence.

Step 3: compute the Wiener filter

$$H_{\text{opt}}[k] = \frac{\hat{S}_{dd}[k]}{\hat{S}_{dd}[k] + \hat{S}_{vv}[k]}$$

where $k$ is the frequency-bin index. Near 100 Hz (strong signal) $H \approx 1$; far from 100 Hz (noise-dominated) $H \approx 0$.

Step 4: apply in the frequency domain

$\hat{D}[k] = H_{\text{opt}}[k] \cdot X[k]$, then IFFT back to the time domain.

Step 5: evaluation

Compare SNR before and after filtering: SNR improves from 0 dB to 10–15 dB (depending on how much the signal and noise spectra overlap).

# Python: Frequency-domain Wiener filter (speech denoising) import numpy as np from scipy.fft import fft, ifft def wiener_filter(noisy, clean_psd, noise_psd): """ noisy: observed signal (with noise) clean_psd: estimated clean signal PSD noise_psd: estimated noise PSD """ Y = fft(noisy) H = clean_psd / (clean_psd + noise_psd + 1e-12) X_est = Y * H return np.real(ifft(X_est))

Interactive: Wiener denoising

Watch how the Wiener filter automatically adjusts the gain at each frequency based on SNR. Drag the SNR slider to see how the filter shape and output quality change.

Input SNR: 0 dB

Output SNR: ---

Applications

Speech Enhancement: noise cancellation in mobile phone calls. Speech PSD is concentrated in the formant regions between 100 Hz and 4 kHz, while noise PSD is relatively flat. The Wiener filter lets speech through in the formant bands and suppresses noise in the other bands. Modern mobile-phone noise-cancellation chips are essentially improved Wiener filters.
Astronomical image denoising: images from the Hubble Space Telescope suffer from photon noise and readout noise. Wiener deconvolution handles denoising and deblurring simultaneously.
Communication equalizers: the MMSE equalizer is a direct application of the Wiener filter—it minimizes the mean-square error of ISI (intersymbol interference) + noise.
EEG/ERP artifact removal: removing eye-movement, EMG and similar artifacts. Signal and artifact PSDs differ, so the Wiener filter can selectively suppress the artifact bands.
Active Noise Control (ANC): feedforward/feedback path design in noise-cancelling headphones is essentially solving a causal Wiener filtering problem.

Pitfalls and Limitations

Requires known PSDs: the Wiener filter needs $S_{dd}$ and $S_{vv}$, but in practice the clean signal is unavailable. Common approach: estimate $S_{vv}$ during silent segments and use $S_{dd} = S_{xx} - S_{vv}$ (which can become negative, requiring half-wave rectification or a spectral floor).
Assumes WSS: speech, music and similar signals are highly non-stationary. Remedy: use short-time Wiener filtering (re-estimate the PSDs and recompute $H$ on each frame) combined with the STFT framework.
Musical noise: spectral subtraction (a simplified Wiener filter) produces randomly appearing residual spectral peaks that sound like "whistles" or "water drops". Remedies: smooth the time trajectory of $H$ and set a spectral floor.
Non-causal: the theoretical Wiener filter is non-causal (uses future samples). The causal version requires spectral factorization and is more complex.
Only linearly optimal: if the noise is non-Gaussian, a nonlinear estimator may do better. Wiener is optimal among all linear estimators, not all estimators.

Quick Check

Q1: In the Wiener denoising formula $H = S_{dd}/(S_{dd}+S_{vv})$, if the SNR at some frequency is 0 dB (i.e. $S_{dd} = S_{vv}$), what is the value of $H$ there? Does it make intuitive sense?

Show answer

$H = S_{dd}/(S_{dd}+S_{dd}) = 0.5$. The Wiener filter sets the gain to 0.5 (6 dB attenuation). This is intuitive: when signal and noise have equal power they cannot be fully separated, and the optimal strategy is a compromise—pass half the energy, losing some signal but also suppressing half the noise.

Q2: Why is the LMS adaptive filter called an "online approximation" of the Wiener filter? What are its advantages?

Show answer

The Wiener filter needs to know $R_{xx}$ and $r_{xd}$ (statistics) in advance and is an offline one-shot solution. LMS does not need these statistics—it uses the current input and error to estimate the gradient direction and updates the filter coefficients sample by sample. After convergence, the LMS coefficients oscillate around the optimal Wiener solution. LMS advantages: (1) no prior statistics required, (2) it can track slowly time-varying environments.

References: [1] Wiener, N., Extrapolation, Interpolation, and Smoothing of Stationary Time Series, MIT Press, 1949. [2] Kolmogorov, A.N., Interpolation and Extrapolation of Stationary Random Sequences, Izv. Akad. Nauk SSSR, 5:3-14, 1941. [3] Haykin, S., Adaptive Filter Theory, 5th ed., Ch.3. [4] Oppenheim & Schafer, Discrete-Time Signal Processing, 3rd ed., Ch.11. [5] Boll, S., Suppression of Acoustic Noise in Speech Using Spectral Subtraction, IEEE Trans. ASSP, 27(2):113-120, 1979. [6] Loizou, P.C., Speech Enhancement: Theory and Practice, 2nd ed., CRC Press, 2013.

← 8A Random Processes WSS 9A OLA/OLS →

Interactive Lab

Signal Generator + FFT Spectrum Analyzer

Signal Generator

Component 1 Freq: 50Hz

Component 1 Amp: 1.00

Component 2 Freq: 120Hz

Component 2 Amp: 0.50

Noise Level: 0.10

Window Function:

Time-Domain Waveform

FFT Magnitude Spectrum

Technical Details:

Sampling rate: 1024 Hz
Number of samples: 1024 points
Frequency resolution: Δf = f_s/N = 1024/1024 = 1 Hz
Maximum analyzable frequency: f_s/2 = 512 Hz (Nyquist)
Signal model: x[n] = A₁sin(2πf₁n/f_s) + A₂sin(2πf₂n/f_s) + noise

🧪 Guided Experiments

Experiment 1: Spectrum of a Single Sine Wave

Settings: f1=100Hz, A1=1.0, turn off f2 (A2=0), noise=0, window=Hann

Expected result: The time domain shows a clean sine wave. The spectrum has a sharp peak at 100Hz with almost no leakage to adjacent bins.

Try this: Switch to the Rectangular window and observe the sidelobes (leakage) appearing around 100Hz.

Experiment 2: Two Closely Spaced Frequencies

Settings: f1=100Hz, f2=108Hz, A1=A2=1.0, noise=0

Expected result: With the Hann window, you should see two separate peaks. Switch to the Rectangular window, and the two peaks may merge (because the wider main lobe causes mutual interference).

Try this: Move f2 closer to 104Hz and observe when the two peaks become indistinguishable.

Experiment 3: Weak Signal in Noise

Settings: f1=100Hz A1=1.0, f2=200Hz A2=0.1, noise=0.5

Expected result: The 100Hz peak is clearly visible, but the 200Hz peak may be buried in noise. Switch to the Blackman window (low sidelobes), and the 200Hz peak should become easier to identify.

Experiment 4: Harmonic Structure of a Square Wave

Settings: (requires changing the waveform to square in the code) f1=50Hz square wave, A1=1.0

Expected result: The spectrum shows peaks at 50, 150, 250, 350... Hz (odd harmonics), with amplitudes decaying as 1/n. This is a practical verification of the Fourier series.

About This Platform

This educational platform is a comprehensive, graduate-level online resource on Fourier Analysis, covering six major parts from mathematical foundations to engineering practice.

Course Structure

Part	Topic	Sections
Part I	Mathematical Foundations	4 sections
Part II	Four Core Fourier Transforms + Z-Transform	6 sections
Part III	Spectral Estimation	4 sections
Part IV	Analytic Signals & Cepstrum	3 sections
Part V	Time-Frequency Analysis	5 sections
Part VI	Engineering Practice	10 sections

Design Philosophy

Intuition First: Each concept is first explained in plain language ("why"), then formalized with equations, and finally accompanied by rigorous derivations in expandable <details> blocks.
Problem-Driven: Starting from real-world questions -- "Why does the FFT on an FPGA lack precision?" "Why does the OFDM CP need to be so long?" -- rather than from abstract definitions.
Engineering Connection: Every theoretical concept is paired with concrete industrial application examples and real numerical parameters.
Interactive Exploration: Built-in interactive charts and a lab environment allow learners to adjust parameters and observe results firsthand.

Target Audience

Graduate students in Electrical Engineering / Electronics / Communications / Computer Science
Signal Processing / DSP Engineers
Vibration Analysis / Predictive Maintenance Engineers
Biomedical Engineering / Neuroscience Researchers
Radar / Communication System Design Engineers

Technical Information

Language: English. Technical terms include original terminology where appropriate.
Math Typesetting: Mathematical formulas are marked with the CSS class fm, supporting MathJax/KaTeX rendering.
Interactive Charts: Div containers with the plot class, rendered by JavaScript charting libraries (e.g., Plotly.js, Chart.js).

Key References

Oppenheim, A.V. & Schafer, R.W. Discrete-Time Signal Processing, 3rd Ed., Pearson, 2010.
Haykin, S. & Van Veen, B. Signals and Systems, 2nd Ed., Wiley, 2003.
Proakis, J.G. & Manolakis, D.G. Digital Signal Processing, 4th Ed., Pearson, 2007.
Randall, R.B. Vibration-based Condition Monitoring, Wiley, 2011.
Goldsmith, A. Wireless Communications, Cambridge University Press, 2005.
Richards, M.A. Fundamentals of Radar Signal Processing, 2nd Ed., McGraw-Hill, 2014.
Van Trees, H.L. Optimum Array Processing, Wiley, 2002.
Mallat, S. A Wavelet Tour of Signal Processing, 3rd Ed., Academic Press, 2009.
Task Force of ESC/NASPE. "Heart rate variability: Standards of measurement," Circulation, 93(5), 1996.
3GPP TS 38.211, "NR; Physical channels and modulation," Release 17.

📓 Python Examples & Exercises

All Python code on this platform can be copied directly into your Jupyter Notebook and run. Recommended environment setup:

# Install required packages pip install numpy scipy matplotlib jupyter # Advanced packages (optional) pip install librosa # audio processing pip install wfdb # PhysioNet data loading pip install soundfile # audio file I/O pip install pyfftw # faster FFT

Recommended study workflow:

Read through each chapter's theory and examples
Copy the Python code into Jupyter, run it, and observe the output
Modify parameters and watch how the behavior changes (try extreme values!)
Download data from the Real-World Datasets on the home page to replace the synthetic signals
Record your observations and open questions