STFT for PPG: Short-Time Fourier Tracking for Heart Rate Under Motion
Practical STFT guide for PPG: choose windows, track spectral peaks, manage motion tradeoffs, and build robust heart-rate estimators for wearable monitoring.

The Short-Time Fourier Transform (STFT) is a straightforward, interpretable way to follow the cardiac frequency through a spectrogram over time. With careful choice of window, overlap, and a constrained spectral tracker, STFT-based tracking can keep following the cardiac peak even when motion adds competing spectral energy.
Quick answer
Use a Hann window of 2 to 8 seconds with 75 to 90 percent overlap, compute the spectrogram, and follow the dominant peak with dynamic programming or a Kalman tracker constrained by physiological heart-rate dynamics. Longer windows give better frequency resolution for separating motion harmonics while shorter windows respond faster to sudden heart-rate changes.
Why STFT works for PPG
In PPG, heart rate appears as a narrowband oscillation in the 0.6 to 4 Hz range (36 to 240 BPM). The STFT divides the signal into short, usually overlapping segments and computes the FFT of each one, producing a time-frequency map. Motion often occupies other frequencies or creates harmonics; with an appropriate window length and spectral smoothing, the STFT can separate cardiac peaks from artifact peaks.
See our PPG power spectral analysis guide for baseline STFT parameters, motion artifact detection for gating, and our PPG preprocessing pipeline guide for upstream cleanup.
Time-frequency tradeoffs
- Frequency resolution is Δf = 1 / window_length, with window_length in seconds. A 4-second window gives Δf = 0.25 Hz, which is 15 BPM. Longer windows improve frequency separation but reduce temporal responsiveness.
- Time resolution is approximately the window length. With overlapping windows and interpolation you can update estimates faster than the full window length.
- Window type affects sidelobes and spectral leakage. Hann and Hamming balance main-lobe width and sidelobe suppression. Kaiser windows provide tunable sidelobe control.
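Guideline: when motion produces a strong spectral component within Δf of the cardiac peak, increase the window length to shrink Δf. Multi-taper estimation reduces the variance of the spectral estimate, which steadies the peak position, but it does not narrow the peaks by itself.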
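The arithmetic behind these tradeoffs can be sketched in a few lines. This is a minimal illustration; the helper name `stft_resolution` is hypothetical, and the hop is computed the same way as `nperseg - noverlap` in a typical STFT call.

```python
def stft_resolution(win_sec, overlap_frac, fs):
    """Return (bin spacing in Hz, bin spacing in BPM, hop in seconds)."""
    nperseg = int(round(win_sec * fs))
    hop = nperseg - int(nperseg * overlap_frac)   # samples between frame starts
    delta_f = fs / nperseg                        # equals 1 / window length
    return delta_f, delta_f * 60.0, hop / fs

for win_sec in (2.0, 4.0, 8.0):
    df, df_bpm, hop_s = stft_resolution(win_sec, 0.875, fs=100)
    print(f"{win_sec:.0f} s window: {df:.3f} Hz ({df_bpm:.1f} BPM), hop {hop_s:.2f} s")
```

Doubling the window halves Δf but also doubles the span of signal each estimate averages over, which is the tradeoff the list above describes.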
Spectral peak tracking strategies
1) Argmax per frame with continuity constraint
Find the highest peak in each STFT column within the cardiac band. To avoid swapping to motion peaks, apply a continuity constraint such as limiting the maximum allowed heart-rate change between frames (for example 10 BPM per second, which is 5 BPM per frame at a 0.5-second hop). The constrained problem can be solved with dynamic programming or a Viterbi-like path search.
2) Peak ridge extraction
Treat the spectrogram as an image and extract ridge lines corresponding to local maxima across time. Ridge tracking finds smoothly varying frequency curves and is robust to brief drops in peak prominence.
3) Probabilistic filtering
Use a Kalman filter or particle filter that models heart rate dynamics and observes the spectrogram energy as noisy observations of the latent heart-rate frequency. This is effective when peaks are broad or when SNR is low.
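As a rough sketch of the probabilistic option, the simplest case is a scalar random-walk Kalman filter run over per-frame peak frequencies. Everything here is an assumption made for illustration: the function name `kalman_track`, the variances `q` and `r`, and the innovation gate are hypothetical starting values, not tuned settings.

```python
import numpy as np

def kalman_track(meas_hz, q=0.0004, r=0.01, gate=3.0):
    """Scalar random-walk Kalman filter over per-frame peak frequencies (Hz).

    q: process variance per frame, r: measurement variance,
    gate: reject measurements more than `gate` standard deviations away.
    """
    x, p = float(meas_hz[0]), r
    out = np.empty(len(meas_hz))
    for k, z in enumerate(meas_hz):
        p = p + q                          # predict: heart rate drifts slowly
        s = p + r                          # innovation variance
        if abs(z - x) <= gate * np.sqrt(s):
            g = p / s                      # Kalman gain
            x = x + g * (z - x)            # update with the accepted measurement
            p = (1.0 - g) * p
        out[k] = x                         # rejected frames keep the prediction
    return out

# Synthetic check: steady 1.2 Hz (72 BPM) with five frames stolen by a 2.0 Hz motion peak.
meas = np.full(40, 1.2)
meas[15:20] = 2.0
track = kalman_track(meas)
```

Because the state variance keeps growing while measurements are rejected, the gate widens over time, so a genuine sustained change is eventually accepted. The filter still depends on a trustworthy first frame, since it initializes from the first measurement.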
4) Multi-taper and multi-resolution
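Multi-taper STFT reduces spectral variance, which makes peak positions more stable from frame to frame. It trades some resolution for that stability: for a window of T seconds, the tapers smooth over roughly ±NW/T Hz, so pair multi-taper with a longer window when cardiac and motion frequencies are close. A reasonable starting point is NW=3 with 3 to 5 tapers.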
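A multi-taper estimate for a single analysis window can be sketched with SciPy's Slepian (DPSS) tapers. The function name `multitaper_psd` and the simple unweighted average of the tapered periodograms are assumptions for illustration; adaptive weighting schemes exist but are not shown.

```python
import numpy as np
from scipy.signal.windows import dpss

def multitaper_psd(x, fs, nw=3.0, k=5, nfft=None):
    """Average of k Slepian-tapered periodograms for one analysis window."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                              # remove DC before tapering
    n = len(x)
    nfft = nfft or n
    tapers = dpss(n, nw, Kmax=k)                  # shape (k, n), unit-energy tapers
    spectra = np.abs(np.fft.rfft(tapers * x, n=nfft, axis=1)) ** 2
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    return freqs, spectra.mean(axis=0) / fs

# Synthetic 8-second window containing a 1.5 Hz (90 BPM) tone.
fs = 50
t = np.arange(8 * fs) / fs
freqs, psd = multitaper_psd(np.sin(2 * np.pi * 1.5 * t), fs)
```

With NW=3 and an 8-second window, energy from a single tone is spread over about ±0.375 Hz, so the estimate is smoother than a single-window periodogram but its peaks are broader.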
Practical tips
- Use high overlap (75% to 90%) to get frequent updates and smoother spectrograms. With a 4-second window and 87.5% overlap you can update every 0.5 seconds.
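- Zero-pad the FFT to interpolate the spectrum onto a finer frequency grid, which helps place the peak between the original bins. Zero-padding does not improve the underlying resolution, which is still set by the window length.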
- Median-filter the frequency track across frames to remove frame-level jitter but keep the median length short to preserve dynamic changes.
- Apply a bandpass before the STFT to suppress baseline drift and out-of-band noise. If you also downsample after filtering, the FFTs become shorter and cheaper to compute.
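The zero-padding tip can be illustrated on a synthetic tone. This is a sketch under stated assumptions: `peak_freq` is a hypothetical helper, and the test signal is a clean 1.30 Hz sinusoid rather than real PPG.

```python
import numpy as np

def peak_freq(x, fs, pad_factor=1, band=(0.7, 3.5)):
    """Hann-windowed FFT peak within `band`; pad_factor zero-pads the FFT."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    windowed = (x - x.mean()) * np.hanning(n)
    spec = np.abs(np.fft.rfft(windowed, n=pad_factor * n))
    freqs = np.fft.rfftfreq(pad_factor * n, d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return freqs[in_band][np.argmax(spec[in_band])]

fs = 100
t = np.arange(int(4.0 * fs)) / fs            # one 4-second window
x = np.sin(2 * np.pi * 1.30 * t)             # synthetic 78 BPM tone
coarse = peak_freq(x, fs, pad_factor=1)      # 0.25 Hz grid
fine = peak_freq(x, fs, pad_factor=4)        # 0.0625 Hz grid
print(f"unpadded: {coarse * 60:.2f} BPM, 4x padded: {fine * 60:.2f} BPM")
```

The padded estimate lands on a finer grid, so its worst-case quantization error is smaller. Two tones closer together than 1 / window_length would still merge into one peak regardless of padding.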
Edge cases and failure modes
- When heart rate and step frequency are close, STFT may confuse the two. Use accelerometer-informed masks and continuity constraints.
- For arrhythmias or non-sinus rhythms, the spectral peak can split or broaden. In those cases combine STFT tracking with beat-level detectors or switch to time-domain detection.
- Low perfusion reduces SNR; use longer windows and multi-taper to stabilize spectral estimates.
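One way to build an accelerometer-informed mask is to estimate the step frequency from the accelerometer spectrum and down-weight PPG bins near it and its harmonics. The sketch below is an assumption about how such a mask could look; `acc_mask`, its half-width, harmonic count, and floor value are all hypothetical choices, and the spectra are synthetic.

```python
import numpy as np

def acc_mask(freqs, step_hz, half_width_hz=0.15, n_harmonics=3, floor=0.1):
    """Weights that attenuate bins near the step frequency and its harmonics."""
    weights = np.ones_like(freqs, dtype=float)
    for h in range(1, n_harmonics + 1):
        weights[np.abs(freqs - h * step_hz) <= half_width_hz] = floor
    return weights

def bump(freqs, center, height, sigma=0.05):
    """Gaussian spectral peak used to build the synthetic example."""
    return height * np.exp(-0.5 * ((freqs - center) / sigma) ** 2)

freqs = np.linspace(0.5, 4.0, 71)                       # 0.05 Hz grid
ppg_spec = bump(freqs, 1.4, 1.0) + bump(freqs, 1.8, 1.5)  # cardiac + stronger motion peak
acc_spec = bump(freqs, 1.8, 1.0)                        # accelerometer sees only the steps

step_hz = freqs[np.argmax(acc_spec)]
unmasked_hz = freqs[np.argmax(ppg_spec)]                # picks the motion peak
masked_hz = freqs[np.argmax(ppg_spec * acc_mask(freqs, step_hz))]
```

A mask like this also suppresses the cardiac peak whenever heart rate and cadence coincide, which is why it is usually combined with a continuity constraint rather than used alone.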
Integrating STFT with beat-domain methods
STFT gives a continuous frequency estimate. Convert instantaneous frequency to expected RR intervals and use that as a prior for peak-searchers on the raw waveform. This hybrid approach improves resilience: STFT tracks slow trends while beat detectors lock to individual pulses.
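A minimal sketch of the hybrid idea: turn the STFT frequency into an expected inter-beat interval and search the raw waveform only inside a tolerance window around it. The function name `next_beat_index` and the ±25% tolerance are assumptions for illustration.

```python
import numpy as np

def next_beat_index(ppg, fs, last_beat_idx, hr_hz, tol=0.25):
    """Find the next pulse peak inside a window predicted from the STFT rate."""
    expected = 1.0 / hr_hz                          # expected inter-beat interval, s
    lo = last_beat_idx + int(round((1.0 - tol) * expected * fs))
    hi = last_beat_idx + int(round((1.0 + tol) * expected * fs))
    hi = min(hi, len(ppg) - 1)
    if lo > hi:
        return None                                 # window runs past the data
    return lo + int(np.argmax(ppg[lo:hi + 1]))

# Synthetic pulse train at 1.25 Hz (75 BPM) with peaks every 0.8 s.
fs = 100
t = np.arange(4 * fs) / fs
ppg = np.cos(2 * np.pi * 1.25 * t)
beat = next_beat_index(ppg, fs, last_beat_idx=0, hr_hz=1.25)
```

Narrowing `tol` makes the detector harder to fool with motion spikes but less able to follow ectopic or irregular beats, so the tolerance should reflect how much you trust the STFT estimate.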
Viterbi tracking example
A Viterbi path search treats the spectral bins as candidate states at every time frame and the spectrogram energy in each bin as the observation score for that state. The transition cost penalizes large frequency jumps between consecutive frames. The algorithm finds the lowest-cost smooth frequency path across the whole spectrogram and is robust to brief SNR dropouts.
Pseudocode:
- Build a score matrix S where S[i,t] = -log(energy at bin i at time t).
- Define transition cost T[i,j] = alpha * |bin_i - bin_j|.
- Use dynamic programming to compute minimal path cost across frames and backtrack.
This approach is simple to implement and often outperforms greedy argmax trackers when peaks are noisy.
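The pseudocode above can be sketched in NumPy as follows. The function name `viterbi_track` and the value of `alpha` are assumptions, and the example spectrogram is synthetic: a steady cardiac ridge with one frame where a transient motion peak is stronger.

```python
import numpy as np

def viterbi_track(S, alpha=1.0):
    """Minimum-cost path through a cost matrix S of shape (n_bins, n_frames).

    S[i, t] is the cost of bin i at frame t (for example -log energy); moving
    from bin j to bin i between consecutive frames adds alpha * |i - j|.
    """
    n_bins, n_frames = S.shape
    bins = np.arange(n_bins)
    T = alpha * np.abs(bins[:, None] - bins[None, :])   # T[i, j]
    cost = S[:, 0].copy()
    back = np.zeros((n_bins, n_frames), dtype=int)
    for t in range(1, n_frames):
        total = cost[None, :] + T                       # total[i, j]: reach i from j
        back[:, t] = np.argmin(total, axis=1)           # best predecessor of each bin
        cost = total[bins, back[:, t]] + S[:, t]
    path = np.empty(n_frames, dtype=int)
    path[-1] = int(np.argmin(cost))
    for t in range(n_frames - 1, 0, -1):                # backtrack
        path[t - 1] = back[path[t], t]
    return path

energy = np.full((20, 10), 0.01)
energy[8, :] = 1.0            # steady cardiac ridge at bin 8
energy[15, 4] = 5.0           # one-frame motion peak that beats it
path = viterbi_track(-np.log(energy), alpha=1.0)
greedy = np.argmax(energy, axis=0)
```

In this example the greedy argmax jumps to bin 15 in frame 4, while the path search stays on bin 8 because the round-trip jump costs more than the extra energy is worth. Larger `alpha` values make the path stiffer and slower to follow real heart-rate changes.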
Expected performance and benchmarks
- Rest: HR MAE < 1.5 BPM is typical with carefully tuned parameters.
- Walking: HR MAE 1.5 to 4 BPM depending on step frequency overlap and device optics.
- Running: HR MAE can increase to 5 to 10 BPM on wrist devices unless accelerometer-informed (ACC) cancellation is used.
Performance varies by device, sensor mounting, and sampling rate. Benchmarks should always include ECG as a ground truth.
Parameter reference table
- Window length: 2 to 8 seconds
- Overlap: 75% to 90%
- Window function: Hann or Hamming
- Cardiac band: 0.7 to 3.5 Hz
- Frequency interpolation: zero-pad to 2x or 4x
- Smoothing: median filter across 3 to 5 frames
Implementation recipe (Python)
import numpy as np
from scipy.signal import stft

# ppg_signal: 1-D NumPy array of PPG samples at fs Hz (supply your own data)
fs = 100
win_sec = 4.0
nperseg = int(win_sec * fs)
overlap = int(nperseg * 0.875)  # 87.5% overlap, so the hop is 0.5 s
f, t, Z = stft(ppg_signal, fs=fs, window='hann', nperseg=nperseg, noverlap=overlap)
S = np.abs(Z)

# keep only the cardiac band
cardiac_idx = np.where((f >= 0.7) & (f <= 3.5))[0]
S_card = S[cardiac_idx, :]

# simple tracker: pick max and enforce continuity
penalty = 0.5  # cost per bin of distance, in spectrogram magnitude units
bins = np.arange(S_card.shape[0])
freq_track = np.zeros(S_card.shape[1])
prev_bin = np.argmax(S_card[:, 0])
for k in range(S_card.shape[1]):
    candidates = S_card[:, k]
    # penalize distance from prev_bin
    penalized = candidates - penalty * np.abs(bins - prev_bin)
    best = np.argmax(penalized)
    freq_track[k] = f[cardiac_idx[best]]
    prev_bin = best
hr_bpm = freq_track * 60
Tune penalty and frame hop to match your application.
Super-resolution options
When peaks are closer than Δf, use parametric methods such as MUSIC or ESPRIT. These methods assume a sum-of-sinusoids model and can resolve closely spaced tones, but they are sensitive to model mismatch and noise.
Comparison with other time-frequency methods
- CWT offers variable resolution and is better when low-frequency resolution matters, such as respiratory-related amplitude modulation. See our CWT guide for practical scalogram reading.
- Hilbert transform provides instantaneous frequency but requires narrowband pre-filtering and is sensitive to waveform harmonics.
- STFT is easier to implement, more interpretable, and often sufficient for heart-rate tracking if parameters are chosen carefully.
Implementation pitfalls and checks
- Always test with ECG ground truth across a range of heart rates and activities.
- Inspect spectrograms visually to catch systematic errors such as persistent switching to motion peaks.
- Verify runtime on target hardware and measure memory usage when using multi-taper or parallel STFT runs.
Validation and benchmarks
- Test across activities: rest, walking, running, cycling, and strength training.
- Use ECG as ground truth to compute HR MAE and offline beat alignment errors.
- Log switch events where tracker jumps to a motion peak and analyze frequency overlap conditions.
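The MAE and switch-event checks can be sketched as below, assuming the ECG reference is available as a heart-rate series with its own timestamps. The function names and the 10 BPM switch threshold are hypothetical, and the numbers in the example are made up purely to exercise the code.

```python
import numpy as np

def hr_errors(t_est, hr_est, t_ref, hr_ref):
    """Per-frame absolute error (BPM) after interpolating the reference onto t_est."""
    ref_on_est = np.interp(t_est, t_ref, hr_ref)
    return np.abs(np.asarray(hr_est, dtype=float) - ref_on_est)

def hr_mae(t_est, hr_est, t_ref, hr_ref):
    """Mean absolute heart-rate error in BPM."""
    return float(np.mean(hr_errors(t_est, hr_est, t_ref, hr_ref)))

def switch_frames(t_est, hr_est, t_ref, hr_ref, threshold_bpm=10.0):
    """Indices of frames whose error exceeds the threshold (candidate switch events)."""
    return np.flatnonzero(hr_errors(t_est, hr_est, t_ref, hr_ref) > threshold_bpm)

# Illustrative values only: reference ramps from 60 to 80 BPM over 10 s.
t_ref, hr_ref = [0.0, 10.0], [60.0, 80.0]
t_est, hr_est = [0.0, 5.0, 10.0], [62.0, 70.0, 77.0]
mae = hr_mae(t_est, hr_est, t_ref, hr_ref)
switches = switch_frames(t_est, hr_est, t_ref, hr_ref)
```

Reporting the flagged frames alongside the MAE helps separate small steady bias from occasional large jumps to a motion peak, which average to similar MAE values but call for different fixes.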
Closing recommendations
For most wearable applications, start with a 4-second Hann window and 87.5% overlap, add a continuity-penalized argmax or Viterbi path, and use accelerometer masks to reduce switching. Iterate with labeled data and ECG to reach robust performance for your device.
FAQ
What window length is best for STFT on PPG? Choose 2 to 8 seconds based on the HR dynamics you must resolve. For stable tracking with motion, 4 to 8 seconds is common.
How do I avoid switching to motion peaks? Use continuity constraints, accelerometer masks, and spectral smoothing. Dynamic programming on the spectrogram often yields robust paths.
Can STFT run on-device? Yes. Keep the overlapping frames in a ring buffer and compute one FFT per hop, or use a sliding (incremental) FFT, to keep CPU cost low. Multi-taper and super-resolution are more costly and usually reserved for cloud or offline analysis.