PPG Respiratory Rate Estimation: Extracting Breathing Rate from Photoplethysmography
Respiratory rate is the vital sign most sensitive to clinical deterioration, yet it is the least frequently and least accurately recorded in routine clinical practice. Studies of early warning scores for detecting patient deterioration (NEWS, MEWS) consistently identify respiratory rate as the strongest single predictor of adverse events including cardiac arrest, ICU admission, and death within 24 hours (Churpek et al., 2012; DOI: 10.1016/j.resuscitation.2012.02.009). PPG sensors are already present in pulse oximeters on virtually every hospitalized patient and in billions of consumer wearables, so extracting continuous respiratory rate from PPG signals is one of the most immediately clinically impactful applications of PPG technology.
This article provides a technical review of the three primary mechanisms by which respiration modulates the PPG waveform, the signal processing algorithms used to extract respiratory rate, fusion strategies for combining multiple respiratory estimates, and the accuracy benchmarks that define the current state of the art.
Physiological Mechanisms of Respiratory Modulation in PPG
Respiration influences the PPG waveform through three distinct physiological pathways, each producing a characteristic modulation pattern. Understanding these mechanisms is essential for designing robust extraction algorithms.
Respiratory-Induced Frequency Variation (RIFV)
Respiratory sinus arrhythmia (RSA) is the physiological variation in heart rate synchronized with breathing: heart rate increases during inspiration and decreases during expiration. It is mediated primarily by vagal modulation of the sinoatrial node and is influenced by lung stretch receptors, central respiratory drive, and baroreflex interactions.
RSA produces a frequency modulation of the PPG inter-beat interval (IBI) series. The instantaneous heart rate oscillates at the respiratory frequency, and this oscillation can be extracted by analyzing the IBI time series in the frequency domain. The respiratory frequency appears as a spectral peak in the 0.15-0.4 Hz band (corresponding to 9-24 breaths per minute) of the IBI power spectrum, which is the high-frequency (HF) component of heart rate variability.
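As a rough illustration of the IBI-spectrum approach described above, the following sketch resamples an IBI series onto a uniform grid and picks the strongest spectral peak in the 0.15-0.4 Hz band. The function name `rifv_resp_rate`, the 4 Hz resampling rate, the Hann window, and the synthetic beat train are assumptions made for the example, not details from any of the cited studies.

```python
import numpy as np

def rifv_resp_rate(beat_times, fs_resample=4.0, band=(0.15, 0.4)):
    """Estimate respiratory rate (breaths/min) from beat times via the IBI spectrum."""
    beat_times = np.asarray(beat_times, dtype=float)
    ibi = np.diff(beat_times)            # inter-beat intervals (s)
    t_ibi = beat_times[1:]               # timestamp each IBI at the later beat
    # Resample the unevenly spaced IBI series onto a uniform time grid
    t_uniform = np.arange(t_ibi[0], t_ibi[-1], 1.0 / fs_resample)
    ibi_uniform = np.interp(t_uniform, t_ibi, ibi)
    ibi_uniform = ibi_uniform - ibi_uniform.mean()   # remove the mean before the FFT
    # Windowed periodogram, zero-padded for a finer frequency grid
    n_fft = 4096
    window = np.hanning(len(ibi_uniform))
    power = np.abs(np.fft.rfft(ibi_uniform * window, n=n_fft)) ** 2
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs_resample)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    f_peak = freqs[in_band][np.argmax(power[in_band])]
    return 60.0 * f_peak

# Synthetic beat train: mean IBI of 1.0 s, modulated at 0.25 Hz (15 breaths/min)
beats = [0.0]
while beats[-1] < 60.0:
    beats.append(beats[-1] + 1.0 + 0.05 * np.sin(2 * np.pi * 0.25 * beats[-1]))

rate = rifv_resp_rate(beats)
print(round(rate, 1))
```

On real recordings the beat times would come from a pulse detector, and ectopic or missed beats would need to be handled before the interpolation step.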
RIFV is the most robust respiratory modulation in young, healthy individuals at rest, where RSA amplitude can reach 10-20 beats/min peak-to-peak. However, RSA amplitude decreases markedly with age (Anrep et al., 1936; confirmed by numerous subsequent studies), at high heart rates, in patients with autonomic neuropathy (common in diabetes), and with certain medications (atropine, beta-blockers). In elderly or critically ill patients, RIFV may be too weak to detect reliably.
Respiratory-Induced Amplitude Variation (RIAV)
Changes in intrathoracic pressure during the respiratory cycle alter venous return to the heart and consequently stroke volume. During spontaneous (negative-pressure) inspiration, intrathoracic pressure decreases, increasing venous return to the right heart but initially decreasing left ventricular filling due to pulmonary vascular pooling. This produces beat-to-beat variation in stroke volume and arterial pulse pressure, which manifests as amplitude modulation of the PPG waveform.
The RIAV envelope is extracted by measuring the peak-to-peak amplitude (or systolic peak amplitude) of each PPG pulse and analyzing the resulting amplitude time series. The respiratory frequency appears as an oscillation in this envelope. Lázaro et al. (2014) demonstrated that RIAV provides complementary respiratory information to RIFV and is particularly valuable when RSA is attenuated (DOI: 10.1088/0967-3334/35/7/1407).
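The envelope extraction can be sketched as follows. This is a minimal example on a clean synthetic signal. The simple local-maximum peak detector, the 0.4 s refractory period, and the name `pulse_amplitudes` are assumptions for illustration, and a production pipeline would use a more robust beat detector.

```python
import numpy as np

def pulse_amplitudes(ppg, fs, min_beat_s=0.4):
    """Return (times, amplitudes): peak-to-trough amplitude of each detected pulse."""
    ppg = np.asarray(ppg, dtype=float)
    # Candidate systolic peaks: local maxima above the signal mean
    is_max = (ppg[1:-1] > ppg[:-2]) & (ppg[1:-1] >= ppg[2:]) & (ppg[1:-1] > ppg.mean())
    candidates = np.flatnonzero(is_max) + 1
    # Enforce a refractory period so each beat contributes a single peak
    peaks = []
    for idx in candidates:
        if not peaks or (idx - peaks[-1]) >= min_beat_s * fs:
            peaks.append(idx)
        elif ppg[idx] > ppg[peaks[-1]]:
            peaks[-1] = idx              # keep the taller of two close candidates
    times, amps = [], []
    for prev, cur in zip(peaks[:-1], peaks[1:]):
        trough = ppg[prev:cur].min()     # foot of the pulse between two peaks
        times.append(cur / fs)
        amps.append(ppg[cur] - trough)
    return np.array(times), np.array(amps)

# Synthetic PPG: 1.2 Hz cardiac component, amplitude-modulated at 0.25 Hz
fs = 100.0
t = np.arange(int(60 * fs)) / fs
ppg = (1.0 + 0.2 * np.sin(2 * np.pi * 0.25 * t)) * np.sin(2 * np.pi * 1.2 * t)

beat_t, riav = pulse_amplitudes(ppg, fs)
print(len(riav), round(float(riav.min()), 2), round(float(riav.max()), 2))
```

The resulting `riav` series is sampled once per beat, so it is usually interpolated onto a uniform grid before spectral analysis.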
RIAV is more prominent during mechanical ventilation (positive-pressure breathing), where the intrathoracic pressure swings are reversed relative to spontaneous breathing and the stroke volume variation is typically larger. This makes RIAV particularly useful in the ICU setting. The closely related pulse pressure variation (PPV), measured from the arterial pressure waveform, is a validated clinical indicator of fluid responsiveness in mechanically ventilated patients, and respiratory variation in PPG amplitude has been studied as a noninvasive surrogate for it.
Respiratory-Induced Intensity Variation (RIIV)
Respiratory chest wall movement and changes in venous blood volume at the measurement site produce slow oscillations in the baseline (DC component) of the PPG signal. During inspiration, decreased intrathoracic pressure reduces venous pressure in the upper body, decreasing venous blood volume at the finger or earlobe and reducing the baseline PPG absorption. This creates a baseline intensity modulation at the respiratory frequency.
RIIV is extracted by low-pass filtering the raw PPG signal (below approximately 0.5 Hz) or by analyzing the DC component (baseline) of the PPG waveform. Nilsson et al. (2000) first characterized this mechanism and demonstrated its utility for respiratory rate estimation (DOI: 10.1118/1.1289377).
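A minimal sketch of the low-pass step is given below. It uses a brick-wall FFT filter purely for brevity, which is an assumption of this example. A practical implementation would more likely use a properly designed FIR or IIR low-pass filter. The name `riiv_baseline` and the synthetic signal are likewise illustrative.

```python
import numpy as np

def riiv_baseline(ppg, fs, cutoff_hz=0.5):
    """Return the slow baseline of a PPG signal by zeroing FFT bins above cutoff_hz."""
    ppg = np.asarray(ppg, dtype=float)
    spectrum = np.fft.rfft(ppg - ppg.mean())
    freqs = np.fft.rfftfreq(len(ppg), d=1.0 / fs)
    spectrum[freqs > cutoff_hz] = 0.0    # discard the cardiac band and above
    return np.fft.irfft(spectrum, n=len(ppg))

# Synthetic PPG: DC offset + baseline wander at 0.2 Hz + cardiac pulse at 1.1 Hz
fs = 50.0
t = np.arange(int(60 * fs)) / fs
resp = 0.3 * np.sin(2 * np.pi * 0.2 * t)
ppg = 5.0 + resp + np.sin(2 * np.pi * 1.1 * t)

baseline = riiv_baseline(ppg, fs)
print(round(float(np.max(np.abs(baseline - resp))), 6))
```

The recovery is essentially exact here only because both synthetic components complete a whole number of cycles in the window. On real data the filter would also pass any drift or motion artifact below the cutoff.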
RIIV is the simplest modulation to extract computationally, requiring minimal preprocessing. However, it is highly susceptible to motion artifacts, sensor repositioning, and baseline drift, which produce low-frequency variations indistinguishable from respiratory modulation.
Extraction Algorithms
Time-Domain Peak Counting
The simplest respiratory rate estimation approach identifies peaks in the respiratory modulation signal (RIAV envelope, RIIV baseline, or RIFV IBI series) and computes the breathing rate from the inter-peak intervals.
The algorithm involves:
- Bandpass filtering the modulation signal to the respiratory frequency band (0.1-0.6 Hz, corresponding to 6-36 breaths/min).
- Detecting peaks (or zero crossings) in the filtered signal.
- Computing the median inter-peak interval over a window (typically 30-60 seconds).
- Converting the interval to respiratory rate in breaths per minute.
Peak counting is computationally trivial and works well when the respiratory signal is strong and regular. It fails when breathing is irregular, when the signal-to-noise ratio is low, or when harmonic or subharmonic peaks create ambiguity.
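The four steps listed above can be sketched as follows. The FFT-mask bandpass, the "local maximum above zero" peak rule, and the function names are assumptions chosen to keep the example short, and the input is a synthetic modulation signal rather than real data.

```python
import numpy as np

def fft_bandpass(x, fs, lo=0.1, hi=0.6):
    """Crude bandpass: zero all FFT bins outside [lo, hi] Hz."""
    x = np.asarray(x, dtype=float)
    spec = np.fft.rfft(x - x.mean())
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spec[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spec, n=len(x))

def resp_rate_peak_count(modulation, fs):
    """Respiratory rate (breaths/min) from the median inter-peak interval."""
    y = fft_bandpass(modulation, fs)
    # A breath is counted at each local maximum that lies above zero
    is_peak = (y[1:-1] > y[:-2]) & (y[1:-1] >= y[2:]) & (y[1:-1] > 0)
    peak_idx = np.flatnonzero(is_peak) + 1
    if len(peak_idx) < 2:
        return None                      # not enough breaths to form an interval
    intervals = np.diff(peak_idx) / fs   # seconds between consecutive breaths
    return 60.0 / np.median(intervals)

# Synthetic modulation signal: 0.3 Hz (18 breaths/min) plus mild noise, 60 s window
fs = 10.0
t = np.arange(int(60 * fs)) / fs
rng = np.random.default_rng(0)
modulation = np.sin(2 * np.pi * 0.3 * t) + 0.05 * rng.standard_normal(len(t))

rate = resp_rate_peak_count(modulation, fs)
print(rate)
```

Using the median rather than the mean interval makes the estimate less sensitive to an occasional missed or spurious peak.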
Frequency-Domain Spectral Analysis
Spectral methods estimate respiratory rate by identifying the dominant spectral peak in the respiratory frequency band. The approach involves:
- Computing the power spectral density (PSD) of the modulation signal using FFT, Welch's method, or autoregressive (AR) modeling.
- Identifying the spectral peak with the highest power within the respiratory band (0.1-0.6 Hz).
- Reporting the peak frequency as the respiratory rate estimate.
AR spectral estimation (typically order 8-12) provides better frequency resolution than the FFT for short analysis windows. This matters for respiratory rate estimation because a 30-second window contains only 3-6 breathing cycles at slow rates of 6-12 breaths/min. Chon et al. (2009) demonstrated that AR-based spectral analysis of the PPG IBI series achieved respiratory rate MAE of 1.0 breath/min in 29 healthy subjects at rest (DOI: 10.1109/TBME.2009.2024870).
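To make the AR approach concrete, here is a hedged sketch that fits an AR model with the Yule-Walker equations and reads the respiratory rate off the peak of the model spectrum. Yule-Walker is one of several possible AR fitting methods and is an assumption here (the cited work may have used a different estimator). The function name, the order of 10, and the test signal are illustrative only.

```python
import numpy as np

def ar_resp_rate(x, fs, order=10, band=(0.1, 0.6)):
    """Respiratory rate (breaths/min) from the peak of a Yule-Walker AR spectrum."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Biased autocorrelation estimates for lags 0..order
    r = np.array([np.dot(x[: n - k], x[k:]) / n for k in range(order + 1)])
    # Yule-Walker equations: R a = r[1:], with R Toeplitz in r[0..order-1]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    # AR spectrum is proportional to 1 / |1 - sum_k a_k exp(-j 2 pi f k / fs)|^2
    freqs = np.linspace(band[0], band[1], 501)
    k = np.arange(1, order + 1)
    denom = 1.0 - np.exp(-2j * np.pi * np.outer(freqs, k) / fs) @ a
    psd = 1.0 / np.abs(denom) ** 2
    return 60.0 * freqs[np.argmax(psd)]

# 30-second modulation signal at 4 Hz: 0.2 Hz (12 breaths/min) plus noise
fs = 4.0
t = np.arange(int(30 * fs)) / fs
rng = np.random.default_rng(1)
x = np.sin(2 * np.pi * 0.2 * t) + 0.2 * rng.standard_normal(len(t))

rate = ar_resp_rate(x, fs)
print(round(rate, 1))
```

Because the spectrum is evaluated on an arbitrary frequency grid, the estimate is not tied to the FFT bin spacing of the short window.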
A key challenge in spectral methods is distinguishing the true respiratory peak from cardiac harmonics. The second harmonic of the respiratory frequency can overlap with the fundamental cardiac frequency, and the subharmonic of the cardiac frequency can fall within the respiratory band. Spectral tracking algorithms that enforce temporal continuity of the respiratory frequency estimate can mitigate these ambiguities.
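One simple way to enforce temporal continuity is to down-weight candidate peaks that lie far from the previous window's estimate. The sketch below is an illustrative heuristic, not a published tracker. The Gaussian penalty, its 0.05 Hz width, and the toy spectra are all assumptions.

```python
import numpy as np

def track_resp_peak(freqs, spectra, max_jump_hz=0.05):
    """Pick one peak per window, penalizing candidates far from the previous estimate."""
    track = []
    prev = None
    for psd in spectra:
        if prev is None:
            score = psd                  # first window: plain spectral maximum
        else:
            penalty = np.exp(-0.5 * ((freqs - prev) / max_jump_hz) ** 2)
            score = psd * penalty
        prev = freqs[np.argmax(score)]
        track.append(prev)
    return np.array(track)

freqs = np.linspace(0.1, 0.6, 101)

def bump(center, height):
    """Toy spectral peak used to build the example spectra."""
    return height * np.exp(-0.5 * ((freqs - center) / 0.02) ** 2)

# Window 2 contains a stronger spurious peak at 0.50 Hz (for example a cardiac subharmonic)
spectra = [bump(0.25, 1.0), bump(0.25, 1.0) + bump(0.50, 1.5), bump(0.26, 1.0)]

track = track_resp_peak(freqs, spectra)
naive_second = freqs[np.argmax(spectra[1])]
print(np.round(track, 3), round(float(naive_second), 3))
```

A plain maximum would jump to 0.50 Hz in the second window, while the tracked estimate stays near 0.25 Hz. The trade-off is that a tracker of this kind responds slowly to genuine abrupt changes in breathing rate.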
Advanced Signal Processing Methods
Empirical Mode Decomposition (EMD): EMD decomposes the PPG signal into intrinsic mode functions (IMFs) without assuming any particular basis. The IMF corresponding to the respiratory frequency band contains the respiratory modulation. Moody et al. demonstrated that EMD-based respiratory extraction achieves accuracy comparable to spectral methods while being more adaptive to non-stationary signals.
Continuous Wavelet Transform (CWT): The CWT provides a time-frequency representation of the modulation signal, enabling tracking of respiratory frequency changes over time. The scalogram (magnitude of the CWT) shows the respiratory frequency as a ridge that can be tracked using ridge extraction algorithms. Addison et al. (2015) pioneered CWT-based respiratory rate estimation from PPG in the Nellcor Respiration Rate (RRp) algorithm, which was the first FDA-cleared algorithm for respiratory rate extraction from pulse oximetry (DOI: 10.1007/s10877-014-9607-0).
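The idea of a scalogram ridge can be illustrated with a small hand-rolled Morlet transform. This is a teaching sketch under stated assumptions (a complex Morlet wavelet with 4 cycles, direct convolution, a per-sample argmax as the "ridge") and is not a description of the commercial algorithm mentioned above.

```python
import numpy as np

def morlet_scalogram(x, fs, freqs, n_cycles=4.0):
    """Magnitude of a complex Morlet wavelet transform, one row per frequency."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    scalogram = np.empty((len(freqs), len(x)))
    for i, f in enumerate(freqs):
        sigma_t = n_cycles / (2 * np.pi * f)      # Gaussian envelope width (s)
        tw = np.arange(-4 * sigma_t, 4 * sigma_t, 1 / fs)
        wavelet = np.exp(2j * np.pi * f * tw) * np.exp(-tw**2 / (2 * sigma_t**2))
        wavelet /= np.abs(wavelet).sum()          # equal peak gain at every scale
        scalogram[i] = np.abs(np.convolve(x, wavelet, mode="same"))
    return scalogram

# Synthetic modulation: 0.2 Hz for the first 60 s, then 0.3 Hz for the next 60 s
fs = 10.0
n = int(120 * fs)
inst_freq = np.where(np.arange(n) < n // 2, 0.2, 0.3)
phase = 2 * np.pi * np.cumsum(inst_freq) / fs
x = np.sin(phase)

freqs = np.linspace(0.1, 0.6, 51)
scalogram = morlet_scalogram(x, fs, freqs)
ridge_hz = freqs[np.argmax(scalogram, axis=0)]    # simplest possible ridge: per-sample maximum
rate_at_30s = 60 * ridge_hz[int(30 * fs)]
rate_at_90s = 60 * ridge_hz[int(90 * fs)]
print(round(rate_at_30s, 1), round(rate_at_90s, 1))
```

Real ridge extraction usually adds continuity constraints along time rather than taking an independent maximum at each sample, and estimates near the window edges are unreliable because the wavelet overlaps the signal boundary.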
Singular Spectrum Analysis (SSA): SSA decomposes the signal into oscillatory components based on eigendecomposition of the trajectory matrix. It can separate respiratory and cardiac components even when their frequencies are close, and it does not require predefined basis functions or frequency bands.
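A basic SSA decomposition can be sketched in a few lines. The example assumes a clean two-component signal in which the cardiac oscillation is stronger than the respiratory one, so that the respiratory pair appears as the third and fourth components. That grouping rule, the window length of 200 samples, and the name `ssa_components` are assumptions for illustration.

```python
import numpy as np

def ssa_components(x, window, n_components):
    """Return the first n_components elementary SSA reconstructions of x."""
    x = np.asarray(x, dtype=float)
    k = len(x) - window + 1
    # Trajectory (Hankel) matrix: column i holds x[i : i + window]
    X = np.column_stack([x[i:i + window] for i in range(k)])
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    comps = []
    for i in range(n_components):
        Xi = s[i] * np.outer(U[:, i], Vt[i])
        # Diagonal averaging: average each anti-diagonal back into a time series
        comp = np.array([Xi[::-1].diagonal(d).mean() for d in range(-window + 1, k)])
        comps.append(comp)
    return np.array(comps)

# Synthetic signal: cardiac component at 1.2 Hz plus weaker respiratory component at 0.25 Hz
fs = 10.0
t = np.arange(int(60 * fs)) / fs
resp = 0.5 * np.sin(2 * np.pi * 0.25 * t)
ppg = np.sin(2 * np.pi * 1.2 * t) + resp

comps = ssa_components(ppg, window=200, n_components=4)
resp_est = comps[2] + comps[3]       # assumed grouping: second-strongest oscillatory pair
corr = np.corrcoef(resp_est, resp)[0, 1]
print(round(float(corr), 3))
```

On real PPG the grouping of components is the hard part. Components are typically assigned to respiration by inspecting their dominant frequency rather than by their rank.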
Smart Fusion of Multiple Respiratory Estimates
Since RIFV, RIAV, and RIIV capture respiratory information through different physiological mechanisms, they provide complementary estimates with different strengths and failure modes. Fusing multiple estimates improves robustness and accuracy.
Quality-Weighted Fusion
Karlen et al. (2013) proposed a smart fusion framework that computes respiratory rate estimates from all three modulations independently, assesses the quality of each estimate, and combines them using quality-weighted averaging (DOI: 10.1109/TBME.2013.2246160). Quality is assessed by:
- Signal quality index (SQI): Measures the regularity and consistency of the modulation signal. Low SQI indicates noise contamination.
- Spectral peak prominence: A sharp, isolated spectral peak indicates a reliable estimate; a broad or ambiguous peak indicates uncertainty.
- Cross-modulation agreement: When two or more modulations agree on the respiratory frequency, confidence increases.
In a study of 59 subjects, the smart fusion approach achieved MAE of 0.9 breaths/min compared to reference capnography, outperforming any single modulation source (RIFV alone: 1.4 breaths/min; RIAV alone: 1.6 breaths/min; RIIV alone: 1.8 breaths/min).
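The quality-weighted averaging step can be illustrated with a few lines of plain Python. This is a generic weighted mean with a quality floor, written as a sketch. The quality scores, the 0.3 threshold, and the function name are hypothetical and do not reproduce the exact rules of the cited framework.

```python
def fuse_quality_weighted(estimates, qualities, min_quality=0.3):
    """Quality-weighted mean of respiratory rate estimates; None if all are rejected."""
    # Drop estimates whose quality score falls below the floor
    kept = [(e, q) for e, q in zip(estimates, qualities) if q >= min_quality]
    if not kept:
        return None                      # abstain rather than report a poor estimate
    total = sum(q for _, q in kept)
    return sum(e * q for e, q in kept) / total

# Hypothetical per-window estimates (breaths/min) and quality scores in [0, 1]
estimates = [15.2, 14.8, 22.0]           # RIFV, RIAV, RIIV
qualities = [0.9, 0.6, 0.1]              # RIIV is flagged as unreliable

fused = fuse_quality_weighted(estimates, qualities)
rejected = fuse_quality_weighted([15.0, 16.0], [0.1, 0.2])
print(round(fused, 2), rejected)
```

Returning `None` when every source is rejected is what produces coverage below 100% in quality-gated methods.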
Bayesian Fusion
Pimentel et al. (2017) developed a Bayesian fusion framework that maintains probability distributions over respiratory rate from each modulation source and combines them using Bayes' theorem (DOI: 10.1109/TBME.2016.2613124). The Bayesian approach naturally handles uncertainty: when one source provides an unreliable estimate with high variance, its influence on the fused estimate is automatically reduced. The method also incorporates a temporal prior that penalizes large changes in respiratory rate between consecutive windows, exploiting the physiological constraint that breathing rate changes slowly under normal conditions.
Evaluated on the CapnoBase benchmark dataset (42 subjects, 8 minutes each, with reference capnography), the Bayesian fusion approach achieved MAE of 0.6 breaths/min, a relative error of roughly 3-5% at typical resting breathing rates of 12-20 breaths/min.
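The behavior described above (unreliable sources are automatically down-weighted, and large jumps between windows are penalized) can be illustrated with a deliberately simplified Gaussian model. The sketch below assumes Gaussian likelihoods and a random-walk temporal prior, which reduces the update to a Kalman-style recursion. It is not the model of the cited paper, and all numbers are made up for the example.

```python
def bayes_fuse(prior_mean, prior_var, observations, process_var=1.0):
    """One fusion step: random-walk prior followed by Gaussian updates.

    observations is a list of (mean, variance) pairs, one per modulation source.
    """
    # Predict: the rate may drift slowly between windows, so widen the prior
    mean, var = prior_mean, prior_var + process_var
    # Update: fold in each source; a high-variance source gets a small gain
    for obs_mean, obs_var in observations:
        gain = var / (var + obs_var)
        mean = mean + gain * (obs_mean - mean)
        var = (1.0 - gain) * var
    return mean, var

# Previous window: 15 breaths/min. One confident source and one very noisy outlier.
mean, var = bayes_fuse(
    prior_mean=15.0,
    prior_var=1.0,
    observations=[(16.0, 1.0), (25.0, 100.0)],
)
print(round(mean, 2), round(var, 2))
```

The noisy estimate of 25 breaths/min moves the fused value only slightly, because its large variance gives it a gain close to zero.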
Performance Benchmarks and Datasets
The CapnoBase Benchmark
The CapnoBase dataset (Karlen et al., 2010) has become the standard benchmark for PPG respiratory rate estimation. It contains 42 recordings of 8 minutes each from pediatric and adult patients during elective surgery, with simultaneous finger PPG and capnography (reference respiratory rate). The dataset is publicly available, enabling direct comparison across methods.
State-of-the-art results on CapnoBase:
| Method | MAE (breaths/min) | Coverage |
|--------|-------------------|----------|
| Single best modulation | 1.4-1.8 | 100% |
| Smart fusion (Karlen, 2013) | 0.9 | 100% |
| Bayesian fusion (Pimentel, 2017) | 0.6 | 100% |
| Deep learning (Ravichandran, 2019) | 0.5 | 95% |
| CWT ridge tracking (Addison, 2015) | 1.0 | 92% |
Coverage refers to the percentage of analysis windows for which the algorithm produces an estimate; some methods abstain from low-confidence windows.
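For readers reproducing such benchmarks, the two metrics can be computed together as in this small sketch. It assumes that an abstained window is represented by `None`. The function name and the sample values are illustrative.

```python
def mae_and_coverage(estimates, references):
    """Return (MAE over reported windows, coverage in percent).

    estimates may contain None for windows where the algorithm abstained.
    """
    pairs = [(e, r) for e, r in zip(estimates, references) if e is not None]
    coverage = 100.0 * len(pairs) / len(references)
    mae = sum(abs(e - r) for e, r in pairs) / len(pairs) if pairs else None
    return mae, coverage

# Four analysis windows; the algorithm abstains on the second one
estimates = [12.5, None, 15.0, 18.5]
references = [12.0, 14.0, 16.0, 18.0]

mae, coverage = mae_and_coverage(estimates, references)
print(round(mae, 3), coverage)
```

Because MAE is computed only over reported windows, a method can lower its MAE by abstaining more often, which is why the two figures should be read together.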
The BIDMC Dataset
The BIDMC (Beth Israel Deaconess Medical Center) dataset provides 53 recordings from critically ill adult patients with reference respiratory rate from impedance pneumography. This dataset is more challenging than CapnoBase because ICU patients often have irregular breathing, cardiac arrhythmias, and lower signal quality.
Pimentel et al. (2017) reported MAE of 1.8 breaths/min on BIDMC using Bayesian fusion, compared to 0.6 breaths/min on CapnoBase, illustrating the performance gap between controlled and clinical settings.
Deep Learning Approaches
Recent work has applied deep learning to PPG respiratory rate estimation, bypassing hand-crafted feature extraction.
End-to-End CNN Models
Ravichandran et al. (2019) trained a 1D-CNN to estimate respiratory rate directly from 32-second windows of raw PPG, achieving MAE of 0.5 breaths/min on CapnoBase (DOI: 10.1145/3341163.3347744). The model learned to extract respiratory modulations implicitly, without explicit RIFV/RIAV/RIIV decomposition. Analysis of the learned filters revealed that the first convolutional layer learned bandpass filters in the respiratory frequency range, while deeper layers captured modulation patterns similar to the traditional decomposition.
Temporal CNN with Attention
Bian et al. (2020) applied a temporal convolutional network (TCN) with attention mechanisms to PPG respiratory rate estimation. The attention layer learned to weight different segments of the input window based on signal quality, effectively performing automatic quality-gated estimation. The model achieved MAE of 1.1 breaths/min on the BIDMC dataset, a 39% improvement over the Bayesian fusion baseline (DOI: 10.1109/JBHI.2020.2990423).
Clinical Applications
Hospital Patient Monitoring
The most immediate clinical application is continuous respiratory rate monitoring from existing pulse oximeters. The Nellcor Respiration Rate (RRp) algorithm by Medtronic is FDA-cleared and deployed in hospital monitors, extracting respiratory rate from the pulse oximeter PPG signal without requiring additional sensors. Clinical validation studies have shown MAE of 1-2 breaths/min compared to manual counting by nurses, which itself has significant inter-observer variability (Bergese et al., 2017; DOI: 10.1213/ANE.0000000000001642).
This capability is particularly valuable for post-surgical patients on general wards, where continuous capnography is impractical but early detection of respiratory depression (from opioid analgesia) can prevent critical events.
Wearable Sleep Monitoring
Consumer wearables including Apple Watch, Fitbit, and Garmin devices now report respiratory rate during sleep. The constrained environment (minimal motion, regular breathing) makes sleep an ideal use case for PPG-based respiratory rate estimation. Longitudinal tracking of sleeping respiratory rate can identify trends associated with respiratory infections, heart failure decompensation, and sleep-disordered breathing.
For how respiratory rate estimation integrates with broader sleep staging algorithms, see our companion article. Our algorithms reference and signal processing guide provide additional implementation details for PPG-based respiratory analysis, and our article on PPG motion artifact removal covers the essential preprocessing steps for extracting clean respiratory modulations.
References
- Addison, P.S. et al. (2015). Journal of Clinical Monitoring and Computing. DOI: 10.1007/s10877-014-9607-0
- Bergese, S.D. et al. (2017). Anesthesia & Analgesia. DOI: 10.1213/ANE.0000000000001642
- Bian, Z. et al. (2020). IEEE Journal of Biomedical and Health Informatics. DOI: 10.1109/JBHI.2020.2990423
- Chon, K.H. et al. (2009). IEEE Transactions on Biomedical Engineering. DOI: 10.1109/TBME.2009.2024870
- Churpek, M.M. et al. (2012). Resuscitation. DOI: 10.1016/j.resuscitation.2012.02.009
- Karlen, W. et al. (2013). IEEE Transactions on Biomedical Engineering. DOI: 10.1109/TBME.2013.2246160
- Lázaro, J. et al. (2014). Physiological Measurement. DOI: 10.1088/0967-3334/35/7/1407
- Nilsson, L. et al. (2000). Medical Physics. DOI: 10.1118/1.1289377
- Pimentel, M.A.F. et al. (2017). IEEE Transactions on Biomedical Engineering. DOI: 10.1109/TBME.2016.2613124