PPG Signal Quality Assessment: SQI Metrics, Methods & Implementation
Technical guide to PPG signal quality index (SQI) metrics covering template matching, perfusion index, skewness, kurtosis, and machine learning approaches.
Signal quality assessment is the gatekeeper of every PPG processing pipeline, and deploying it incorrectly either discards usable data or passes corrupted signals to downstream algorithms. In clinical pulse oximetry, a false SpO2 reading due to poor signal quality can trigger unnecessary interventions. In consumer wearables, unreliable heart rate readings during exercise erode user trust. In research studies, including low-quality segments without flagging them introduces noise and bias into results.
This guide provides a comprehensive technical overview of PPG signal quality index (SQI) metrics, from simple amplitude-based checks to sophisticated machine learning classifiers, with practical implementation guidance and benchmark performance numbers. For an understanding of what constitutes a clean PPG signal, see our introduction to photoplethysmography.
Why Signal Quality Assessment Matters
Every PPG measurement system, from a clinical finger-clip pulse oximeter to a wrist-worn fitness tracker, encounters periods where the acquired signal is too corrupted to yield reliable physiological estimates. The causes include motion artifacts (the dominant source in wearables), poor sensor-skin contact, low perfusion states, ambient light interference, and electrical noise.
Without signal quality assessment, downstream algorithms produce outputs during these corrupted periods. Heart rate algorithms may lock onto motion frequency harmonics instead of the cardiac signal, reporting exercise cadence as heart rate. SpO2 algorithms may calculate ratios from noise rather than pulsatile blood absorption, producing meaningless values. HRV analysis may include artifactual inter-beat intervals that dramatically skew time-domain and frequency-domain metrics.
The role of SQI is either to classify each signal segment as usable or unusable (binary classification) or to assign a continuous quality score that enables graded confidence weighting. The binary approach is simpler and is used in most commercial devices. The continuous approach is more informative and is increasingly used in research and advanced clinical systems.
Elgendi (2016) demonstrated that applying SQI-based rejection to PPG-derived heart rate estimates reduced mean absolute error from 7.3 BPM to 1.9 BPM during walking (n = 29 subjects), at the cost of rejecting 23% of signal windows. This illustrates the fundamental tradeoff: stricter quality thresholds improve accuracy but reduce temporal coverage (DOI: 10.1371/journal.pone.0150216).
Time-Domain SQI Metrics
Time-domain metrics evaluate signal quality by analyzing the morphological features of the PPG waveform directly, without transformation to the frequency domain.
Template Matching (Correlation SQI)
Template matching is the single most discriminative time-domain SQI metric. It computes the Pearson correlation coefficient between each detected pulse in the PPG signal and a reference template representing a clean pulse waveform. High correlation (r > 0.9) indicates a morphologically normal pulse; low correlation indicates distortion from motion, noise, or artifact.
The template can be derived in several ways. The most common approach creates an adaptive template by averaging the most recent N pulses that pass a preliminary quality check. A fixed template can be precomputed from a large database of clean pulses. Alternatively, a subject-specific template can be established during an initial calibration period at rest.
Li and Bhatt (2014) evaluated template matching SQI on a large clinical dataset of 1,055 patient recordings from the MIMIC-II database. Using an adaptive template updated every 30 seconds, correlation-based SQI achieved an area under the ROC curve (AUC) of 0.943 for distinguishing clean from corrupted PPG segments, outperforming all other individual SQI metrics tested (DOI: 10.1088/0967-3334/35/5/807).
Implementation requires reliable pulse segmentation as a prerequisite. Each detected pulse is time-warped (typically using dynamic time warping or linear interpolation to a fixed number of samples) to align with the template before computing correlation. Misaligned segmentation will produce artificially low correlation values even for clean signals, so robust peak detection algorithms are essential.
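As a minimal sketch of the correlation step, the following assumes pulses have already been segmented and uses linear interpolation (not dynamic time warping) to bring each pulse to the template length. The function names and the synthetic sine-squared pulse shape are illustrative assumptions, not part of any cited method.

```python
import math

def resample_linear(pulse, n_out):
    """Linearly interpolate a pulse to n_out samples so it can be compared to the template."""
    if len(pulse) < 2 or n_out < 2:
        raise ValueError("need at least two samples")
    step = (len(pulse) - 1) / (n_out - 1)
    out = []
    for i in range(n_out):
        pos = i * step
        lo = min(int(pos), len(pulse) - 2)
        frac = pos - lo
        out.append(pulse[lo] * (1 - frac) + pulse[lo + 1] * frac)
    return out

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # flat segment: no usable morphology
    return cov / math.sqrt(var_a * var_b)

def template_sqi(pulses, template):
    """Mean correlation between each segmented pulse and the template."""
    n = len(template)
    scores = [pearson(resample_linear(p, n), template) for p in pulses]
    return sum(scores) / len(scores)

def synthetic_pulse(n):
    """Hypothetical smooth pulse shape used only for this demonstration."""
    return [math.sin(math.pi * i / (n - 1)) ** 2 for i in range(n)]

template = synthetic_pulse(50)
clean_pulses = [synthetic_pulse(45), synthetic_pulse(60)]   # same shape, different durations
distorted_pulses = [[(-1) ** i * 0.5 for i in range(50)]]   # alternating samples, no pulse shape

clean_score = template_sqi(clean_pulses, template)
distorted_score = template_sqi(distorted_pulses, template)
print(f"clean: {clean_score:.3f}, distorted: {distorted_score:.3f}")
```

An adaptive template could be maintained by averaging the resampled pulses that exceed the acceptance threshold, which is the approach described above.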
Perfusion Index (PI)
The perfusion index quantifies the pulsatile signal strength relative to the non-pulsatile baseline:
PI = (AC_amplitude / DC_level) × 100%
where AC_amplitude is the peak-to-trough amplitude of the pulsatile component and DC_level is the mean signal level. PI reflects both signal quality and physiological perfusion state. Typical values range from 0.02% (very low perfusion, barely detectable pulse) to 20% (high perfusion, strong signal).
For signal quality assessment, PI below a threshold (commonly 0.1-0.4%) indicates insufficient pulsatile content for reliable parameter extraction. However, low PI can reflect either poor signal quality (loose sensor, motion artifact) or genuine low perfusion (cold extremities, vasoconstriction, shock), so PI alone cannot distinguish between technical and physiological causes of poor signal. Clinical context is needed to interpret low PI values correctly.
Cannesson et al. (2008) reported that a perfusion index threshold of 0.4% correctly identified unreliable SpO2 measurements with sensitivity of 0.91 and specificity of 0.82 in 86 surgical patients. Below this threshold, SpO2 accuracy degraded by an average of 3.2% (DOI: 10.1213/ane.0b013e31816c49de).
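The PI formula can be illustrated with a short sketch. It assumes the input is the raw (unfiltered) signal so that the DC level is preserved, and it takes the window-wide maximum minus minimum as the AC amplitude, which is a simplification: baseline drift within the window would inflate the result. The function names and the default threshold argument are illustrative.

```python
import math

def perfusion_index(window):
    """PI in percent: peak-to-trough AC amplitude divided by the mean (DC) level."""
    dc = sum(window) / len(window)
    if dc <= 0:
        raise ValueError("DC level must be positive; pass the raw, unfiltered signal")
    ac = max(window) - min(window)
    return ac / dc * 100.0

def pi_is_sufficient(pi_percent, threshold_percent=0.4):
    """Threshold check; 0.4% is one of the values discussed in the text."""
    return pi_percent >= threshold_percent

# Synthetic raw signal: DC level 1000 counts, 1.2 Hz pulsation of +/-5 counts, 5 s at 50 Hz
fs = 50
raw = [1000.0 + 5.0 * math.sin(2 * math.pi * 1.2 * t / fs) for t in range(5 * fs)]
pi = perfusion_index(raw)
print(f"PI = {pi:.2f}%, sufficient: {pi_is_sufficient(pi)}")
```

Because a low PI cannot separate technical from physiological causes, a check like this is better treated as one input to a composite decision than as a standalone gate.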
Skewness and Kurtosis
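Statistical moments of the PPG waveform provide shape-based quality indicators. Clean PPG pulses are asymmetric because of the rapid systolic upstroke and gradual diastolic decay, which gives the amplitude distribution a non-zero skewness, and they are peaked, which raises kurtosis. The sign of the skewness depends on signal polarity: inverting the waveform flips it, so the ranges below hold only for the polarity convention under which they were measured.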
Skewness values for clean PPG typically range from -1.5 to -0.3. Kurtosis values typically range from 2.5 to 6.0 for clean signals. Motion-corrupted signals tend toward zero skewness (symmetric noise) and lower kurtosis (flatter, less peaked waveforms). These statistical metrics are fast to compute and complement morphological metrics.
Krishnan et al. (2010) showed that combining skewness and kurtosis features with template correlation improved SQI classification accuracy from 89.3% (template correlation alone) to 93.7% on a dataset of 500 PPG segments from wrist sensors during mixed activities.
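A minimal sketch of the two moments follows, using population (biased) estimators and the non-excess definition of kurtosis (a Gaussian gives 3), which is the convention the 2.5 to 6.0 range above appears to assume. The function name is illustrative.

```python
import math

def skewness_kurtosis(x):
    """Return (skewness, kurtosis) from central moments; kurtosis is non-excess."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    if m2 == 0:
        return 0.0, 0.0  # flat segment: moments are undefined, report zeros
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

# A pure sine is symmetric, so its skewness is 0 and its kurtosis is 1.5
sine = [math.sin(2 * math.pi * k / 100) for k in range(200)]
sine_skew, sine_kurt = skewness_kurtosis(sine)

# A sequence with a single large excursion has a long upper tail (positive skew)
tailed = [0.0] * 9 + [10.0]
tailed_skew, tailed_kurt = skewness_kurtosis(tailed)
print(sine_skew, sine_kurt, tailed_skew, tailed_kurt)
```

In a pipeline these would be computed per window (or per pulse) on the filtered signal and passed to the classifier alongside the morphological features.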
Zero-Crossing Rate
The zero-crossing rate of the first derivative of the PPG signal reflects waveform complexity. A clean PPG pulse produces a characteristic pattern of zero crossings corresponding to the systolic peak, dicrotic notch, and diastolic features. Motion artifacts introduce additional zero crossings, and the rate increases with artifact severity.
Expected zero-crossing rates for clean PPG first derivative are 2-4 per pulse cycle. Rates exceeding 6-8 per cycle indicate significant noise or artifact contamination. This metric is computationally trivial and useful as a fast preliminary screen before applying more expensive quality metrics.
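The screen can be sketched as a count of sign changes in the first difference, divided by the number of pulse cycles in the window. The cycle count is assumed to come from a separate pulse detector, and the synthetic signals are illustrative only.

```python
import math

def derivative_zero_crossings(x):
    """Count sign changes of the first difference (a discrete first derivative)."""
    crossings = 0
    prev_sign = 0
    for a, b in zip(x, x[1:]):
        d = b - a
        sign = (d > 0) - (d < 0)
        if sign == 0:
            continue  # flat step: keep the previous slope sign
        if prev_sign != 0 and sign != prev_sign:
            crossings += 1
        prev_sign = sign
    return crossings

def crossings_per_cycle(x, n_cycles):
    """Normalize the count by the number of pulse cycles in the window."""
    return derivative_zero_crossings(x) / n_cycles

# Three cycles of a smooth waveform, with and without a high-frequency disturbance
clean = [math.sin(2 * math.pi * k / 100) for k in range(300)]
noisy = [v + 0.3 * math.sin(2 * math.pi * 17 * k / 100) for k, v in enumerate(clean)]

clean_rate = crossings_per_cycle(clean, 3)
noisy_rate = crossings_per_cycle(noisy, 3)
print(clean_rate, noisy_rate)
```

A pure sine has no dicrotic notch, so it sits at the low end of the 2-4 range; a real clean pulse with a visible notch would add two more crossings per cycle.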
Inter-Beat Interval Regularity
The coefficient of variation (CV) of inter-beat intervals within a signal window reflects rhythm regularity. For subjects in normal sinus rhythm, the CV of successive pulse intervals is typically 2-10% (reflecting normal HRV). During motion artifact, missed or false detections produce apparent intervals that are dramatically irregular, with CV exceeding 20-30%.
This metric must be used carefully because atrial fibrillation and other arrhythmias also produce irregular intervals. An irregularity flag should trigger further investigation (Is this AF or artifact?) rather than automatic rejection in clinical applications. For wearable heart rate monitoring where arrhythmia detection is not the primary goal, IBI regularity is a practical quality indicator. For more on distinguishing AF from artifact, see our atrial fibrillation detection guide.
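A short sketch of the regularity check, assuming peak times in seconds are supplied by an upstream detector. The 20% flag level mirrors the figure quoted above; the function names and the interval values are made up for illustration.

```python
import statistics
from itertools import accumulate

def ibi_cv_percent(peak_times_s):
    """Coefficient of variation (percent) of successive inter-beat intervals."""
    ibis = [b - a for a, b in zip(peak_times_s, peak_times_s[1:])]
    if len(ibis) < 2:
        raise ValueError("need at least three peaks")
    return statistics.pstdev(ibis) / statistics.fmean(ibis) * 100.0

def is_irregular(cv_percent, threshold_percent=20.0):
    """Flag for follow-up; in clinical use this should not trigger automatic rejection."""
    return cv_percent > threshold_percent

# Intervals resembling normal sinus rhythm versus a window with one missed and one false peak
regular_ibis = [0.80, 0.84, 0.78, 0.82, 0.80, 0.83, 0.79]
artifact_ibis = [0.8, 0.8, 1.6, 0.8, 0.4, 0.4, 0.8]

regular_peaks = [0.0] + list(accumulate(regular_ibis))
artifact_peaks = [0.0] + list(accumulate(artifact_ibis))

regular_cv = ibi_cv_percent(regular_peaks)
artifact_cv = ibi_cv_percent(artifact_peaks)
print(f"regular: {regular_cv:.1f}%, artifact: {artifact_cv:.1f}%")
```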
Frequency-Domain SQI Metrics
Frequency-domain metrics evaluate signal quality by analyzing the spectral content of the PPG signal, leveraging the fact that clean cardiac signals have concentrated spectral energy while noise and artifact have distributed spectral content.
Spectral Entropy
Spectral entropy quantifies the uniformity of the power spectral density (PSD) distribution. A clean PPG signal with a dominant cardiac frequency and its harmonics has low spectral entropy (energy concentrated in a few frequency bins). Broadband noise or motion artifact distributes energy across many frequencies, increasing spectral entropy.
Spectral entropy is computed as the Shannon entropy of the normalized PSD:
H = -sum(P_norm * log2(P_norm))
where P_norm is the PSD normalized to sum to 1. Values are typically normalized to the range [0, 1] by dividing by log2(N), where N is the number of frequency bins.
Fischer et al. (2017) reported that spectral entropy below 0.65 correctly identified clean PPG segments with 91.4% accuracy on a dataset of 200 recordings from intensive care patients, making it a useful single-metric quality screen (DOI: 10.1109/TBME.2017.2665701).
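The entropy formula can be tried out with the sketch below. To stay dependency-free it uses a naive O(N^2) DFT on a short window; a real implementation would use an FFT routine and would usually restrict the bins to the physiological band. The function names and test signals are illustrative assumptions.

```python
import cmath
import math
import random

def power_spectrum(x):
    """Naive DFT power for positive-frequency bins 1..n/2, with the mean (DC) removed."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    return [
        abs(sum(c * cmath.exp(-2j * math.pi * k * t / n) for t, c in enumerate(centered))) ** 2
        for k in range(1, n // 2 + 1)
    ]

def spectral_entropy(x):
    """Shannon entropy of the normalized PSD, scaled to [0, 1] by log2(number of bins)."""
    psd = power_spectrum(x)
    total = sum(psd)
    if total == 0:
        return 1.0  # flat signal: treat as maximally uninformative
    p_norm = [p / total for p in psd]
    h = -sum(p * math.log2(p) for p in p_norm if p > 0)
    return h / math.log2(len(psd))

n = 128
sine = [math.sin(2 * math.pi * 8 * t / n) for t in range(n)]   # energy in a single bin
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]                # energy spread over all bins

sine_entropy = spectral_entropy(sine)
noise_entropy = spectral_entropy(noise)
print(f"sine: {sine_entropy:.3f}, noise: {noise_entropy:.3f}")
```

A real PPG pulse carries harmonics, so its entropy is above that of a pure sine; any threshold should be tuned on data from the target sensor.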
Spectral Peak Ratio
The ratio of the power at the dominant cardiac frequency (optionally including its first harmonic) to the total signal power within the physiological band (0.5-4 Hz) indicates how much of the signal energy is concentrated in the cardiac component versus distributed in noise. For clean signals, this ratio typically exceeds 0.4 (40% of band power at the cardiac frequency and its first harmonic). For heavily corrupted signals, the ratio drops below 0.2.
This metric integrates naturally with spectral heart rate estimation methods, as the same FFT computation provides both the heart rate estimate and the quality metric.
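The sketch below shows how one spectrum can yield both the quality ratio and the heart rate estimate. It is a simplified variant that counts only the single peak bin (no harmonic, no neighboring bins) and uses a naive DFT restricted to 0.5-4 Hz instead of an FFT; the function names, sampling rate, and signals are illustrative assumptions.

```python
import cmath
import math
import random

def band_powers(x, fs, f_lo=0.5, f_hi=4.0):
    """Return {frequency_hz: power} for the DFT bins that fall inside [f_lo, f_hi]."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    powers = {}
    for k in range(1, n // 2 + 1):
        f = k * fs / n
        if f_lo <= f <= f_hi:
            acc = sum(c * cmath.exp(-2j * math.pi * k * t / n) for t, c in enumerate(centered))
            powers[f] = abs(acc) ** 2
    return powers

def spectral_peak_ratio(x, fs):
    """Return (peak power / band power, peak frequency in Hz)."""
    powers = band_powers(x, fs)
    total = sum(powers.values())
    if total == 0:
        return 0.0, None
    peak_f = max(powers, key=powers.get)
    return powers[peak_f] / total, peak_f

fs, n = 32, 256  # 8 s window, 0.125 Hz resolution
clean = [math.sin(2 * math.pi * 1.25 * t / fs) for t in range(n)]  # 75 BPM
rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]

clean_ratio, clean_peak_hz = spectral_peak_ratio(clean, fs)
noise_ratio, _ = spectral_peak_ratio(noise, fs)
print(f"clean: ratio={clean_ratio:.2f}, HR={clean_peak_hz * 60:.0f} BPM; noise: ratio={noise_ratio:.2f}")
```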
Band Power Ratio
Comparing power in the expected cardiac band (0.5-4 Hz) to power outside this band provides a broadband SNR estimate. High out-of-band power suggests contamination by noise, motion artifact, or other non-cardiac sources. The ratio is particularly useful for detecting high-frequency noise (electrical interference, quantization noise) and very-low-frequency baseline wander that may have escaped preprocessing.
Morphological SQI Metrics
Pulse Amplitude Variability
Within a signal window, the coefficient of variation of successive pulse amplitudes should be low for a clean signal in a hemodynamically stable subject. Normal pulse amplitude variation from respiratory modulation is typically 5-15% of mean amplitude. Values exceeding 30-40% suggest either significant artifact or hemodynamic instability.
This metric is particularly effective for detecting intermittent sensor liftoff, where alternate pulses may have normal and abnormal amplitudes as the sensor contact varies with movement.
Systolic Upstroke Steepness
The maximum slope of the systolic upstroke reflects both signal quality and cardiovascular function. Clean PPG pulses have a characteristic rapid upstroke with a consistent maximum slope. Motion artifact typically disrupts the upstroke, producing variable or absent steep rising edges.
For wrist PPG at typical sampling rates (25-100 Hz), the maximum first derivative during the systolic upstroke of a clean signal is 3-10 times larger than during the diastolic decay. A ratio below 1.5 suggests significant waveform distortion. This metric is useful for assessing whether pulse wave features are intact for applications like blood pressure estimation that depend on upstroke characteristics.
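One way to express this check, as a rough sketch: compare the steepest rising step with the steepest falling step within a single segmented pulse. This is an interpretation of the ratio described above, and the function name and the piecewise-linear test pulse are illustrative.

```python
def upstroke_to_decay_slope_ratio(pulse):
    """Ratio of the maximum rising slope to the maximum falling slope magnitude."""
    diffs = [b - a for a, b in zip(pulse, pulse[1:])]
    max_rise = max(diffs)
    max_fall = -min(diffs)
    if max_rise <= 0 or max_fall <= 0:
        return 0.0  # no rising edge or no falling edge: not a usable pulse
    return max_rise / max_fall

# Piecewise-linear pulse: rises over 10 samples, decays over 40 samples
pulse = [i / 10 for i in range(11)] + [1 - j / 40 for j in range(1, 41)]
ratio = upstroke_to_decay_slope_ratio(pulse)
print(f"slope ratio = {ratio:.2f}, distorted: {ratio < 1.5}")
```

Because first differences amplify high-frequency noise, the input should be the bandpass-filtered signal rather than the raw samples.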
Dicrotic Notch Detection
The presence or absence of the dicrotic notch provides a high-level quality indicator. In clean PPG signals from healthy subjects, the dicrotic notch is visible in approximately 60-80% of pulses at the finger and 30-50% at the wrist. Its complete absence in a window of pulses may indicate either low signal quality or elderly/stiff vasculature. Its presence generally confirms adequate signal quality for morphological analysis.
Multi-Feature and Machine Learning Approaches
Feature Fusion SQI
Combining multiple individual SQI metrics into a composite score improves classification accuracy compared to any single metric. A simple approach computes a weighted average of normalized individual metrics. More sophisticated approaches use decision trees, random forests, or logistic regression to learn optimal feature combinations from labeled training data.
Orphanidou et al. (2015) proposed a rule-based multi-feature SQI that combined template correlation, heart rate range checking, and inter-beat interval regularity. Applied to 104 patients from the MIMIC-II database, their composite SQI achieved 94.1% sensitivity and 97.3% specificity for detecting unreliable PPG segments, significantly outperforming any individual metric (DOI: 10.1109/JBHI.2014.2338351). The simplicity of the rule-based approach made it practical for embedded implementation in clinical monitors.
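A rule-based composite in the spirit of this description can be sketched as a chain of feasibility checks. The structure below is a loose illustration, not the published algorithm, and every threshold is a placeholder default chosen for the example rather than a value taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class WindowFeatures:
    heart_rate_bpm: float
    ibi_cv_percent: float
    mean_template_corr: float

def rule_based_sqi(f, hr_range_bpm=(40.0, 180.0), max_ibi_cv_percent=20.0, min_corr=0.8):
    """Return (usable, reasons). All thresholds are illustrative placeholders."""
    reasons = []
    if not hr_range_bpm[0] <= f.heart_rate_bpm <= hr_range_bpm[1]:
        reasons.append("heart rate outside plausible range")
    if f.ibi_cv_percent > max_ibi_cv_percent:
        reasons.append("irregular inter-beat intervals")
    if f.mean_template_corr < min_corr:
        reasons.append("low template correlation")
    return len(reasons) == 0, reasons

good = WindowFeatures(heart_rate_bpm=72.0, ibi_cv_percent=4.0, mean_template_corr=0.93)
bad = WindowFeatures(heart_rate_bpm=210.0, ibi_cv_percent=35.0, mean_template_corr=0.40)

good_result = rule_based_sqi(good)
bad_result = rule_based_sqi(bad)
print(good_result)
print(bad_result)
```

Returning the list of failed rules alongside the decision makes it easier to audit why a window was rejected, which is useful when tuning thresholds on a new device.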
Support Vector Machine (SVM) Classification
SVMs trained on handcrafted SQI features (template correlation, perfusion index, skewness, kurtosis, spectral entropy, and IBI regularity) provide a principled binary classification framework. The SVM finds the optimal hyperplane separating clean and corrupted signals in the multi-dimensional feature space.
Sukor et al. (2011) trained an SVM classifier on 11 signal quality features extracted from 4,500 PPG segments and achieved 95.2% classification accuracy with 5-fold cross-validation. The trained model required only 0.3 ms per segment for inference on a standard desktop processor, making it feasible for real-time deployment.
Deep Learning SQI
Convolutional neural networks (CNNs) can learn quality-relevant features directly from raw PPG waveforms without manual feature engineering. A typical architecture applies 1D convolutional layers to extract local waveform features, pooling layers to aggregate across the time window, and fully connected layers to produce the quality classification or score.
Liu et al. (2020) developed a 1D CNN-based SQI classifier trained on 100,000 labeled PPG segments from the MIMIC-III waveform database. The CNN achieved 96.8% accuracy, outperforming an SVM with 15 handcrafted features (93.1%) and individual SQI metrics (82-91%). The model contained 45,000 parameters and required 1.2 ms per inference on a mobile GPU (DOI: 10.1109/JBHI.2020.2990864).
The deep learning approach is particularly valuable when developing SQI for novel sensor configurations (new body sites, different LED arrangements, unusual form factors) where the optimal handcrafted features are unknown. However, it requires large labeled training datasets and careful validation across hardware variations to ensure generalizability.
Transfer Learning for Cross-Device SQI
A practical challenge in PPG signal quality assessment is that signal characteristics vary significantly between sensor hardware platforms. A model trained on finger-clip PPG may perform poorly on wrist PPG, and a model trained on one wearable brand may not transfer to another due to differences in LED wavelength, photodetector sensitivity, sampling rate, and analog front-end characteristics.
Transfer learning addresses this by pretraining a model on a large general-purpose PPG quality dataset and fine-tuning on a smaller dataset from the target device. Pereira et al. (2020) demonstrated that transfer learning from a finger PPG SQI model to a wrist PPG model reduced the required wrist training data by 75% while maintaining 94.5% accuracy, compared to training from scratch which required the full dataset to reach equivalent performance.
Implementation Architecture
Real-Time SQI Pipeline
A practical real-time SQI implementation processes PPG data in overlapping windows, typically 5-10 seconds long and advanced in 1-2 second steps. For each window, the pipeline executes these steps:
- Preprocessing: Apply baseline wander removal and bandpass filtering (0.5-8 Hz).
- Pulse detection: Identify individual pulse boundaries using peak and foot detection.
- Feature extraction: Compute selected SQI metrics (template correlation, perfusion index, spectral entropy, IBI regularity).
- Classification: Apply the trained classifier or rule-based decision logic to produce a binary quality label or continuous score.
- Temporal smoothing: Apply median filtering or hysteresis to the SQI time series to prevent rapid switching between quality states.
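The windowing and temporal smoothing steps above can be sketched as follows. The sketch covers only the bookkeeping (window indices and hysteresis on a score series); the per-window score is assumed to come from whichever metrics or classifier the pipeline uses. The function names and the accept/reject levels are illustrative assumptions.

```python
def sliding_windows(n_samples, fs, window_s=8.0, step_s=2.0):
    """Yield (start, stop) sample indices of overlapping analysis windows."""
    win = int(window_s * fs)
    step = int(step_s * fs)
    for start in range(0, n_samples - win + 1, step):
        yield start, start + win

def hysteresis_labels(scores, accept=0.7, reject=0.5, initial=False):
    """Convert a continuous SQI series into binary labels that do not chatter.

    The state switches to usable only when the score reaches `accept`,
    and back to unusable only when it drops below `reject`.
    """
    state = initial
    labels = []
    for s in scores:
        if state and s < reject:
            state = False
        elif not state and s >= accept:
            state = True
        labels.append(state)
    return labels

# 20 s of data at 50 Hz, 8 s windows advanced by 2 s
windows = list(sliding_windows(n_samples=1000, fs=50))

# Hypothetical per-window scores, one per window
scores = [0.80, 0.65, 0.55, 0.45, 0.60, 0.72, 0.69]
labels = hysteresis_labels(scores)
print(windows)
print(labels)
```

With a single threshold at 0.7, the second and last windows would flip the label back and forth; the two-level rule keeps the state stable while the score stays between the levels.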
Threshold Selection
The SQI acceptance threshold controls the tradeoff between accuracy improvement and data coverage reduction. Setting the threshold too high (e.g., requiring correlation > 0.95) rejects many usable segments; setting it too low (e.g., correlation > 0.5) admits corrupted data.
The optimal threshold depends on the downstream application. Heart rate estimation is relatively robust and tolerates lower SQI thresholds (0.5-0.6). SpO2 estimation is sensitive to waveform distortion and requires higher thresholds (0.7-0.8). Pulse wave analysis for blood pressure requires the highest thresholds (0.8-0.9) because morphological features must be intact.
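One way to choose a threshold empirically is to sweep candidate values on labeled data and tabulate coverage against error. The sketch below assumes each window has an SQI score and an absolute error against a reference; the numbers are made up purely to show the mechanics.

```python
def coverage_and_mae(sqi_scores, abs_errors, threshold):
    """Return (fraction of windows kept, mean absolute error of the kept windows)."""
    kept = [e for s, e in zip(sqi_scores, abs_errors) if s >= threshold]
    coverage = len(kept) / len(sqi_scores)
    mae = sum(kept) / len(kept) if kept else float("nan")
    return coverage, mae

# Made-up per-window SQI scores and heart rate errors (BPM) against a reference
sqi = [0.95, 0.90, 0.85, 0.70, 0.60, 0.55, 0.40, 0.30]
err = [1.0, 1.5, 1.0, 2.5, 4.0, 3.0, 9.0, 12.0]

cov_lo, mae_lo = coverage_and_mae(sqi, err, threshold=0.5)
cov_hi, mae_hi = coverage_and_mae(sqi, err, threshold=0.8)
for thr in (0.5, 0.6, 0.7, 0.8, 0.9):
    cov, mae = coverage_and_mae(sqi, err, thr)
    print(f"threshold {thr:.1f}: coverage {cov:.0%}, MAE {mae:.2f} BPM")
```

The resulting table makes the tradeoff explicit, so the threshold can be set from the error the downstream application tolerates rather than chosen by convention.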
Computational Considerations
For embedded wearable deployment, SQI computation must be lightweight. Template correlation is the most expensive of the common time-domain metrics, requiring O(N) multiplications and additions per pulse, where N is the template length (typically 50-200 samples). Perfusion index requires only min/max/mean operations. Spectral metrics require an FFT, which costs O(M log M) for an M-sample window but may already be computed for heart rate estimation. A practical embedded SQI implementation combining 3-4 metrics typically requires less than 5% of the processing budget of the overall PPG pipeline.
For more on how signal quality interacts with the complete PPG signal processing pipeline, including how SQI gates feed into heart rate estimation and SpO2 calculation, see our algorithm guides.
Benchmark Datasets for SQI Development
Several public datasets support PPG SQI algorithm development and benchmarking:
MIMIC-III Waveform Database: Contains thousands of hours of clinical PPG recordings with concurrent ECG, enabling automated labeling of PPG quality based on agreement with ECG-derived heart rate. This is the largest available resource for PPG SQI development.
CapnoBase: Contains 42 eight-minute recordings of finger PPG with reference respiratory rate and heart rate from capnography and ECG. Useful for evaluating SQI in clinical monitoring contexts.
PPG-DaLiA: Wrist PPG recordings from 15 subjects performing daily activities, with quality annotations derived from ECG agreement. Useful for wearable SQI algorithm development. This dataset is also widely used for motion artifact removal benchmarking.
Vortal Dataset: 100 finger PPG recordings with expert quality annotations, spanning a range of signal qualities from pristine to severely corrupted.
Frequently Asked Questions
What is a PPG signal quality index (SQI)?
A PPG signal quality index (SQI) is a quantitative metric that assesses how reliable or trustworthy a PPG signal segment is for extracting physiological parameters. SQI values typically range from 0 (completely corrupted, no usable cardiac information) to 1 (clean, high-quality signal with clear pulse morphology). SQI is computed from features of the PPG waveform such as its correlation to a pulse template, perfusion amplitude, regularity of pulse intervals, and spectral characteristics. Segments with SQI below a threshold (commonly 0.5-0.7) are flagged as unreliable and excluded from downstream analysis.
How do you detect a noisy PPG signal automatically?
Automatic detection of noisy PPG signals uses a combination of time-domain and frequency-domain features. Common approaches include template matching (correlating each pulse with an average clean pulse template, with r < 0.8 indicating noise), perfusion index thresholding (AC/DC ratio below 0.1% suggests poor signal), inter-beat interval regularity checks (coefficient of variation above 20% flags arrhythmia or noise), spectral entropy (high entropy indicates broadband noise rather than clean cardiac periodicity), and signal amplitude checks (clipping or near-zero amplitude). Machine learning classifiers trained on these features achieve classification accuracy of 92-97%.
Why is signal quality assessment important for PPG wearables?
Signal quality assessment is critical because PPG wearables produce measurements continuously, including during periods of motion, poor sensor contact, and other conditions that degrade signal quality. Without SQI gating, derived metrics like heart rate, HRV, SpO2, and blood pressure estimates can be wildly inaccurate during corrupted segments. SQI enables the device to suppress unreliable readings rather than displaying incorrect values, flag data segments for exclusion in clinical studies, adapt processing algorithms based on signal condition, and conserve battery by reducing LED intensity during high-quality periods.
Can machine learning improve PPG signal quality assessment?
Yes, machine learning significantly improves PPG signal quality assessment compared to single-metric thresholding. Deep learning models, particularly 1D convolutional neural networks, can learn complex quality patterns directly from raw PPG waveforms without manual feature engineering. Studies show that CNN-based SQI classifiers achieve 95-97% accuracy versus 85-90% for individual handcrafted metrics. Transfer learning enables models trained on finger PPG to adapt to wrist PPG with modest additional training data. However, ML models require labeled training data and may not generalize well to novel sensor hardware or unusual physiological conditions not represented in training.
References
- Elgendi (2016). DOI: 10.1371/journal.pone.0150216
- Li and Bhatt (2014). DOI: 10.1088/0967-3334/35/5/807
- Cannesson et al. (2008). DOI: 10.1213/ane.0b013e31816c49de
- Fischer et al. (2017). DOI: 10.1109/TBME.2017.2665701
- Orphanidou et al. (2015). DOI: 10.1109/JBHI.2014.2338351
- Liu et al. (2020). DOI: 10.1109/JBHI.2020.2990864