ChatPPG Editorial

Motion Artifact Detection in PPG: How to Flag Bad Windows Before Estimation

Practical methods for motion artifact detection in PPG using signal quality indices, accelerometer fusion, spectral tests, and reject-or-repair rules.

ChatPPG Research Team
6 min read

Detecting motion-corrupted PPG windows before running heart rate or morphology algorithms reduces false estimates and downstream bias. Use lightweight signal quality indices, accelerometer-derived features, and simple spectral tests in a cascade to flag bad windows in real time.

Quick answer

Combine an energy-and-spectral test on the PPG channel with accelerometer correlation and a beat-consistency score to create a three-tier quality flag. Use conservative rejection for morphology features and permissive acceptance for heart rate when accelerometer energy is low.

Why artifact detection matters

A PPG-based estimator will produce a number even when the signal lacks cardiac content. That number can be confidently wrong. Detecting bad windows prevents spurious alerts, reduces downstream false positives, and improves model calibration. This is essential for HRV, blood pressure surrogates, and clinical trend detection where a single bad window can bias a day-long summary.

Related reading: see our posts on adaptive filtering, PPG HRV and motion artifacts, and wrist PPG accuracy limitations.

Motion artifact types and signatures

Motion artifacts appear differently depending on source and device mounting. Typical signatures:

  • Micro-vibrations create broadband high-frequency jitter that raises the noise floor without destroying peak periodicity.
  • Sensor sliding causes slow baseline wander and occasional large amplitude steps.
  • Repetitive movement, such as running, produces strong spectral energy at step frequency and harmonics that can mask the cardiac peak.
  • Contact loss or occlusion results in near-zero pulsatile amplitude with preserved DC component.

Each signature implies different detection features and different repair strategies.

Core per-window features

Use a small set of robust, fast-to-compute features on windows of 2 to 8 seconds.

Time-domain features

  • Normalized energy: sum(x^2) divided by median energy across a longer baseline. Sudden rises or drops indicate artifact.
  • Kurtosis: high kurtosis often indicates spikes.
  • Envelope variance: variance of the analytic signal envelope.
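As a sketch, these three time-domain features can be computed with NumPy and SciPy. The `baseline_energy` input (median window energy over a longer history) is an assumed calibration value, and the demo signals are synthetic:

```python
import numpy as np
from scipy.signal import hilbert
from scipy.stats import kurtosis

def time_domain_features(x, baseline_energy):
    """Per-window time-domain SQI features.

    baseline_energy: median window energy over a longer history
    (an assumed calibration input supplied by the caller).
    """
    energy = np.sum(x ** 2)
    norm_energy = energy / baseline_energy        # sudden rises or drops flag artifact
    k = kurtosis(x)                               # spikes push kurtosis up
    envelope = np.abs(hilbert(x - x.mean()))      # analytic-signal envelope
    env_var = np.var(envelope)
    return {"norm_energy": norm_energy, "kurtosis": float(k), "env_var": float(env_var)}

# Demo: a clean 1.2 Hz pulse-like sinusoid vs. the same signal with a motion spike.
fs = 50
t = np.arange(0, 4, 1 / fs)
clean = np.sin(2 * np.pi * 1.2 * t)
spiky = clean.copy()
spiky[100] += 10.0                                # simulated motion spike
base = np.sum(clean ** 2)

f_clean = time_domain_features(clean, base)
f_spiky = time_domain_features(spiky, base)
```

In practice the spike raises both normalized energy and kurtosis relative to the clean window, which is exactly the signature the thresholds key on.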

Spectral features

  • Cardiac band power ratio: power in 0.7 to 3.5 Hz divided by total power.
  • Peak prominence: ratio of the largest spectral peak magnitude to the median magnitude in the band.
  • Step-harmonic index: energy at the dominant accelerometer spectral peaks aligned with PPG spectral peaks.
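The first two spectral features can be sketched with a Welch PSD, using the 0.7 to 3.5 Hz cardiac band from above. Function names and the `nperseg` choice are illustrative:

```python
import numpy as np
from scipy.signal import welch

def cardiac_power_ratio(x, fs, band=(0.7, 3.5)):
    """Power in the cardiac band divided by total power (Welch PSD)."""
    f, pxx = welch(x, fs=fs, nperseg=min(len(x), 4 * fs))
    in_band = (f >= band[0]) & (f <= band[1])
    return float(pxx[in_band].sum() / pxx.sum())

def peak_prominence_ratio(x, fs, band=(0.7, 3.5)):
    """Largest in-band spectral peak over the median in-band magnitude."""
    f, pxx = welch(x, fs=fs, nperseg=min(len(x), 4 * fs))
    in_band = (f >= band[0]) & (f <= band[1])
    return float(pxx[in_band].max() / np.median(pxx[in_band]))

# Demo: a pulse-like 1.3 Hz tone (~78 bpm) vs. broadband noise.
fs = 50
t = np.arange(0, 8, 1 / fs)
pulse = np.sin(2 * np.pi * 1.3 * t)
noise = np.random.default_rng(0).normal(0, 1.0, t.size)
```

A clean pulse concentrates nearly all its power in the cardiac band with a sharp, prominent peak; white noise spreads power across the full 0 to 25 Hz range, so both ratios drop.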

Morphology and beat-level features

  • Beat consistency: mean cross-correlation between successive beat templates.
  • RR stability: coefficient of variation of detected RR intervals.
  • Pulse amplitude stability: CV of beat peak amplitudes.

Accelerometer features

  • Vector magnitude energy and variance.
  • Dominant step frequency and its power.
  • Correlation between accelerometer envelope and PPG envelope.
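The accelerometer features above can be sketched from the three raw axes; the demo fabricates a gravity-plus-2 Hz-step signal, and function names are illustrative:

```python
import numpy as np
from scipy.signal import hilbert, welch

def accel_features(ax, ay, az, fs):
    """Vector-magnitude energy/variance and dominant step frequency."""
    vm = np.sqrt(ax ** 2 + ay ** 2 + az ** 2)
    vm = vm - vm.mean()                           # remove the gravity/DC component
    f, pxx = welch(vm, fs=fs, nperseg=min(len(vm), 4 * fs))
    return {
        "energy": float(np.sum(vm ** 2)),
        "variance": float(np.var(vm)),
        "step_freq": float(f[np.argmax(pxx)]),
        "step_power": float(pxx.max()),
    }

def envelope_correlation(ppg, accel_vm):
    """Correlation between the PPG envelope and the accelerometer envelope."""
    e1 = np.abs(hilbert(ppg - np.mean(ppg)))
    e2 = np.abs(hilbert(accel_vm - np.mean(accel_vm)))
    return float(np.corrcoef(e1, e2)[0, 1])

# Demo: gravity (1 g) plus a 2 Hz step oscillation on one axis.
fs = 50
t = np.arange(0, 8, 1 / fs)
ax = 1.0 + 0.5 * np.sin(2 * np.pi * 2.0 * t)
ay = np.zeros_like(t)
az = np.zeros_like(t)
feats = accel_features(ax, ay, az, fs)
```

The dominant vector-magnitude peak recovers the simulated 2 Hz cadence, which is the value the step-harmonic index compares against the PPG spectrum.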

A practical SQI formula

A lightweight SQI can be a linear combination of normalized features:

SQI = w1·cardiac_power_ratio + w2·beat_corr − w3·accel_energy_norm − w4·envelope_variance

Calibrate weights on a labeled dataset. Use three bands for decision making:

  • Accept: SQI >= 0.6
  • Suspect: 0.35 <= SQI < 0.6
  • Reject: SQI < 0.35

Tune thresholds to your device and use subject-level normalization when possible.
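A minimal sketch of the formula and the three bands follows. The thresholds are the ones given above; the default weights are uncalibrated placeholders and must be fit on your labeled data:

```python
def sqi(cardiac_power_ratio, beat_corr, accel_energy_norm, envelope_variance,
        w=(0.5, 0.5, 0.3, 0.2)):
    """Linear SQI; the default weights are illustrative, not calibrated."""
    w1, w2, w3, w4 = w
    return (w1 * cardiac_power_ratio + w2 * beat_corr
            - w3 * accel_energy_norm - w4 * envelope_variance)

def quality_tier(s):
    """Three-band decision using the thresholds above."""
    if s >= 0.6:
        return "accept"
    if s >= 0.35:
        return "suspect"
    return "reject"
```

For example, a window with strong cardiac power and stable beats lands in "accept", while high normalized accelerometer energy drags the score into "reject" even when some cardiac content survives.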

Deployment patterns

Cascade detectors

  • Fast spectral check for catastrophic failure: if cardiac_power_ratio < 0.15, reject immediately. This check is cheap and prevents wasteful downstream computation.
  • Accelerometer gate: if accel_energy_norm exceeds a device-specific threshold and correlation with PPG envelope is high, mark as motion-correlated.
  • Morphology pass: only allow morphology-sensitive estimates when beat_corr exceeds a higher threshold such as 0.7.
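The three-stage cascade can be sketched as a single decision function. The spectral and morphology thresholds come from the steps above; the accelerometer gate values are device-specific assumptions:

```python
def cascade_flag(cardiac_power_ratio, accel_energy_norm, accel_ppg_env_corr,
                 beat_corr, accel_gate=2.0, corr_gate=0.5):
    """Cascade: cheap spectral check first, then the accelerometer gate,
    then the morphology pass. accel_gate and corr_gate are illustrative,
    device-specific values."""
    # Stage 1: catastrophic failure, reject before any further computation.
    if cardiac_power_ratio < 0.15:
        return {"hr": "reject", "morphology": "reject",
                "reason": "no cardiac band power"}
    # Stage 2: motion-correlated flag from the accelerometer gate.
    motion_correlated = (accel_energy_norm > accel_gate
                         and accel_ppg_env_corr > corr_gate)
    # Stage 3: morphology only when beats are consistent and motion-free.
    morph_ok = beat_corr > 0.7 and not motion_correlated
    return {
        "hr": "accept",
        "morphology": "accept" if morph_ok else "reject",
        "reason": "motion-correlated" if motion_correlated else "ok",
    }
```

Note the asymmetry: a motion-correlated window still passes heart rate (permissive acceptance) but never morphology, matching the quick-answer policy at the top.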

Window length considerations

  • Short windows (2 s) provide low latency at the cost of variability. This is useful for user feedback.
  • Medium windows (4-8 s) provide more stable SQI for clinical metrics such as HRV.

Resource constraints

  • For on-device deployment, compute only time- and spectral-domain features with O(N log N) FFT cost and avoid expensive template matching unless needed. Consider incremental FFT or sliding-window updates to reduce CPU.
  • For cloud processing, include beat-level correlation checks and ML models for final adjudication.

Repair strategies

Short gaps (<= 5 s)

  • Linearly interpolate missing beat times or copy the preceding beat template with amplitude scaling. Mark imputed beats so downstream models can treat them differently.
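The beat-time interpolation with an imputation mask might look like this. The surrounding-RR estimate `expected_rr` is assumed to come from the neighboring clean beats:

```python
def interpolate_beat_times(beat_times, gap_start, gap_end, expected_rr):
    """Fill a short gap (<= 5 s) in detected beat times with evenly
    spaced beats at the surrounding RR interval. Returns the merged
    beat times plus a parallel 'imputed' mask so downstream models
    can treat filled beats differently."""
    n = int(round((gap_end - gap_start) / expected_rr)) - 1
    filled = [gap_start + (i + 1) * expected_rr for i in range(max(n, 0))]
    times = sorted(set(beat_times) | set(filled))
    imputed = [t in filled for t in times]
    return times, imputed

# Demo: beats at 0.8 s intervals with a 2.4 s dropout between 1.6 s and 4.0 s.
times, imputed = interpolate_beat_times(
    [0.0, 0.8, 1.6, 4.0], gap_start=1.6, gap_end=4.0, expected_rr=0.8)
```

The two inserted beats at roughly 2.4 s and 3.2 s are flagged in the mask, so an HRV pipeline can exclude or down-weight them rather than treating them as observed.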

Medium gaps (5 to 30 s)

  • Use adaptive filtering with accelerometer as reference to attempt salvage. If salvage fails, return a quality-degraded estimate and a confidence score.
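One common choice for accelerometer-referenced salvage is a normalized LMS (NLMS) canceller; the filter output tracks the motion component and the error signal is the motion-reduced PPG. This is a sketch, not a tuned implementation, and the order and step size are illustrative:

```python
import numpy as np

def nlms_cancel(ppg, ref, order=8, mu=0.1, eps=1e-6):
    """NLMS adaptive cancellation with an accelerometer-derived noise
    reference; returns the error signal (motion-reduced PPG)."""
    w = np.zeros(order)
    out = np.zeros(len(ppg))
    for n in range(order, len(ppg)):
        u = ref[n - order + 1:n + 1][::-1]        # reference tap vector
        y = w @ u                                 # motion estimate
        e = ppg[n] - y                            # cleaned sample
        w += mu * e * u / (u @ u + eps)           # normalized LMS update
        out[n] = e
    return out

# Demo: 1.2 Hz cardiac tone corrupted by a 2.5 Hz "step" component that
# is fully explained by the (synthetic) accelerometer reference.
fs = 50
t = np.arange(0, 10, 1 / fs)
cardiac = np.sin(2 * np.pi * 1.2 * t)
ref = np.sin(2 * np.pi * 2.5 * t)
corrupted = cardiac + 1.5 * ref
cleaned = nlms_cancel(corrupted, ref)
raw_err = float(np.var(corrupted[300:] - cardiac[300:]))
clean_err = float(np.var(cleaned[300:] - cardiac[300:]))
```

After convergence the residual motion power drops well below the raw level; if it does not, that is the "salvage fails" branch, and the window should fall back to a quality-degraded estimate with a confidence score.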

Long gaps (> 30 s)

  • Do not impute. Mark the segment as unusable for morphology and HRV. For heart rate trends, report a hold value with an explicit gap indicator.

Example thresholds and rules of thumb

  • cardiac_power_ratio < 0.2: likely reject
  • beat_corr < 0.6: reject for morphology
  • accel_energy_norm > 3x baseline: strong motion flag
  • amplitude drop below 20% of baseline: contact loss

These are starting points. Device optics and mounting change the numeric values.
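Encoding these rules of thumb as a single flag function keeps the gates auditable; as above, the numeric thresholds are starting points that shift with device optics and mounting:

```python
def rule_of_thumb_flags(cardiac_power_ratio, beat_corr,
                        accel_energy, baseline_accel_energy,
                        pulse_amp, baseline_amp):
    """Starting-point gates from the rules of thumb above; tune per device."""
    return {
        "likely_reject": cardiac_power_ratio < 0.2,
        "reject_morphology": beat_corr < 0.6,
        "strong_motion": accel_energy > 3.0 * baseline_accel_energy,
        "contact_loss": pulse_amp < 0.2 * baseline_amp,
    }

# Demo: a badly corrupted window vs. a clean one.
bad = rule_of_thumb_flags(0.15, 0.5, 4.0, 1.0, 0.1, 1.0)
good = rule_of_thumb_flags(0.6, 0.8, 1.0, 1.0, 0.9, 1.0)
```

Logging these boolean flags alongside the numeric SQI makes later threshold tuning much easier, because each rejection carries its own reason.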

Case study: walking vs running

In walking, the step frequency often falls between 0.9 and 1.6 Hz. This overlaps the lower end of the cardiac band when heart rate is low. In such cases, rely more on template matching and beat_corr metrics than raw spectral dominance. During running, step frequency shifts toward 2 to 3 Hz and can swamp the cardiac peak for moderate heart rates. Detect step-harmonic coupling by cross-power analysis between accelerometer and PPG.

Practical takeaway: if step-harmonic index is high and beat_corr is low, reject morphology estimates but consider using a robust spectral tracker or accelerometer-adaptive cancellation before accepting heart rate.
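As one simple realization of the step-harmonic index, measure the fraction of PPG power sitting at the accelerometer's dominant frequency. The demo fabricates a 2.5 Hz running cadence, with and without coupling into the PPG channel:

```python
import numpy as np
from scipy.signal import welch

def step_harmonic_index(ppg, accel_vm, fs):
    """Fraction of PPG power at the accelerometer's dominant frequency.
    High values suggest the PPG spectral peak is motion-driven."""
    nper = min(len(ppg), 4 * fs)
    f, pxx_a = welch(accel_vm - np.mean(accel_vm), fs=fs, nperseg=nper)
    step_bin = int(np.argmax(pxx_a[1:]) + 1)      # dominant accel bin, skip DC
    _, pxx_p = welch(ppg, fs=fs, nperseg=nper)
    return float(pxx_p[step_bin] / pxx_p.sum())

# Demo: ~2.5 Hz cadence on the accelerometer vector magnitude.
fs = 50
t = np.arange(0, 8, 1 / fs)
accel = 1.0 + 0.5 * np.sin(2 * np.pi * 2.5 * t)
coupled = 0.3 * np.sin(2 * np.pi * 1.2 * t) + np.sin(2 * np.pi * 2.5 * t)
uncoupled = np.sin(2 * np.pi * 1.2 * t)
idx_coupled = step_harmonic_index(coupled, accel, fs)
idx_uncoupled = step_harmonic_index(uncoupled, accel, fs)
```

A high index with low beat_corr is exactly the reject-morphology case described above; a low index means the PPG peak is unlikely to be a step harmonic even during vigorous motion.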

Logging, user feedback, and UX

  • Surface a three-level quality indicator to users: Good, Marginal, Bad. This is simpler to interpret than numeric SQI values.
  • Log rejected windows with a short snapshot for later triage. This helps debug device placement or firmware regressions.
  • For clinical workflows, include a timestamped quality audit trail so clinicians can see when estimates were gated.

Evaluation checklist

  • Cross-subject validation
  • AUC and confusion matrix on labeled windows
  • Downstream impact on HR MAE and HRV RMSE
  • Latency measurement on target device

Dataset and privacy notes

Use diverse datasets that include multiple activities and skin tones. Public datasets such as PPG-DaLiA and portions of MIMIC contain labeled PPG and accelerometer data useful for benchmarking. When collecting new data, protect subject privacy, remove direct identifiers, and store raw signals securely.

Next steps

Start with a conservative SQI on device and collect rejected-window examples for later labeling. Iterate thresholds with real-world users and log the impact on downstream errors. Pair detection with a transparent UX so users understand when data are insufficient.

FAQ

What window length should I use for SQI? 2 to 8 seconds. Short windows give fast response but noisier SQI. For HRV and morphology use 4 to 8 seconds.

Can the accelerometer alone determine quality? No. High accelerometer energy indicates motion, but the PPG cardiac band may still preserve enough signal. Combine accelerometer checks with spectral and morphology tests.

How should I label windows for training? Label windows as clean, motion, or contact-loss. Use multiple annotators and consensus for ambiguous cases. Include device and subject diversity.

Should I repair or reject motion windows? Repair short gaps conservatively and mark imputed data. Prefer rejection for morphology metrics and medical decisions.

Do I need ML for production? Not necessarily. Engineered SQIs often work well and are easier to validate. ML adds performance when labeled data is available.

How do I measure improvement from detection? Compare downstream estimator error with and without gating. Track false alert rate in real deployments.
