ChatPPG Editorial

PPG Preprocessing Pipeline for Wearables: From Raw Signal to Clean Beats

A wearable PPG preprocessing pipeline should repair timestamps first, filter with intent, resample carefully, segment clean beats, and reject bad windows before anything downstream trusts them.

ChatPPG Research Team
9 min read

A good PPG preprocessing pipeline is ordered, not improvised. For wearable data, the sequence that works best is usually: verify timestamps and dropouts first, remove obvious corruption, resample only when you truly need a fixed grid, apply targeted filtering, then segment and quality-gate beats before you compute features or feed a model.

That order keeps a pipeline from looking clean while being wrong. Many weak PPG systems fail because they normalize too early, smooth too aggressively, or resample away timing information before they know whether the window is usable.

This guide focuses on wearable signals, where sample rates are lower, motion is constant, and sensor-skin coupling changes minute to minute. For related building blocks, see PPG baseline wander removal, bandpass filter design for PPG, PPG signal quality assessment, and PPG peak detection algorithms.

The pipeline goal: preserve physiology while rejecting garbage

The purpose of preprocessing is not to make the waveform look pretty. It is to preserve the parts of the signal that matter for the task while preventing known failure modes from reaching later stages.

For wearable PPG, those failure modes usually include:

  • timestamp jitter or missing packets
  • baseline drift from contact pressure and posture
  • motion contamination that overlaps the cardiac band
  • narrow electrical interference
  • clipped or saturated samples
  • weak perfusion and changing amplitude scale
  • windows that are too poor to trust at all

A robust pipeline treats these as separate problems. One broad filter cannot solve all of them.

Step 1: Check timestamps before touching the amplitude

Many devices do not sample at a perfectly fixed interval once Bluetooth transport, batching, or adaptive power modes enter the picture. If timestamps are irregular, derivative-based features and beat intervals can become biased even when the raw amplitude looks acceptable.

Start by measuring:

  • median sample interval
  • jitter around that interval
  • count and duration of gaps
  • duplicate timestamps or out-of-order packets

If the gaps are short, you may be able to interpolate them later. If the gaps are long, the right answer is often to break the window and mark it unusable. Do not let a later filter smear across a transport failure and then pretend the beat timing is real.
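
As a concrete sketch, all four checks can be computed from the timestamp vector alone, before any amplitude processing. The gap threshold here is an illustrative assumption, not a device constant:

import numpy as np

def timestamp_qc(t, max_gap_s=0.5):
    # t: packet timestamps in seconds, in arrival order
    out_of_order = bool(np.any(np.diff(t) < 0))        # transport reordering
    n_duplicates = len(t) - len(np.unique(t))          # repeated timestamps
    dt = np.diff(np.sort(t))
    median_dt = float(np.median(dt))                   # median sample interval
    jitter = float(np.median(np.abs(dt - median_dt)))  # robust jitter estimate
    gaps = dt[dt > max_gap_s]                          # dropouts too long to repair
    return {
        "median_interval_s": median_dt,
        "jitter_s": jitter,
        "n_duplicates": n_duplicates,
        "out_of_order": out_of_order,
        "n_gaps": int(len(gaps)),
        "total_gap_s": float(gaps.sum()),
    }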

Step 2: Remove impossible samples and mark corrupted spans

Before filtering, identify samples that are plainly invalid:

  • ADC saturation at the rails
  • repeated flat values from sensor stall
  • impossible jumps from packet corruption
  • negative values or wraparound from parsing bugs

Short isolated defects can be interpolated if the downstream task is tolerant. Longer spans should be masked and excluded from beat analysis. This is also the right stage to create a data-quality channel that travels with the waveform through the rest of the pipeline.

A useful rule is to separate repairable from non-repairable corruption. A two-sample glitch in a 100 Hz stream may be repairable. A half-second dropout is usually not.
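
A minimal sketch of that separation, assuming a plain integer ADC stream; the rail values, flat-run length, and jump threshold are placeholders you would set per device:

import numpy as np

def sample_qc(x, rail_lo, rail_hi, max_flat=5, max_jump=None):
    # Boolean mask: True where a sample is plainly invalid.
    bad = (x <= rail_lo) | (x >= rail_hi)      # ADC saturation at the rails

    # Repeated flat values from a sensor stall: flag runs longer than max_flat.
    same = np.concatenate(([False], np.diff(x) == 0))
    run = np.zeros(len(x), dtype=int)
    for i in range(1, len(x)):
        run[i] = run[i - 1] + 1 if same[i] else 0
    for i in np.flatnonzero(run >= max_flat):
        bad[i - run[i]: i + 1] = True          # flag the whole run

    # Impossible jumps from packet corruption.
    if max_jump is not None:
        jump = np.concatenate(([0.0], np.abs(np.diff(x))))
        bad |= jump > max_jump
    return bad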

Step 3: Decide whether resampling is even necessary

Teams resample by reflex, but wearable PPG does not always benefit from it.

Keep native timestamps when beat timing matters

If your endpoint is inter-beat interval, pulse arrival timing, or anything beat-to-beat, preserving native timestamps is often safer. You can detect peaks on the native grid and compute intervals from time rather than from sample index.

Resample when you need a fixed grid

Resampling is useful when:

  • your model expects a fixed sample rate
  • you want aligned fusion with accelerometer or ECG
  • your device changes sampling rate across modes
  • your FFT or window features are defined on a common grid

Downsample only after anti-alias protection

If you reduce sample rate, low-pass first. This sounds obvious, but it is a real source of wearable bugs. A 100 Hz stream downsampled to 25 Hz without proper anti-alias filtering can fold high-frequency electrical noise or LED drive contamination into the cardiac band.
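
One reliable way to get the anti-alias step for free is polyphase resampling, which applies a low-pass FIR and the rate change together. A sketch, assuming the samples already sit on a regular grid (fix timestamps first):

from scipy.signal import resample_poly

def downsample_ppg(x, fs_in=100, fs_out=25):
    # resample_poly low-pass filters before decimating, so out-of-band
    # noise is attenuated instead of folding into the cardiac band.
    return resample_poly(x, up=fs_out, down=fs_in)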

Step 4: Remove slow drift with the task in mind

Baseline handling should be driven by what you plan to extract.

If your only goal is pulse rate, an aggressive high-pass or bandpass is usually fine. If you care about respiratory modulation or pulse amplitude variation, too much baseline removal can erase information you actually wanted.

A strong default for wearable cardiac processing is to control drift with one of these approaches:

  • high-pass or bandpass filtering for rate-focused applications
  • spline or trend removal when morphology must be preserved
  • longer-window detrending for learning pipelines that still want amplitude context

Our detailed guide on PPG baseline wander removal covers the tradeoffs in depth. The practical point here is simple: decide whether low-frequency content is noise or signal before you remove it.
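
As a sketch of the trend-removal option, a moving-median baseline much longer than one beat can be subtracted while leaving pulse morphology intact. The 2-second window is an illustrative choice, not a recommendation:

import numpy as np
from scipy.ndimage import median_filter

def remove_baseline(x, fs, win_s=2.0):
    # Estimate slow drift with a moving median much longer than one beat,
    # then subtract it so pulse morphology is left intact.
    win = int(win_s * fs) | 1                  # force an odd window length
    baseline = median_filter(x, size=win, mode="nearest")
    return x - baseline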

Step 5: Apply a notch filter only when the spectrum proves you need one

Wearable teams often add a 50 or 60 Hz notch because it feels responsible. That is not always good engineering.

If your cardiac band ends at 5 to 8 Hz, the bandpass alone may already remove mains contamination. If your device samples at 25 or 50 Hz, the interference may alias rather than appear as a clean 50 or 60 Hz line. In those cases, a notch can be irrelevant or even misleading.

Use a notch when all three are true:

  1. a narrow interference line is visible in the sampled spectrum
  2. the line survives into the stage that matters for your endpoint
  3. removing it improves the downstream metric without damaging beat structure

If you need the implementation details, see our guide to notch filtering for PPG powerline noise.
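
When the spectrum does justify it, a standard second-order IIR notch is enough. A sketch, assuming 50 Hz mains and a sample rate high enough that the line is not aliased:

from scipy.signal import iirnotch, filtfilt

def notch_mains(x, fs, f0=50.0, q=30.0):
    # Only valid when f0 < fs / 2; at wearable rates like 25 or 50 Hz,
    # mains interference aliases and a notch at f0 is the wrong tool.
    b, a = iirnotch(w0=f0, Q=q, fs=fs)
    return filtfilt(b, a, x)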

Step 6: Use a task-appropriate bandpass, not a generic one

There is no universal PPG passband. The right choice depends on what the model or algorithm needs.

For pulse rate and basic beat detection

A range around 0.5 to 8 Hz is a practical default for many adult wearable pipelines.

For high-motion wrist tracking

A narrower upper cutoff can improve robustness by suppressing motion ripple and high-frequency noise, but the tradeoff is softer peaks and less morphology detail.

For morphology or derivative features

Keep more harmonic content and validate that peak sharpness, notch visibility, and derivative landmarks survive the filter.

The key mistake is to choose a passband from a paper, lock it in, and never re-evaluate it against your actual sensor and endpoint. The filter should be tuned together with the detector or model that follows it.
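
A common starting point is a low-order zero-phase Butterworth bandpass, with the band edges and order treated as parameters to sweep against your own detector rather than constants. A sketch using the rate-focused default above:

from scipy.signal import butter, filtfilt

def bandpass_ppg(x, fs, lo=0.5, hi=8.0, order=3):
    # filtfilt runs the filter forward and backward, giving zero phase
    # shift so beat timing is not delayed by the filter itself.
    b, a = butter(order, [lo, hi], btype="bandpass", fs=fs)
    return filtfilt(b, a, x)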

Step 7: Handle motion artifact as a separate stage

Motion is not just another kind of high-frequency noise. In wearables, motion often overlaps the cardiac band and corrupts amplitude, baseline, and pulse shape all at once.

That means motion handling usually belongs after basic cleanup but before you trust beats. Depending on the system, this stage may include:

  • accelerometer-referenced adaptive filtering
  • motion-aware Kalman or state-space filtering
  • spectral tracking with cadence rejection
  • simple window rejection when the signal is beyond recovery

The wrong move is to keep stacking filters in the hope that the signal will eventually become usable. Sometimes the best preprocessing decision is to declare a window bad and extract no physiology from it.
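
As one sketch of accelerometer-referenced adaptive filtering, a normalized LMS canceller treats a single accelerometer axis as a noise reference and subtracts whatever part of the PPG it can predict from that reference. The filter length and step size here are illustrative; real systems often combine multiple axes:

import numpy as np

def nlms_cancel(ppg, accel, n_taps=16, mu=0.1, eps=1e-8):
    # Normalized LMS: predict the motion component of the PPG from the
    # accelerometer reference, and return the residual as the cleaned signal.
    w = np.zeros(n_taps)
    out = np.copy(ppg)
    for n in range(n_taps, len(ppg)):
        u = accel[n - n_taps:n][::-1]          # most recent reference samples
        y = w @ u                              # predicted motion component
        e = ppg[n] - y                         # residual = cleaned PPG sample
        w += mu * e * u / (u @ u + eps)        # normalized weight update
        out[n] = e
    return out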

Step 8: Normalize only after you know what must stay absolute

Normalization is useful, but it can also quietly destroy information.

Use z-score or min-max for shape-driven models

If the downstream model cares more about shape than absolute amplitude, per-window normalization may help. This is common in deep learning pipelines.

Avoid early normalization for amplitude-based features

If you need perfusion index, pulse amplitude variability, or any absolute or relative amplitude feature, normalizing too early removes the quantity you wanted to measure.

Consider beat-wise normalization carefully

Beat-wise normalization can stabilize morphology analysis, but it also hides real beat-to-beat amplitude variation. That is good for one task and bad for another. Make the choice explicitly.
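
A sketch of per-window z-scoring, with the caveat above made explicit in the comment; keep the raw window around if any amplitude feature is computed later:

import numpy as np

def zscore_window(x, eps=1e-8):
    # Shape-preserving, amplitude-destroying: fine for shape-driven models,
    # wrong as an early step if perfusion or pulse amplitude matters later.
    return (x - np.mean(x)) / (np.std(x) + eps)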

Step 9: Segment in a way that matches the endpoint

Windowing is not bookkeeping. It changes the behavior of every downstream metric.

Short windows for pulse rate updates

For responsive rate tracking, 4 to 8 second windows with overlap are often enough.

Medium windows for robust beat statistics

For beat morphology summaries or rate stability, 8 to 15 second windows are a strong middle ground.

Long windows for variability metrics

HRV and slower autonomic features need longer windows and stricter beat-quality control.

There are two segmentation views that often need to coexist:

  • fixed windows, for spectral or model input
  • beat-aligned segments, for morphology and fiducial features

The best wearable pipelines usually keep both. Fixed windows are convenient for streaming logic. Beat-aligned segments are better for accurate morphology.
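
Fixed windows are simple to generate; beat-aligned segments come from the detector later. A sketch with the window length and overlap as illustrative values:

import numpy as np

def fixed_windows(x, fs, win_s=8.0, overlap=0.5):
    # Yield overlapping fixed-length windows for spectral or model input.
    size = int(win_s * fs)
    step = int(size * (1.0 - overlap))
    for start in range(0, len(x) - size + 1, step):
        yield start, x[start:start + size]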

Step 10: Run quality control at both window and beat level

Quality control is the gate that decides whether a clean-looking signal is actually usable.

Window-level QC

Useful window metrics include:

  • spectral concentration in the cardiac band
  • clipping percentage
  • accelerometer energy
  • variance or envelope stability
  • template consistency across beats

Beat-level QC

Useful beat checks include:

  • interval plausibility
  • local correlation with a running beat template
  • systolic rise shape
  • peak sharpness or width
  • presence of non-physiologic double peaks

A window can pass broad QC while still containing a few bad beats. If you want reliable HRV or morphology features, beat-level rejection matters.

This is where PPG signal quality assessment becomes operational rather than academic. The SQI should not sit in a report. It should change what the pipeline allows downstream.
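
One way to make the first window metric operational is spectral concentration: the fraction of total power inside the cardiac band, used as a gate rather than a report. The band and acceptance threshold are illustrative assumptions:

import numpy as np
from scipy.signal import welch

def cardiac_concentration(x, fs, band=(0.5, 3.0)):
    # Fraction of total power that falls in the cardiac band; low values
    # suggest the window is dominated by motion, drift, or noise.
    f, pxx = welch(x, fs=fs, nperseg=min(len(x), int(4 * fs)))
    in_band = (f >= band[0]) & (f <= band[1])
    return pxx[in_band].sum() / (pxx.sum() + 1e-12)

def accept_window(x, fs, threshold=0.5):
    # Illustrative gate: let QC change what the pipeline allows downstream.
    return cardiac_concentration(x, fs) >= threshold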

A practical reference order for wearable PPG

If you want a default sequence that is hard to embarrass in production, start here:

  1. ingest signal and timestamps
  2. detect gaps, duplicates, and corrupted spans
  3. mask or repair short invalid segments
  4. resample only if the next stage requires a fixed grid
  5. remove baseline drift according to the task
  6. add a notch only if measured interference justifies it
  7. apply the main bandpass or low-pass stage
  8. run motion-aware denoising or rejection
  9. compute signal-quality metrics
  10. detect beats and refine fiducials
  11. reject bad beats or bad windows
  12. extract features or feed the model

Example implementation checklist

pipeline = [
    "timestamp_qc",                       # gaps, jitter, duplicates, ordering
    "sample_qc",                          # saturation, stalls, impossible jumps
    "optional_resample",                  # only if a fixed grid is required
    "detrend_or_baseline_remove",         # task-driven drift control
    "optional_notch",                     # only if the spectrum shows a line
    "bandpass",                           # tuned with the downstream detector
    "motion_handling",                    # adaptive filtering or window rejection
    "window_qc",                          # spectral, clipping, and motion checks
    "beat_detection",                     # peaks and fiducials
    "beat_qc",                            # interval and template plausibility
    "feature_extraction_or_model_input",  # only from accepted beats and windows
]

That looks simple, but each label hides a decision. The point is not to memorize the list. The point is to stop treating preprocessing as one block called clean_signal().

Common mistakes that quietly break wearable pipelines

The repeat offenders are resampling before checking timestamps, over-filtering before beat detection, using one pipeline for every endpoint, normalizing away amplitude information that mattered to the task, and forcing an output from windows that should have been rejected. The last one is simple but important: some windows are bad, and a good pipeline should return low confidence instead of a confident lie.

What to validate before you trust the pipeline

Validate on more than a few clean traces. At minimum, test rest versus motion, low-perfusion cases, clipping and packet gaps, subject-wise generalization, and beat timing rather than only smoothed heart rate. For morphology work, add waveform correlation and landmark timing.

FAQ

What is the right order for a wearable PPG preprocessing pipeline?

A strong default order is timestamp QC, sample QC, optional resampling, baseline handling, optional notch filtering, bandpass filtering, motion handling, signal-quality checks, beat detection, and beat-level QC.

When should I resample PPG data?

Resample when you need a fixed grid for model input, sensor fusion, or unified feature extraction. Keep native timestamps when beat timing matters and a fixed grid is not required.

Should I normalize PPG before peak detection?

Usually not as a first move. Peak detection benefits more from proper filtering and adaptive thresholds than from early normalization, and early normalization can erase amplitude information you may need later.

How long should my wearable PPG windows be?

Use shorter overlapping windows for responsive pulse-rate updates, medium windows for stable beat summaries, and longer windows for variability metrics like HRV.

Where does signal quality assessment belong in the pipeline?

It belongs before you trust or publish physiological outputs. In practice, SQI should influence whether windows or beats are accepted, rejected, or down-weighted.

Can one preprocessing pipeline work for heart rate, HRV, and morphology?

Not perfectly. You can share the early QC steps, but the filtering, normalization, and segmentation choices usually need to change with the endpoint.
