ChatPPG Editorial

rPPG Algorithms Explained: CHROM, POS, DeepPhys, and Beyond

A technical deep dive into rPPG algorithms — from classical color space methods (CHROM, POS, PBV) to deep learning models (DeepPhys, PhysNet, EfficientPhys). Understand trade-offs for accuracy and deployment.

ChatPPG Research Team
8 min read

Remote photoplethysmography (rPPG) extracts vital signs from video by detecting the optical signature of blood flow in facial skin. The algorithms that do this have evolved from hand-crafted signal processing transforms in 2013 to end-to-end deep neural networks trained on thousands of hours of labeled video in 2025. Each generation trades accuracy, robustness to motion, and computational cost differently.

This article breaks down the major rPPG algorithm families, their mathematical foundations, and where each one fits in the deployment landscape — from mobile apps to clinical platforms.

Why rPPG Algorithm Choice Matters

No single rPPG algorithm wins on all metrics. CHROM is fast and explainable but assumes a linear skin reflection model that breaks under non-frontal illumination. DeepPhys generalizes better to motion but requires GPU inference and a large training dataset. A telehealth platform serving diverse populations in variable lighting needs different algorithm choices than a consumer fitness app estimating heart rate from a phone's front camera in a gym.

Understanding the algorithm landscape helps engineers choose the right tool and helps clinicians understand the limits of what they're reading.

Classical Signal Processing Algorithms

ICA (Independent Component Analysis) — Poh et al. 2010

The earliest influential rPPG work by Poh, McDuff, and Picard decomposed the RGB video signal into independent components using FastICA, then selected the component most correlated with cardiac frequency. It worked in controlled lab conditions but broke down with movement — ICA assumes statistically independent sources, and motion artifacts whose frequency content overlaps the cardiac band are not reliably separated from the pulse component.

Accuracy: MAE ~5–10 BPM in controlled settings
Speed: Fast (CPU)
Limitation: Poor motion robustness
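
The component-selection step that follows FastICA can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not code from the original paper: the function name, the 0.7–4.0 Hz band, and the power-ratio criterion are assumptions standing in for "most correlated with cardiac frequency."

```python
import numpy as np

def select_cardiac_component(components, fps=30.0, band=(0.7, 4.0)):
    """Pick the source with the most spectral power in the cardiac band.

    components: (K, T) array of candidate ICA source signals.
    Returns the index of the most pulse-like component.
    """
    # Power spectrum of each mean-removed component.
    centered = components - components.mean(axis=1, keepdims=True)
    power = np.abs(np.fft.rfft(centered, axis=1)) ** 2
    freqs = np.fft.rfftfreq(components.shape[1], d=1.0 / fps)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    # Fraction of each component's energy that falls in the cardiac band.
    ratio = power[:, in_band].sum(axis=1) / power.sum(axis=1)
    return int(np.argmax(ratio))
```

The weakness described above is visible here: a periodic motion artifact at, say, 1.5 Hz maximizes the same criterion as a genuine pulse.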

CHROM (Chrominance-Based) — de Haan & Jeanne 2013

CHROM is the most widely cited classical rPPG algorithm. It projects the temporally normalized RGB signal into a chrominance space using a fixed linear transform, then combines the two chrominance channels with a standard-deviation-based weighting ("alpha tuning") to cancel specular reflection artifacts.

The key insight: specular reflection (skin glare) is achromatic — it contributes equally to R, G, and B. By projecting onto axes orthogonal to the achromatic direction, CHROM suppresses specular artifacts while preserving the hemoglobin-driven color change.

Projection (applied to temporally normalized R, G, B channels):

  • X = 3R − 2G
  • Y = 1.5R + G − 1.5B
  • S = X − (std(X)/std(Y)) · Y

Accuracy: MAE 3–5 BPM controlled; 5–9 BPM with motion
Speed: Very fast (CPU, real-time on embedded)
Limitation: Assumes linear skin reflection model; degrades with directional lighting
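
The projection above is short enough to write out in full. The following NumPy sketch follows the published transform; the function name and the per-frame mean-RGB input format are illustrative choices, not part of the original paper.

```python
import numpy as np

def chrom_pulse(rgb):
    """CHROM pulse extraction (de Haan & Jeanne 2013), minimal sketch.

    rgb: (T, 3) array of mean R, G, B over the skin region per frame.
    Returns the 1-D pulse signal S of length T.
    """
    # Temporal normalization: the fixed coefficients apply to relative
    # (fractional) color changes, not raw pixel intensities.
    rgb_n = rgb / rgb.mean(axis=0)
    r, g, b = rgb_n[:, 0], rgb_n[:, 1], rgb_n[:, 2]

    # Fixed chrominance projection, orthogonal to the achromatic
    # direction, which suppresses specular (glare) components.
    x = 3.0 * r - 2.0 * g
    y = 1.5 * r + g - 1.5 * b

    # Alpha tuning: weight Y so residual distortions cancel in S.
    alpha = x.std() / y.std()
    return x - alpha * y
```

In practice the output is band-pass filtered to the cardiac range before peak detection; that step is omitted here.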

POS (Plane Orthogonal to Skin) — Wang et al. 2017

POS builds on CHROM by defining a "skin tone vector" from the temporally normalized mean RGB values and projecting the AC component of the signal onto a plane orthogonal to this vector. This makes POS adaptive to individual skin tone rather than relying on fixed coefficients.

In benchmark comparisons on the COHFACE and MAHNOB-HCI datasets, POS consistently outperforms CHROM and ICA, particularly under illumination changes. It remains the strongest classical baseline.

Accuracy: MAE 2–4 BPM controlled; 4–7 BPM with moderate motion
Speed: Very fast (CPU)
Limitation: Still fails under large motion; skin tone estimation can drift
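
A minimal NumPy sketch of the POS procedure follows, using the published projection plane and the ~1.6 s sliding window from the original paper; the function name and overlap-add bookkeeping are illustrative choices.

```python
import numpy as np

def pos_pulse(rgb, fps=30.0):
    """POS pulse extraction (Wang et al. 2017), minimal sketch.

    rgb: (T, 3) array of mean skin R, G, B per frame.
    Returns a 1-D pulse signal of length T.
    """
    T = rgb.shape[0]
    w = int(1.6 * fps)  # ~1.6 s sliding window, as in the paper
    h = np.zeros(T)
    for t in range(T - w + 1):
        c = rgb[t:t + w]
        # Temporal normalization removes the DC skin tone within the window.
        cn = c / c.mean(axis=0)
        # Project onto the plane orthogonal to the normalized skin tone
        # vector (the "plane orthogonal to skin").
        s1 = cn[:, 1] - cn[:, 2]                      #  G - B
        s2 = -2 * cn[:, 0] + cn[:, 1] + cn[:, 2]      # -2R + G + B
        p = s1 + (s1.std() / s2.std()) * s2           # alpha tuning
        h[t:t + w] += p - p.mean()                    # overlap-add
    return h
```

The per-window normalization is what makes POS adaptive: the projection plane tracks the subject's skin tone as lighting changes, where CHROM's coefficients stay fixed.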

PBV (Blood Volume Pulse Signature) — de Haan & van Leest 2014

PBV projects the RGB signal onto a known "blood volume pulse" signature vector — the relative pulsatile amplitudes in R, G, and B — which can be taken from physiological priors or estimated per subject in a short calibration period. It outperforms CHROM when the signature vector is accurate but is less robust when that estimate is noisy.

Deep Learning Algorithms

DeepPhys — Chen & McDuff 2018

DeepPhys was the first end-to-end CNN for rPPG. It uses a two-branch convolutional attention network: one branch processes the normalized video (appearance), one processes frame differences (motion). The network learns to attend to skin pixels and suppress motion artifacts through joint training.

In its original evaluation, DeepPhys outperformed all classical methods in cross-dataset tests — but it also exposed the generalization problem: models trained on one demographic or lighting condition do not reliably transfer.

Accuracy: MAE 1.5–3 BPM on in-distribution data
Speed: GPU required for real-time; heavy for mobile
Limitation: Poor cross-dataset generalization without fine-tuning
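
The motion branch's input is the normalized frame difference, which is easy to sketch on its own. This is an illustrative NumPy reconstruction of that preprocessing step under stated assumptions (the epsilon value and the trailing standard-deviation scaling are implementation choices, not taken verbatim from the paper).

```python
import numpy as np

def normalized_frame_difference(frames):
    """DeepPhys-style motion representation:
    d(t) = (C(t+1) - C(t)) / (C(t+1) + C(t)).

    frames: (T, H, W, 3) float array of video frames.
    Returns a (T-1, H, W, 3) array fed to the motion branch.
    """
    a, b = frames[1:], frames[:-1]
    # Dividing by the sum makes the difference invariant to the
    # (slowly varying) illumination intensity at each pixel.
    d = (a - b) / (a + b + 1e-7)
    # Scale to unit-ish variance so large motion outliers do not
    # dominate training.
    return d / (np.abs(d).std() + 1e-7)
```

The appearance branch receives the normalized frames themselves and learns a soft attention mask over skin pixels that gates this motion representation.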

PhysNet — Yu et al. 2019

PhysNet used a 3D CNN (C3D architecture) operating on spatiotemporal video volumes, allowing it to capture temporal dynamics over longer windows (128–300 frames). It demonstrated better HRV estimation than frame-by-frame approaches and showed that temporal context is critical for accurate IBI extraction.

EfficientPhys — Liu et al. 2023

EfficientPhys addressed the efficiency gap, targeting mobile and edge deployment. It uses temporal shift modules (TSM) with lightweight 2D convolutions instead of full 3D CNNs, achieving near-PhysNet accuracy at roughly a tenth of the compute. On a Snapdragon 888 mobile SoC, EfficientPhys runs in real time at 30 fps — the first deep learning rPPG model deployable on a smartphone without dedicated hardware acceleration.
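
The temporal shift trick is simple enough to show directly. Below is an illustrative NumPy sketch of a TSM (the 1/8 channel fraction is the common default from the TSM literature; the function name and tensor layout are assumptions): part of the channels carry features one frame forward, part one frame backward, so a plain 2D convolution afterwards mixes information across time at essentially zero extra compute.

```python
import numpy as np

def temporal_shift(x, shift_frac=8):
    """Shift 1/shift_frac of channels forward in time, another
    1/shift_frac backward, leaving the rest in place.

    x: (T, C, H, W) feature tensor. Returns a tensor of the same shape.
    """
    t, c, h, w = x.shape
    fold = c // shift_frac
    out = np.zeros_like(x)
    out[1:, :fold] = x[:-1, :fold]                   # forward in time
    out[:-1, fold:2 * fold] = x[1:, fold:2 * fold]   # backward in time
    out[:, 2 * fold:] = x[:, 2 * fold:]              # untouched channels
    return out
```

Because the shift is pure memory movement, the only FLOPs in the block come from the 2D convolutions, which is what lets EfficientPhys approximate 3D-CNN temporal modeling on a phone.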

Transformer-Based Models (2022–2025)

Several groups have applied Vision Transformers (ViT) and video transformers to rPPG:

  • PhysFormer (Yu et al. 2022): Temporal difference transformer with query-key attention over facial patches. Outperformed CNNs on UBFC-rPPG and VIPL-HR datasets.
  • EfficientPhys-T: Hybrid CNN-Transformer adding global temporal self-attention. Best published accuracy as of 2024 on the MR-NIRP dataset.

The consensus from 2024 benchmark papers: transformer models outperform pure CNNs on heterogeneous datasets (mixed skin tones, environments, camera types) but add 30–50% inference overhead. For deployment, the trade-off depends on whether a GPU is available.

Domain Adaptation and Skin Tone Fairness

Every rPPG benchmark shows accuracy drops for Fitzpatrick skin types V–VI. The physics is straightforward: higher melanin concentration absorbs more light in the green band, where the pulsatile signal is strongest, reducing its amplitude relative to noise. Solutions being pursued in 2025:

Near-infrared rPPG — NIR cameras (850 nm) show less melanin-related signal attenuation. Several papers report near-parity accuracy across skin tones with NIR, though standard webcams don't capture NIR.

Domain adversarial training — Train a feature extractor that is explicitly penalized for encoding skin tone information, forcing it to use only cardiac-related features.

Multi-illuminant calibration — Per-frame illuminant estimation to normalize color space before rPPG extraction.

None of these fully solves the problem at the performance levels needed for FDA-cleared diagnostics. It remains the primary fairness challenge in clinical rPPG deployment.

Benchmark Datasets

| Dataset | Subjects | Conditions | Ground Truth | Use |
|---|---|---|---|---|
| UBFC-rPPG | 42 | Controlled light | Finger PPG | Standard benchmark |
| COHFACE | 40 | 2 conditions | Contact PPG | Motion variability |
| MAHNOB-HCI | 27 | Emotional stimuli | ECG + NIRS | HRV studies |
| VIPL-HR | 107 | 9 scenarios | Contact PPG | Diverse conditions |
| MR-NIRP | 8 | NIR + RGB | ECG | Skin tone analysis |

A persistent problem: most public datasets are small and demographically homogeneous. The VIPL-HR dataset at 107 subjects is considered large by rPPG standards. Compare this to ImageNet's 1.2 million images. This data scarcity explains why rPPG models overfit to training distributions and why real-world performance often disappoints.

Choosing an Algorithm: Decision Guide

Use CHROM or POS when:

  • Real-time CPU-only operation is required
  • Lighting is reasonably controlled
  • You need explainability or regulatory transparency

Use DeepPhys/PhysNet when:

  • GPU is available
  • Training data matches deployment distribution
  • Accuracy is paramount over speed

Use EfficientPhys when:

  • Mobile/edge deployment with limited compute
  • Real-time requirement at 30 fps
  • Reasonable demographic diversity in training data

Use Transformer models when:

  • Highest accuracy across diverse conditions is priority
  • Server-side inference acceptable
  • Dataset is sufficiently large and diverse

FAQ

What is the most accurate rPPG algorithm? As of 2025, transformer-based models like PhysFormer and EfficientPhys-T achieve the lowest error on diverse benchmarks (MAE 1.5–2 BPM). However, accuracy is highly dataset-dependent, and no algorithm is definitively best across all conditions.

Can rPPG algorithms run on a smartphone in real time? Yes. EfficientPhys runs at 30 fps on modern mid-range smartphones using the CPU/GPU. Classical methods like CHROM and POS run in real time even on low-end hardware.

Why do rPPG algorithms fail with head movement? Head movement creates motion artifacts in the pixel values that overlap in frequency with the cardiac signal (0.5–4 Hz). Classical algorithms cannot reliably separate these; deep learning models trained with motion augmentation handle moderate movement better.
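
The frequency-overlap problem can be demonstrated numerically. The sketch below (an illustration, not from any cited paper; the function name and band limits are assumptions) band-limits a signal to 0.5–4 Hz and picks the dominant spectral peak as BPM — and shows that an in-band motion artifact survives the band-pass and wins the peak search.

```python
import numpy as np

def dominant_bpm(signal, fps=30.0, band=(0.5, 4.0)):
    """Return the dominant frequency in the cardiac band, in BPM."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    power = np.abs(np.fft.rfft(signal - signal.mean())) ** 2
    mask = (freqs >= band[0]) & (freqs <= band[1])
    # Frequency with the largest in-band power, converted to beats/min.
    return 60.0 * freqs[mask][np.argmax(power[mask])]

t = np.arange(900) / 30.0                 # 30 s of video at 30 fps
pulse = np.sin(2 * np.pi * 1.2 * t)       # true pulse: 1.2 Hz = 72 BPM
motion = 3.0 * np.sin(2 * np.pi * 2.0 * t)  # head bob at 2 Hz, in-band
```

With `pulse` alone, `dominant_bpm` returns 72 BPM; with `pulse + motion`, the 2 Hz artifact dominates and the estimate jumps to 120 BPM. No frequency-domain filter can remove it, because it sits squarely inside the cardiac band — which is why motion robustness has to come from the spatial/temporal modeling stage instead.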

What training data do rPPG deep learning models need? High-quality rPPG training requires synchronized facial video and ground-truth contact PPG (or ECG) signals. Models generally need hundreds of hours of diverse video across lighting conditions, skin tones, and motion levels for robust real-world performance.

How does CHROM work differently from POS? CHROM uses fixed chrominance coefficients to project the RGB signal into a specular-reflection-free space. POS adapts those coefficients to individual skin tone by estimating the skin tone vector from the actual video signal, making it more robust to skin tone variation.

What is the best rPPG algorithm for dark skin tones? Near-infrared rPPG shows the most promise for equitable performance across skin tones. Among visible-spectrum algorithms, domain-adversarial deep learning methods with diverse training data show the smallest accuracy gap, though all current methods degrade at Fitzpatrick V–VI.

Can rPPG measure HRV accurately? PhysNet and transformer models achieve HRV metrics (RMSSD, SDNN) with useful accuracy under controlled conditions — sufficient for population-level trend analysis but below clinical Holter monitor standards for individual diagnostic use.

References

  1. de Haan G, Jeanne V. (2013). "Robust pulse rate from chrominance-based rPPG." IEEE Transactions on Biomedical Engineering, 60(10), 2878–2886. DOI: 10.1109/TBME.2013.2266196
  2. Wang W, et al. (2017). "Algorithmic principles of remote-PPG." IEEE Transactions on Biomedical Engineering, 64(7), 1479–1491. DOI: 10.1109/TBME.2016.2609282
  3. Chen W, McDuff D. (2018). "DeepPhys: Video-based physiological measurement using convolutional attention networks." ECCV 2018. DOI: 10.1007/978-3-030-01216-8_22

Related reading on ChatPPG: rPPG comprehensive guide, deep learning PPG models, transformer models for PPG, PPG algorithm library