ChatPPG Editorial

Facial PPG Signal Extraction

ChatPPG Team

2026-03-24T19:21:26+00:00

Extracting a photoplethysmographic signal from the human face is simultaneously a problem in optics, anatomy, and signal processing. The face is rich in superficial vasculature: the supratrochlear artery crosses the forehead, the facial artery branches over the cheeks, and the dorsal nasal artery runs along the nose. This makes it the best-studied region for remote PPG (rPPG) acquisition. But the face also presents variable illumination, constant micro-expressions, hair, shadows, and specular reflection from glasses and oily skin.

This article examines the optical basis of facial PPG, what the research says about optimal region-of-interest (ROI) selection, and how algorithm design choices affect signal fidelity in different facial regions.

Skin Optics: Why the Face Works for rPPG

The blood pulsation signal in rPPG comes from the dynamic interaction of light with chromophores in superficial skin tissue. Two classes of chromophore dominate:

Oxyhemoglobin (HbO2) and deoxyhemoglobin (Hb): These iron-containing proteins in red blood cells have distinct absorption spectra. The molar extinction of HbO2 peaks around 415 nm (Soret band) with secondary peaks at 542 and 577 nm. The green wavelength range (520–570 nm) sits in a region where hemoglobin absorption is high and where the pulsatile component (AC) is a meaningful fraction of the total signal (DC).

Melanin: A broadband absorber from UV through NIR. Higher melanin concentrations in darker skin reduce the fraction of green light reaching the deeper dermis where vasculature resides, and reduce the reflected signal that returns to the camera. This is why rPPG signal-to-noise ratio tends to fall as Fitzpatrick skin type increases (that is, in darker skin).

The optical penetration depth at 530 nm (green) in Fitzpatrick I–II skin is approximately 0.5–1 mm — sufficient to reach the superficial dermis where the subpapillary plexus lies. The AC/DC ratio in the green channel (the normalized pulsatile amplitude) is typically 0.2–2% — small, but detectable with modern cameras.
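To make the AC/DC figure concrete, the following minimal sketch computes the normalized pulsatile amplitude of a synthetic green-channel trace. The helper name `ac_dc_ratio`, the half peak-to-peak definition of AC, and the signal values (DC level 120, pulse amplitude 1.2) are assumptions chosen for illustration.

```python
import numpy as np

def ac_dc_ratio(trace):
    """Return the normalized pulsatile amplitude (AC/DC) of a 1-D ROI trace.

    DC is taken as the mean level and AC as half the peak-to-peak swing.
    Both definitions are assumptions of this sketch; other conventions exist.
    """
    trace = np.asarray(trace, dtype=float)
    dc = trace.mean()
    ac = (trace.max() - trace.min()) / 2.0
    return ac / dc

# Synthetic trace: 10 s at 30 fps, 1.2 Hz pulse (72 bpm) riding on a DC level.
fs = 30.0
t = np.arange(300) / fs
green = 120.0 + 1.2 * np.sin(2 * np.pi * 1.2 * t)

ratio = ac_dc_ratio(green)
print(f"AC/DC = {100 * ratio:.2f}%")
```

With these values the ratio lands near 1%, inside the 0.2–2% range quoted above. On real video the peak-to-peak estimate is sensitive to noise and motion, so it would normally be computed after bandpass filtering.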

Why Green Channel?

The green channel is preferred for rPPG signal extraction for three reasons:

Hemoglobin absorption: Of the three camera channels, green is where hemoglobin absorption is strongest relative to background tissue absorption while enough light still returns from the skin, which maximizes pulsatile signal amplitude.
Camera sensitivity: CMOS sensors used in consumer cameras have peak quantum efficiency in the green channel. Bayer filter arrays allocate two green pixels per four-pixel group (vs. one red and one blue), further boosting green SNR.
Melanin separation: Melanin absorbs more strongly at shorter wavelengths (UV, blue, green) but the pulsatile hemoglobin signal in green still dominates the AC component even in darker skin, because melanin absorption is relatively constant (DC) rather than pulsatile.

Near-infrared (NIR) rPPG at 850 nm shows lower melanin interference and has been proposed as more equitable across skin tones, with the trade-off of requiring dedicated NIR illumination that consumer cameras lack.
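In practice the green-channel signal is usually obtained by spatially averaging the green plane over an ROI in every frame. The sketch below shows that step on synthetic frames. The `green_trace` helper, the `(top, bottom, left, right)` ROI layout, the RGB channel order, and the noise levels are all assumptions for illustration.

```python
import numpy as np

def green_trace(frames, roi):
    """Spatially average the green channel inside a rectangular ROI.

    frames: array of shape (T, H, W, 3), assumed to be in RGB order.
    roi:    (top, bottom, left, right) pixel bounds (hypothetical layout).
    Returns one sample per frame, shape (T,).
    """
    top, bottom, left, right = roi
    patch = np.asarray(frames)[:, top:bottom, left:right, 1].astype(float)
    return patch.mean(axis=(1, 2))

# Synthetic video: 3 s at 30 fps, per-pixel sensor noise plus a weak pulse
# that is added to the green channel only.
rng = np.random.default_rng(0)
T, H, W = 90, 32, 32
t = np.arange(T) / 30.0
pulse = 0.5 * np.sin(2 * np.pi * 1.0 * t)
frames = rng.normal(100.0, 2.0, size=(T, H, W, 3))
frames[..., 1] += pulse[:, None, None]

trace = green_trace(frames, roi=(8, 24, 8, 24))
print(trace.shape, np.corrcoef(trace, pulse)[0, 1])
```

Averaging 256 pixels suppresses the per-pixel noise enough that the pulse, which is invisible in any single pixel, dominates the resulting trace.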

Facial ROI Selection: Where to Measure

Not all facial regions are equal for rPPG. Signal quality depends on vascularity, tissue depth, motion susceptibility, and distance from facial muscles that cause motion artifacts.

Forehead

The forehead is the most studied and generally best-performing ROI for rPPG:

Pros:

Dense superficial vascular network (supratrochlear and supraorbital arteries)
Relatively flat surface geometry reduces specular reflection variation
Minimal fat tissue means superficial vasculature is closer to skin surface
Less affected by mastication or speech than cheeks or chin

Cons:

Hair occlusion in subjects with low hairlines or bangs
Horizontal wrinkles create intensity gradients that move with expression
Perspiration can alter optical properties during exercise

Typical AC/DC ratio: 0.3–1.5% in controlled conditions, Fitzpatrick I–IV

Cheeks

Cheeks carry the facial artery and its branches. The cheek signal amplitude is often comparable to the forehead, but cheeks are more motion-prone:

Pros:

Well-vascularized (facial artery, transverse facial artery)
Good skin surface area for spatial averaging

Cons:

Facial muscle movement during speech, chewing, and expression creates large motion artifacts
Cheek tissue has more fat content (greater optical path length variability)
More likely to be occluded by beard or facial hair

Typical AC/DC ratio: 0.2–1.2%

Nose Tip and Nasal Alae

The nasal tip and alae contain the external nasal artery and are particularly well-perfused:

Pros:

High vascularity relative to skin area
Less muscle-induced motion than cheeks
Useful when forehead is occluded

Cons:

Small area (limited spatial averaging)
Sensitive to nose wrinkling (expression artifact)
More specular reflection on curved nasal surfaces

Typical AC/DC ratio: 0.5–2.0% (often highest of facial regions in lightly pigmented skin)

Periorbital (Under-Eye) Region

An underexplored region with promising vascularity:

Pros:

Angular vein and infraorbital artery contribute to pulsatile signal
Less affected by beard or makeup than other regions

Cons:

Prone to capillary dilation with fatigue (periorbital dark circles)
Eye blinking creates major motion and illumination transients in adjacent pixels

Multi-ROI Fusion

Most state-of-the-art rPPG systems use multiple ROIs simultaneously. The signals from forehead, left cheek, right cheek, and nose are extracted independently, quality-assessed per epoch using signal quality index (SQI) metrics, and fused with quality-weighted averaging. In benchmark studies, multi-ROI fusion consistently outperforms single-ROI extraction, lowering mean absolute error (MAE) by 15–30%, primarily because it maintains coverage when one region is temporarily occluded or motion-corrupted.
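The quality-weighted fusion step can be sketched as follows. This is one possible instance, not a specific published method: the SQI used here (fraction of pulse-band spectral power near the dominant peak), the 0.7–3.0 Hz band, the 0.15 Hz peak tolerance, and the function names `spectral_sqi` and `fuse_rois` are all assumptions.

```python
import numpy as np

def spectral_sqi(sig, fs, band=(0.7, 3.0), tol=0.15):
    """Fraction of in-band spectral power concentrated near the dominant peak.

    Values near 1 indicate a clean, periodic signal. This particular SQI
    definition is an assumption of the sketch.
    """
    sig = np.asarray(sig, dtype=float)
    sig = sig - sig.mean()
    freqs = np.fft.rfftfreq(sig.size, d=1.0 / fs)
    power = np.abs(np.fft.rfft(sig)) ** 2
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    peak_freq = freqs[in_band][np.argmax(power[in_band])]
    near_peak = in_band & (np.abs(freqs - peak_freq) <= tol)
    return power[near_peak].sum() / (power[in_band].sum() + 1e-12)

def fuse_rois(signals, fs):
    """Quality-weighted average of z-scored ROI signals for one epoch."""
    weights = np.array([spectral_sqi(s, fs) for s in signals])
    weights = weights / weights.sum()
    z = np.array([(s - np.mean(s)) / (np.std(s) + 1e-12) for s in signals])
    return weights @ z, weights

# One 10 s epoch: clean forehead, noisier cheek, and a nose ROI that carries
# no pulse at all (for example because it is occluded).
rng = np.random.default_rng(1)
fs = 30.0
t = np.arange(300) / fs
pulse = np.sin(2 * np.pi * 1.2 * t)
forehead = pulse + 0.2 * rng.standard_normal(t.size)
cheek = pulse + 1.0 * rng.standard_normal(t.size)
nose = 3.0 * rng.standard_normal(t.size)

fused, weights = fuse_rois([forehead, cheek, nose], fs)
print(np.round(weights, 2))
```

The occluded ROI still receives a small non-zero weight under this scheme. A production system would more likely apply a hard SQI threshold before averaging, so that clearly corrupted regions are dropped entirely.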

Algorithm Design for Facial PPG

Skin Color Segmentation

Before extracting the PPG signal, algorithms must identify skin pixels and exclude non-skin regions (hair, background, glasses). Methods include:

Elliptical skin color model in YCbCr space: fast, relatively robust to brightness changes because it operates mainly on chroma, and good at rejecting hair and clothing
Deep learning face segmentation (DeepLab, MediaPipe): higher accuracy, particularly for unusual lighting or non-frontal poses, at the cost of compute
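A minimal sketch of the chroma-based approach is shown below. It converts RGB to Cb/Cr with the standard full-range BT.601 equations and keeps pixels that fall inside an ellipse in the Cb-Cr plane. The ellipse center `(102, 153)` and semi-axes `(25, 20)` are illustrative values, not fitted parameters, and the axis-aligned ellipse is a simplification: published elliptical models typically fit a rotated ellipse to labeled skin data.

```python
import numpy as np

def rgb_to_cbcr(rgb):
    """Convert 8-bit RGB pixels to Cb and Cr (BT.601, full range)."""
    rgb = np.asarray(rgb, dtype=float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr

def elliptical_skin_mask(rgb, center=(102.0, 153.0), axes=(25.0, 20.0)):
    """Boolean mask of pixels whose (Cb, Cr) lies inside an ellipse.

    center and axes are hypothetical defaults chosen for this sketch.
    Luma (Y) is ignored, which is what gives the model some tolerance
    to brightness changes.
    """
    cb, cr = rgb_to_cbcr(rgb)
    d = ((cb - center[0]) / axes[0]) ** 2 + ((cr - center[1]) / axes[1]) ** 2
    return d <= 1.0

# Four example pixels: light skin, dark skin, dark hair, blue clothing.
pixels = np.array(
    [[200, 150, 120], [90, 60, 45], [30, 25, 20], [40, 60, 180]],
    dtype=np.uint8,
)
mask = elliptical_skin_mask(pixels)
print(mask)
```

The same function works on a full `(H, W, 3)` frame, and the resulting mask can be intersected with a landmark-based ROI before spatial averaging.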

Temporal Normalization

The raw pixel mean per frame drifts with ambient light changes and gross motion. Per-frame temporal normalization (dividing each frame's ROI mean by a short-window rolling average) detrends the DC component and amplifies the AC pulsatile signal before bandpass filtering.
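The normalization described above can be sketched in a few lines of NumPy. The 1.5-second window, the edge-padding strategy, and the function name `temporal_normalize` are assumptions of this example.

```python
import numpy as np

def temporal_normalize(trace, fs, window_s=1.5):
    """Divide an ROI trace by its centered rolling mean, then subtract 1.

    The result is a dimensionless fluctuation around zero (roughly AC/DC).
    window_s is an assumed window length, not a recommended value.
    """
    trace = np.asarray(trace, dtype=float)
    win = max(int(round(window_s * fs)), 1)
    if win % 2 == 0:
        win += 1  # an odd length keeps the window centered on each sample
    pad = win // 2
    # Repeat the edge values so the output has the same length as the input.
    padded = np.pad(trace, pad, mode="edge")
    rolling = np.convolve(padded, np.ones(win) / win, mode="valid")
    return trace / rolling - 1.0

# Synthetic trace: the DC level drifts from 100 to 120 (ambient light change)
# while the pulse stays at 1% of the instantaneous DC level.
fs = 30.0
t = np.arange(300) / fs
dc = np.linspace(100.0, 120.0, t.size)
raw = dc * (1.0 + 0.01 * np.sin(2 * np.pi * 1.2 * t))

norm = temporal_normalize(raw, fs)
print(norm[:150].std(), norm[150:].std())
```

After normalization the pulsatile amplitude is roughly the same at the start and end of the recording even though the raw amplitude grew with the drift. Because the window covers only a couple of cardiac cycles, the rolling mean retains a small residue of the pulse itself, so the recovered amplitude is approximate; the first and last half-window of samples are also biased by the edge padding.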

Spatial Weight Maps

Not all pixels in an ROI contribute equally. Pixels with higher AC/DC ratio (higher vascular density) should receive higher weight. Some algorithms learn spatial weight maps from training data; others estimate them per subject from the initial resting signal.
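The per-subject variant can be illustrated with a simple estimator: measure each pixel's pulse-band amplitude relative to its DC level over a resting segment, then use the normalized map as averaging weights. This is a sketch under assumed names (`pulse_band_weight_map`, `weighted_trace`) and an assumed 0.7–3.0 Hz band; it is not the learned approach, and it is not taken from any specific paper.

```python
import numpy as np

def pulse_band_weight_map(video, fs, band=(0.7, 3.0)):
    """Per-pixel weights proportional to pulse-band AC amplitude over DC.

    video: (T, H, W) single-channel frames from a resting segment.
    Returns an (H, W) map that sums to 1.
    """
    video = np.asarray(video, dtype=float)
    n = video.shape[0]
    dc = video.mean(axis=0)
    spectrum = np.abs(np.fft.rfft(video - dc, axis=0))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    # Amplitude of the strongest in-band sinusoid at each pixel.
    ac = spectrum[in_band].max(axis=0) * 2.0 / n
    ratio = ac / (dc + 1e-12)
    return ratio / ratio.sum()

def weighted_trace(video, weights):
    """Collapse each frame to one sample using the spatial weights."""
    return np.tensordot(np.asarray(video, dtype=float), weights,
                        axes=([1, 2], [0, 1]))

# Synthetic ROI: the left half pulsates ten times more strongly than the right.
rng = np.random.default_rng(2)
fs, T, H, W = 30.0, 300, 16, 16
t = np.arange(T) / fs
pulse = np.sin(2 * np.pi * 1.2 * t)
amplitude = np.full((H, W), 0.1)
amplitude[:, : W // 2] = 1.0
video = 100.0 + pulse[:, None, None] * amplitude \
    + 0.5 * rng.standard_normal((T, H, W))

weights = pulse_band_weight_map(video, fs)
trace = weighted_trace(video, weights)
print(weights[:, : W // 2].mean() / weights[:, W // 2 :].mean())
```

One caveat of this estimator is that noise also produces in-band spectral peaks, so weakly perfused pixels never reach zero weight, and any periodic motion in the pulse band would be rewarded as well.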

A 2020 paper by Pilz et al. demonstrated that spatial attention maps learned end-to-end improved rPPG accuracy by 22% on the COHFACE dataset compared to uniform ROI averaging, highlighting the value of learnable spatial weighting.

The Skin Tone Fairness Problem in Facial rPPG

In validation studies that use the same algorithm and conditions, Fitzpatrick V–VI individuals consistently show rPPG accuracy that is 1.5–3× worse than that of Fitzpatrick I–II individuals. This is not algorithmic bias per se: it reflects a real optical physics constraint (lower AC/DC ratio). But the bias becomes algorithmic when:

Training datasets are demographically skewed toward lighter skin tones
Quality thresholds for data acceptance exclude valid darker-skin data as "low SNR"
Alert thresholds calibrated on lighter-skin validation data are applied unchanged to darker-skin populations

Mitigation strategies under active development:

Per-subject calibration: A 30-second reference segment with contact PPG calibrates the rPPG gain for the individual's skin tone. Practical in clinical settings; impractical for consumer video.

NIR illumination: Adding supplemental 850 nm illumination and using a NIR-sensitive camera removes the melanin disadvantage in the visible spectrum.

Adversarial training for skin-tone invariance: Penalizing the feature extractor for encoding skin tone classification signals forces the network to rely on cardiac-pulsatile features only.

FAQ

Which facial region gives the strongest rPPG signal? The forehead and nasal tip generally provide the strongest rPPG signal due to dense superficial vasculature and relative motion stability. Nasal alae often show the highest AC/DC ratio in lightly pigmented skin. Multi-ROI fusion (forehead + cheeks + nose) outperforms any single region.

Why is the green channel used for facial rPPG? The green channel (520–570 nm) coincides with high hemoglobin optical absorption, CMOS camera peak sensitivity, and a favorable AC/DC signal ratio. Red and blue channels carry weaker pulsatile signals and more noise for most skin types.

How does skin tone affect facial PPG signal quality? Higher melanin concentration (Fitzpatrick types V–VI) increases baseline light absorption, reducing the fraction of green light that penetrates to the superficial vasculature and returns to the camera. This lowers the AC/DC ratio and reduces rPPG signal-to-noise ratio by 30–60% versus lighter skin tones.

What size should the facial ROI be for rPPG? Larger ROIs improve spatial averaging (more pixels means better SNR) but include more motion-contaminated regions. Practical ROI sizes range from 20×20 to 100×100 pixels at 720p. The forehead typically spans 60–200 pixels in width, depending on distance and camera resolution.

Can facial PPG work with glasses? Yes, though signal quality can suffer. Metal-framed glasses that occlude parts of the forehead reduce the usable ROI area, and lenses and frames produce specular reflections that can create bright, motion-sensitive pixels near the ROI boundary. Algorithms with adaptive ROI masking handle glasses better than fixed-rectangle ROIs.

Does makeup affect facial rPPG signal? Foundation makeup can alter the optical absorption properties of the skin surface, reducing AC/DC ratio by changing the effective optical path. Heavy powder foundation has a larger effect than liquid foundation. This is underexplored in the rPPG literature, which is a gap given how common cosmetics use is among the people these systems are meant to measure.

References

Pilz CS, et al. (2020). "Spatial subspace rotation for robust rPPG." IEEE Transactions on Image Processing, 29, 4902–4914. DOI: 10.1109/TIP.2020.2973373
Verkruysse W, Svaasand LO, Nelson JS. (2008). "Remote plethysmographic imaging using ambient light." Optics Express, 16(26), 21434–21445. DOI: 10.1364/OE.16.021434
McDuff D, et al. (2014). "Improvements in remote cardiopulmonary measurement using a five band digital camera." IEEE Transactions on Biomedical Engineering, 61(10), 2593–2601. DOI: 10.1109/TBME.2014.2323695
