PPG Self-Supervised Learning
Self-supervised learning trains PPG models using millions of unlabeled waveforms — no cardiologist annotations required. By solving cleverly designed pretext tasks (predicting masked segments, distinguishing augmented views of the same pulse), the model learns rich representations of cardiac physiology. These representations transfer to downstream tasks like arrhythmia detection with only a fraction of the labeled data needed by supervised methods. This article explains the key self-supervised approaches applied to PPG, what they learn, and how much label efficiency they provide.
Why Labeled PPG Data Is Scarce
Getting labeled PPG data is expensive. Every second of clinical PPG needs to be paired with a ground truth: an ECG-based annotation, a physician's rhythm classification, a polysomnography-based sleep stage, or a lab-measured SpO2 value. Expert annotation takes time and introduces inter-annotator variability.
The result is a structural imbalance: enormous amounts of unlabeled PPG exist (hospital monitors, fitness trackers, clinical trial recordings) while labeled datasets remain small and domain-specific. Self-supervised learning exploits this imbalance — it uses the unlabeled majority to build powerful representations, then fine-tunes on the labeled minority.
This mirrors the transformation that self-supervised pre-training brought to NLP (BERT, GPT) and vision (MAE, DINO). The expectation is that similar gains are achievable for biosignals.
Pretext Tasks for PPG
A pretext task is a self-supervised learning objective constructed automatically from the data, without human labels. The model must solve the pretext task by learning genuinely useful representations of the underlying physiology.
Contrastive Learning
Contrastive methods train the model to produce similar embeddings for two different augmented views of the same PPG segment and dissimilar embeddings for segments from different recordings. The SimCLR and MoCo frameworks from computer vision adapt naturally to time series.
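The "similar embeddings for views of the same segment, dissimilar otherwise" objective is usually the NT-Xent loss from SimCLR. A minimal NumPy sketch (the function name and batch layout are illustrative assumptions, not code from any specific PPG paper):

```python
import numpy as np

def nt_xent(z1, z2, temperature=0.5):
    """SimCLR-style NT-Xent loss over a batch: row i of z1 and row i of z2
    are embeddings of two augmented views of the same PPG segment; every
    other row in the batch serves as a negative."""
    z = np.concatenate([z1, z2])                       # (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit norm -> cosine sim
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)                     # never match a row with itself
    n = len(z1)
    # index of each row's positive partner: i <-> n + i
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    loss = -(sim[np.arange(2 * n), pos] - logsumexp)   # cross-entropy per row
    return float(loss.mean())
```

When the two views are embedded close together, the positive logit dominates the log-sum over negatives and the loss shrinks; mismatched views push it up.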
Key design decision: what augmentations are physiologically valid?
- Time masking: randomly zero out 10–30% of the segment — forces the model to infer missing signal from context
- Amplitude scaling: vary gain ±20% — teaches invariance to sensor sensitivity differences
- Temporal jitter: add small time shifts — teaches invariance to minor timing offsets
- Bandpass filtering variants: apply different filter cutoffs — teaches robustness to pre-processing variation
Augmentations that should not be used: amplitude inversion (flipping the waveform), excessive time stretching (which changes physiological rates), or random noise injection at levels that obscure cardiac morphology.
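The valid augmentations above can be sketched in a few lines of NumPy; the parameter values follow the ranges listed, while the helper names and the composition in `two_views` are illustrative assumptions:

```python
import numpy as np

def time_mask(x, frac=0.2, rng=None):
    """Zero out a contiguous fraction of the segment (10-30% is typical)."""
    if rng is None:
        rng = np.random.default_rng()
    n = len(x)
    w = int(frac * n)
    start = rng.integers(0, n - w)
    out = x.copy()
    out[start:start + w] = 0.0
    return out

def amplitude_scale(x, max_gain=0.2, rng=None):
    """Multiply by a random gain in [1 - max_gain, 1 + max_gain] (±20%)."""
    if rng is None:
        rng = np.random.default_rng()
    return x * rng.uniform(1 - max_gain, 1 + max_gain)

def temporal_jitter(x, max_shift=10, rng=None):
    """Circularly shift by a small random number of samples."""
    if rng is None:
        rng = np.random.default_rng()
    return np.roll(x, rng.integers(-max_shift, max_shift + 1))

def two_views(x, rng=None):
    """Produce two independently augmented views for contrastive training."""
    if rng is None:
        rng = np.random.default_rng()
    aug = lambda s: temporal_jitter(amplitude_scale(time_mask(s, rng=rng), rng=rng), rng=rng)
    return aug(x), aug(x)
```

Note that none of these operations flips the waveform, stretches time, or injects noise at morphology-destroying levels, in line with the exclusions above.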
Kiyasseh and colleagues (2021, ICML) applied contrastive learning to PPG and ECG using a framework called CLOCS (Contrastive Learning of Cardiac Signals). Pre-trained on unlabeled recordings from PhysioNet, CLOCS representations fine-tuned with only 10% labeled data matched the cardiac arrhythmia detection performance of a fully supervised model trained on 100% of labels.
Masked Autoencoding (PPG-MAE)
Inspired by BERT and the Masked Autoencoder (MAE) for images, masked PPG models randomly mask a proportion (typically 50–75%) of the input waveform and train an encoder-decoder to reconstruct the missing portions.
The masking forces the model to learn the periodic structure of PPG — the relationship between systolic and diastolic phases, inter-beat intervals, respiration-induced amplitude modulation. Representations learned by masked autoencoding capture these temporal dependencies better than contrastive methods for tasks requiring precise morphological analysis.
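The MAE recipe has two moving parts: choosing which patches to hide, and computing reconstruction error only on the hidden patches. A minimal sketch (the patch length and function names are assumptions for illustration):

```python
import numpy as np

def random_patch_mask(n_patches, mask_ratio=0.75, rng=None):
    """Choose which patches to hide; MAE-style masking keeps only a small
    visible subset for the encoder."""
    if rng is None:
        rng = np.random.default_rng()
    n_mask = int(round(mask_ratio * n_patches))
    perm = rng.permutation(n_patches)
    mask = np.zeros(n_patches, dtype=bool)
    mask[perm[:n_mask]] = True
    return mask  # True = hidden from the encoder

def masked_reconstruction_loss(pred, target, mask, patch_len):
    """MSE computed only on the masked patches, as in MAE -- visible
    patches contribute nothing to the loss."""
    p = pred.reshape(-1, patch_len)
    t = target.reshape(-1, patch_len)
    return float(np.mean((p[mask] - t[mask]) ** 2))
```

Because the loss covers only hidden patches, the model cannot trivially copy visible input and must exploit pulse periodicity to fill the gaps.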
Tang and colleagues (2022, NeurIPS BioSignals workshop) demonstrated that PPG masked autoencoders pre-trained on unlabeled MIMIC-III waveforms achieved state-of-the-art performance on 6 downstream tasks including AF detection, sleep staging, and respiratory rate estimation, with as few as 100 labeled examples per task.
Predictive Coding
Contrastive Predictive Coding (CPC) trains a model to predict future PPG embeddings from past context. The model must learn a representation that captures the temporal dynamics of the pulse — essentially discovering that cardiac cycles are periodic, that IBI patterns convey autonomic state, and that waveform morphology changes gradually with physiological conditions.
CPC representations are particularly effective for tasks requiring temporal context: sleep staging (where transitions happen over minutes), HRV analysis (where inter-beat statistics matter), and stress detection (where autonomic shifts are gradual).
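At its core, CPC scores a predicted future embedding against the true future (the positive) and embeddings drawn from other times or recordings (the negatives) with an InfoNCE loss. A hedged NumPy sketch of that scoring step (the function name and cosine-similarity choice are illustrative):

```python
import numpy as np

def info_nce(pred, future, negatives, temperature=0.1):
    """CPC-style InfoNCE loss: the context-predicted embedding should match
    the true future embedding better than any sampled negative."""
    def sim(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    logits = np.array([sim(pred, future)] + [sim(pred, n) for n in negatives])
    logits = logits / temperature
    logits -= logits.max()                       # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -float(np.log(probs[0]))              # positive sits at index 0
```

Minimizing this loss forces the context encoder to carry exactly the temporal information (periodicity, IBI patterns, slow morphological drift) that makes the future predictable.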
Transfer Learning from Self-Supervised Pre-Training
The value of self-supervised pre-training is measured by how much it improves downstream performance given limited labeled data.
Label Efficiency Benchmarks
On the PhysioNet 2020 arrhythmia classification challenge:
- Fully supervised CNN (100% labels): F1 = 0.83
- Self-supervised pre-training + fine-tuning (10% labels): F1 = 0.81
- Supervised training only (10% labels): F1 = 0.72
The gap widens at lower label fractions. With 1% of labels (roughly 900 labeled examples from the 90,000-record dataset), self-supervised methods retain F1 around 0.68 while supervised-only drops to 0.55. This is the regime where self-supervised learning delivers its greatest practical value: rare conditions, small clinical cohorts, and new deployment contexts where annotation is expensive.
Cross-Modal Transfer
PPG and ECG measure the same underlying cardiac electrical and mechanical events from different vantage points. Models pre-trained on one modality can transfer to the other. This is clinically valuable: ECG-labeled datasets are larger and better annotated than PPG-labeled datasets. A model pre-trained on ECG (using self-supervised objectives) and then adapted to PPG inherits the richer ECG annotation ecosystem.
For more on how this connects to transfer learning strategies more broadly, see our PPG Transfer Learning article.
Internal Links
Understanding what these representations learn requires understanding basic PPG morphology — see our PPG Waveform Decomposition guide. For the contrastive learning variant, the PPG Contrastive Learning deep dive covers augmentation strategies in more detail. For foundation models trained at scale on PPG, see our PPG Foundation Models article.
Practical Implementation Guide
Dataset Selection for Pre-Training
Pre-training benefits scale with dataset diversity, not just size. A model pre-trained on 100,000 examples from 20 different hospitals generalizes better than one pre-trained on 1,000,000 examples from one institution. Prioritize datasets that span age groups, health conditions, device types, and recording contexts.
Recommended pre-training datasets:
- MIMIC-III Waveform: 30,000+ ICU patient-hours of clinical PPG and ECG
- MESA sleep study: overnight wrist PPG from 2,000 participants across cardiovascular risk strata
- UK Biobank: 500,000-participant cohort with wearable PPG segments
Augmentation Policy Selection
The choice of augmentations substantially impacts what invariances the model learns. For clinical applications where morphology matters (AF detection, PTT estimation), avoid augmentations that destroy waveform shape. For robust heart rate estimation, include amplitude and frequency perturbations to teach measurement invariance.
Automated augmentation policy search (RandAugment adapted for time series) can optimize the augmentation mix for a specific downstream task without manual tuning.
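A RandAugment-style policy for time series reduces to two knobs: how many ops to apply and a single global magnitude. This sketch assumes a hypothetical op set (each op maps a signal, a magnitude, and an RNG to a transformed signal); the names and scaling factors are illustrative, not from a published policy:

```python
import numpy as np

def rand_augment_ts(x, ops, n_ops=2, magnitude=0.5, rng=None):
    """RandAugment adapted to time series: apply n_ops randomly chosen
    transforms, each scaled by one shared magnitude parameter."""
    if rng is None:
        rng = np.random.default_rng()
    out = x.copy()
    for idx in rng.choice(len(ops), size=n_ops, replace=False):
        out = ops[idx](out, magnitude, rng)
    return out

# Hypothetical op set; magnitude m in [0, 1] scales each perturbation.
ops = [
    lambda x, m, rng: x * rng.uniform(1 - 0.3 * m, 1 + 0.3 * m),       # gain
    lambda x, m, rng: np.roll(x, rng.integers(-int(20 * m) - 1,
                                              int(20 * m) + 1)),       # jitter
    lambda x, m, rng: x + rng.normal(0, 0.05 * m, size=x.shape),       # mild noise
]
```

Searching over `n_ops` and `magnitude` (e.g., by downstream validation score) then replaces hand-tuning each augmentation separately.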
Fine-Tuning Protocols
After self-supervised pre-training, the standard approach is:
- Freeze the pre-trained encoder
- Train only the task-specific head on labeled data (linear probing)
- Evaluate — if performance is sufficient, stop here
- If not, unfreeze the encoder and fine-tune end-to-end with a small learning rate (1e-5 to 1e-4)
End-to-end fine-tuning typically adds 2–5% performance over linear probing but risks catastrophic forgetting if the labeled dataset is very small. Learning rate warmup and gradient clipping help stabilize fine-tuning.
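The linear-probing stage of the protocol above amounts to fitting a logistic head on frozen encoder embeddings. A self-contained NumPy sketch (a deep-learning framework would normally handle this; the function names and hyperparameters are illustrative):

```python
import numpy as np

def linear_probe(embeddings, labels, lr=0.1, epochs=200):
    """Stage 1: encoder stays frozen; fit only a linear head on its
    embeddings via gradient descent on binary cross-entropy."""
    n, d = embeddings.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        logits = embeddings @ w + b
        p = 1.0 / (1.0 + np.exp(-logits))   # sigmoid
        grad = p - labels                   # dL/dlogits for cross-entropy
        w -= lr * embeddings.T @ grad / n
        b -= lr * grad.mean()
    return w, b

def probe_accuracy(embeddings, labels, w, b):
    """Evaluate the probe: if this is sufficient, skip end-to-end tuning."""
    return float(np.mean(((embeddings @ w + b) > 0) == labels))
```

Only if probe accuracy falls short would one proceed to unfreeze the encoder and fine-tune end-to-end at a small learning rate.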
Frequently Asked Questions
What is self-supervised learning for PPG signals? Self-supervised learning trains PPG models on unlabeled waveforms using automatically generated pretext tasks — like predicting masked signal portions or distinguishing different views of the same heartbeat. The model learns rich cardiac representations without any human-provided labels, which can then be fine-tuned with small labeled datasets for specific clinical tasks.
How much labeled data do I need after self-supervised pre-training? Studies consistently show that self-supervised pre-training reduces labeled data requirements by 5–10x. A model that needs 10,000 labeled PPG examples with random initialization may only need 1,000–2,000 labeled examples after pre-training on unlabeled data. For rare conditions, this can make the difference between a feasible and an infeasible annotation effort.
What is contrastive learning and how does it apply to PPG? Contrastive learning trains a model to produce similar representations for two augmented versions of the same PPG segment and different representations for segments from different heartbeats or patients. The model learns which variations are irrelevant (sensor noise, amplitude scale) and which are physiologically meaningful (pulse morphology, RR interval patterns).
Can self-supervised PPG models learn across different device types? Yes, and this is one of the major advantages. By training on data from diverse devices, self-supervised models learn device-invariant representations of cardiac physiology. When fine-tuned on a new device type, they adapt faster and with fewer examples than models trained from scratch.
What is masked autoencoding for PPG? Masked autoencoding randomly removes portions of the PPG waveform and trains the model to reconstruct the missing parts. This forces the model to learn the periodic structure of the pulse, the relationships between systolic and diastolic phases, and the statistical regularities of normal and abnormal cardiac rhythms.
How do self-supervised methods compare to supervised deep learning for PPG? When labeled data is abundant (>10,000 examples), supervised and self-supervised methods perform similarly. When labeled data is scarce (100–1,000 examples), self-supervised pre-training typically provides 5–15% absolute performance improvement. For novel deployment contexts (new device, new population), self-supervised representations also generalize better.
What Python libraries support self-supervised learning for biosignals? VISSL, solo-learn, and lightly are general-purpose self-supervised frameworks built on PyTorch. For time-series-specific pre-training, open-source implementations such as TS2Vec and TS-TCC provide contrastive objectives that adapt readily to PPG and ECG. The CLOCS paper's code is also publicly available on GitHub.