PPG Continual Learning
A PPG model deployed to millions of wearables in 2024 faces a problem in 2026: the patient population has aged, devices have been updated with new sensor firmware, and new cardiac conditions have been identified that were not in the original training set. Retraining from scratch wastes resources and risks losing well-learned performance on original tasks. Continual learning solves this — it allows PPG models to incorporate new data and new tasks while preserving accuracy on everything they already know. This article explains how continual learning works for cardiac signal models, which methods are most effective, and how to implement them in practice.
The Catastrophic Forgetting Problem
When you fine-tune a neural network on new data, it tends to catastrophically forget what it learned previously. This is not a minor degradation — retraining a PPG arrhythmia model on new data from a hospital population can reduce performance on the original training distribution by 30–50%.
The mechanism: gradient descent updates weights to minimize loss on the new data. Weights that were critical for prior tasks get overwritten. The network has no mechanism to protect previously learned features unless explicitly designed for continual learning.
For PPG applications, catastrophic forgetting creates real deployment problems:
- Firmware updates change sensor sampling characteristics, requiring model adaptation
- New disease populations (post-COVID arrhythmias, aging demographics) need model expansion
- Regulatory requirements for AI/ML SaMD demand that model updates be validated — but the original validation data may not be available for combined retraining
- Privacy constraints prevent re-accessing earlier training datasets when updating models
Taxonomy of Continual Learning Approaches
Regularization-Based Methods
Regularization methods add penalty terms to the loss function that protect weights critical for prior tasks. They do not require storing old data — the prior knowledge is encoded in the weight importance estimates.
Elastic Weight Consolidation (EWC) (Kirkpatrick et al., 2017, PNAS, DOI: 10.1073/pnas.1611835114): Estimates the importance of each parameter for prior tasks using the diagonal of the Fisher information matrix. When learning new tasks, highly important weights are penalized for changing, while less important weights can freely adapt.
For PPG: after training on arrhythmia classification, EWC can estimate that early convolutional layer weights (detecting peaks and slopes) are highly important and must not change much, while top-layer classifier weights are less constrained and can adapt to new classes.
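The EWC penalty can be sketched in a few lines of PyTorch. This is a minimal illustration, not a production implementation: the model, the prior-task batches, and the regularization strength `lam` are all placeholders, and the Fisher diagonal is approximated by averaging squared gradients of the negative log-likelihood over a handful of prior-task batches.

```python
import torch
import torch.nn.functional as F

def estimate_fisher(model, data_batches):
    """Diagonal Fisher approximation: average squared gradients of the
    negative log-likelihood over prior-task batches."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    count = 0
    for x, y in data_batches:
        model.zero_grad()
        loss = F.nll_loss(F.log_softmax(model(x), dim=1), y)
        loss.backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
        count += 1
    return {n: f / max(count, 1) for n, f in fisher.items()}

def ewc_penalty(model, fisher, old_params, lam=1000.0):
    """Quadratic penalty anchoring high-Fisher weights near the values
    they held after the prior task."""
    penalty = torch.zeros(())
    for n, p in model.named_parameters():
        penalty = penalty + (fisher[n] * (p - old_params[n]) ** 2).sum()
    return 0.5 * lam * penalty
```

During training on the new task, the total loss is the new-task loss plus `ewc_penalty(...)`; the penalty is zero when the weights sit exactly at their prior-task values and grows quadratically as important weights move away.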
Synaptic Intelligence (SI): Tracks the contribution of each weight to the loss reduction during training (path integral approach). More computationally efficient than EWC's Fisher approximation but slightly less accurate in importance estimation.
Memory Aware Synapses (MAS): Estimates weight importance based on the sensitivity of the model's output to weight changes, without requiring class labels. Suitable for unsupervised continual learning of PPG representations.
Replay-Based Methods
Replay methods maintain a small buffer of previous training examples and mix them into each new training batch. This prevents forgetting by ensuring the model continues seeing representative examples of prior tasks.
Experience Replay (ER): Store a subset of past PPG examples and replay them when learning new data. The buffer size is the key parameter — too small means inadequate coverage of the prior distribution; too large consumes memory.
Buffer selection strategies for PPG:
- Random: Store a random subset of prior examples
- Reservoir sampling: Maintains a uniform random sample over the full prior stream
- Herding: Select examples that are most representative of each class distribution (nearest to class centroid in feature space)
For PPG, herding-based selection tends to outperform random selection because PPG segments have high redundancy — most segments within a class are similar. Representative examples capture more variability in fewer stored samples.
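A per-class reservoir-sampling buffer, one of the strategies listed above, can be sketched as follows. The per-class capacity and the segment representation are illustrative; real segments would be fixed-length PPG arrays.

```python
import random

class ReservoirBuffer:
    """Maintains a uniform random sample of each class's data stream."""

    def __init__(self, capacity_per_class):
        self.capacity = capacity_per_class
        self.buffers = {}   # class label -> list of stored segments
        self.seen = {}      # class label -> number of segments seen so far

    def add(self, segment, label):
        buf = self.buffers.setdefault(label, [])
        n = self.seen.get(label, 0)
        if len(buf) < self.capacity:
            buf.append(segment)
        else:
            # Replace a stored segment with probability capacity / (n + 1),
            # which keeps the buffer a uniform sample of the stream so far.
            j = random.randint(0, n)
            if j < self.capacity:
                buf[j] = segment
        self.seen[label] = n + 1

    def sample(self, k):
        """Draw k replay examples across all classes for the next batch."""
        pool = [s for buf in self.buffers.values() for s in buf]
        return random.sample(pool, min(k, len(pool)))
```

Each new training batch then mixes fresh data with `buffer.sample(k)` replay examples, so the model keeps seeing the prior distribution without unbounded storage.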
Generative Replay: Instead of storing real PPG examples, train a generative model (VAE or GAN) to synthesize prior-task examples on demand. Eliminates the data storage constraint at the cost of generative model quality. For rare PPG arrhythmia types, synthetic replay from a class-conditional VAE can maintain performance without retaining actual patient data — a significant privacy advantage.
Architecture-Based Methods
Architecture methods allocate different parts of the network to different tasks, preventing interference between tasks at the weight level.
Progressive Neural Networks (Rusu et al., 2016): Freeze learned columns and instantiate new columns for new tasks. Lateral connections allow the new column to leverage prior representations. No forgetting by design — prior columns are frozen. Limitation: network grows with each new task; not suitable for long continual learning sequences.
PackNet: Iteratively prune unimportant weights after each task and use the pruned weights for new tasks. Fixed network size with expanding task coverage. Works well for 3–5 PPG tasks but saturates as the number of tasks grows.
Dynamic Architectures: Expand the network size by adding neurons or layers when new tasks exceed the current capacity, while protecting prior task weights. Recent approaches like DEN (Dynamically Expandable Networks) combine expansion with selective retraining.
PPG-Specific Continual Learning Challenges
Non-Stationary Signal Statistics
PPG signal statistics drift over time at multiple timescales:
- Within-session: motion artifacts, posture changes, temperature variation
- Day-to-day: diurnal autonomic variation, physical activity differences
- Weeks-to-months: seasonal changes, medication effects, disease progression
- Years: aging, device wear changes, software updates
A continual learning system for wearable PPG must distinguish genuine distribution shift requiring model update from normal physiological variation within the expected distribution.
Online anomaly detection on input statistics (tracking mean, variance, spectral features of incoming PPG batches) can identify when distribution shift has occurred. Significant shift triggers a model update cycle; normal variation does not.
Task-Incremental vs. Domain-Incremental Learning
For PPG, two continual learning scenarios are common:
Task-incremental: New clinical tasks are added over time (start with heart rate estimation, add AF detection, add sleep staging). Each task has its own output head. The challenge is preserving earlier task performance while learning new ones.
Domain-incremental: The same task is applied to shifting domains (same AF detection task, but new device types, new patient populations, new hospital sites). The output space is fixed; the input distribution changes. This is arguably the more common real-world scenario for wearable PPG.
Domain-incremental continual learning is harder — without explicit task labels at test time, the model must generalize across all seen domains simultaneously.
Practical Implementation
Drift Detection for Triggering Updates
Don't update continually unless necessary. Implement statistical process control on incoming PPG batch statistics:
- Track the running mean and variance of PPG amplitude, heart rate estimates, and signal quality index
- Use CUSUM (cumulative sum control chart) or Page-Hinkley test to detect statistically significant shifts
- Trigger model update only when drift exceeds a threshold
This prevents wasteful computation and model updates on normal physiological variation.
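A Page-Hinkley detector over a running PPG batch statistic (e.g. mean amplitude or heart-rate estimate) can be sketched as below. The `delta` and `threshold` values are illustrative and would need tuning against real signal-quality data.

```python
class PageHinkley:
    """Page-Hinkley test: flags a sustained upward shift in a statistic."""

    def __init__(self, delta=0.01, threshold=3.0):
        self.delta = delta          # tolerated magnitude of change
        self.threshold = threshold  # detection threshold (lambda)
        self.mean = 0.0             # running mean of the statistic
        self.n = 0
        self.cum = 0.0              # cumulative deviation m_t
        self.cum_min = 0.0          # running minimum M_t

    def update(self, x):
        """Feed one batch statistic; return True if drift is detected."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return self.cum - self.cum_min > self.threshold
```

A symmetric copy run on the negated statistic catches downward shifts; only when either detector fires does the model update cycle start.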
Memory Buffer Management
For experience replay, a replay buffer of 100–500 PPG segments per class typically suffices for 5–10 class problems. Use reservoir sampling to maintain a running uniform sample as new data arrives.
Storage estimates: at 100 Hz, 30-second PPG segments = 3,000 samples × 2 bytes (INT16) = 6 KB per segment. A 500-segment buffer per class, 5 classes = 15 MB — manageable even on embedded systems with external flash storage.
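The sizing arithmetic above can be checked directly (decimal KB/MB):

```python
fs_hz = 100                  # PPG sampling rate
segment_s = 30               # segment length in seconds
bytes_per_sample = 2         # INT16

segment_bytes = fs_hz * segment_s * bytes_per_sample   # 6,000 B = 6 KB
buffer_bytes = segment_bytes * 500 * 5                 # 500 segments x 5 classes

print(segment_bytes)   # 6000
print(buffer_bytes)    # 15000000 -> 15 MB
```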
Evaluation Protocol for Continual PPG Learning
Use the backward transfer and forward transfer metrics:
- Backward transfer: How much did learning task B degrade performance on task A?
- Forward transfer: How much did pre-training on task A improve initial performance on task B?
These metrics provide richer insight than simple per-task accuracy and directly quantify the catastrophic forgetting and knowledge transfer phenomena.
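Both metrics (as defined by Lopez-Paz and Ranzato in the GEM paper) can be computed from an accuracy matrix R, where R[i][j] is accuracy on task j after training through task i; the baseline vector here is the accuracy of a randomly initialized model on each task.

```python
import numpy as np

def backward_transfer(R):
    """Mean change in accuracy on earlier tasks after training the last task.
    Negative values quantify catastrophic forgetting."""
    T = R.shape[0]
    return float(np.mean([R[T - 1, i] - R[i, i] for i in range(T - 1)]))

def forward_transfer(R, baseline):
    """Mean accuracy gain on a task before training it, relative to a
    random-initialization baseline."""
    T = R.shape[0]
    return float(np.mean([R[i - 1, i] - baseline[i] for i in range(1, T)]))
```

For example, with three tasks, a final row of [0.7, 0.8, 0.9] against diagonal entries [0.9, 0.9, 0.9] gives a backward transfer of -0.15, i.e. an average 15-point drop on earlier tasks.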
Internal Links
For the base model architectures used in continual PPG systems, see PPG Convolutional Neural Networks. For the domain adaptation challenges that continual learning addresses, see PPG Transfer Learning. For federated deployment where continual learning is applied across distributed sites, see Federated Learning for PPG.
Frequently Asked Questions
What is continual learning for PPG models? Continual learning allows PPG models to incorporate new training data (new devices, new patient populations, new cardiac conditions) without catastrophically forgetting performance on previously learned tasks. It is the machine learning equivalent of a cardiologist who keeps learning while retaining expertise.
What is catastrophic forgetting in cardiac AI? Catastrophic forgetting occurs when fine-tuning a neural network on new data causes it to lose performance on old data. For a PPG arrhythmia model, this might mean that retraining on data from a new hospital population causes performance to degrade significantly on the original population. Continual learning methods prevent this.
What is Elastic Weight Consolidation (EWC) and how does it work? EWC adds a regularization penalty to the training loss that resists changing weights that were important for prior tasks. Importance is estimated using the Fisher information matrix — weights that have high influence on prior task predictions are penalized more for changing. This protects prior knowledge while allowing less-important weights to adapt to new tasks.
Can continual learning work without storing old data? Yes — regularization methods like EWC and SI encode prior task knowledge in weight importance estimates, requiring no stored data. This is important for PPG applications where retaining old patient data would violate privacy regulations. Generative replay, which stores a generative model instead of real data, is another privacy-preserving approach.
How often should a deployed PPG model be updated? Update frequency should be driven by detected distribution shift, not time. Implement statistical monitoring of incoming PPG batch statistics and trigger updates only when significant drift is detected. For stable clinical deployments, this might mean quarterly updates; for consumer wearables with diverse use patterns, monthly updates may be appropriate.
What datasets are available for benchmarking PPG continual learning? The PPG domain lacks a dedicated continual learning benchmark. Researchers typically construct custom benchmarks by partitioning existing datasets (MIMIC-III, PhysioNet challenges) by hospital, device type, or patient age group to simulate domain-incremental scenarios, and emerging benchmark efforts in the recent biosignal literature aim to standardize these evaluation protocols.