Federated Learning for PPG: Privacy-Preserving Wearable Health Models
How federated learning trains PPG models across distributed wearables without sharing raw data, covering aggregation methods, communication efficiency, and clinical use cases.

Federated learning trains PPG-based health models across thousands of wearable devices without any raw physiological data leaving the user's device. Each device learns locally, then shares only model weight updates with a central server. The result: population-scale accuracy with individual-level privacy.
This architecture matters enormously for PPG applications because photoplethysmography captures intimate physiological data. Heart rate patterns, sleep cycles, stress responses, and arrhythmia episodes are sensitive health information that users and regulators rightly want protected. Federated learning is the technical mechanism that makes large-scale PPG model training compatible with HIPAA, GDPR, and emerging AI health regulations.
How Federated Learning Works for PPG
Standard centralized training aggregates raw PPG waveforms from all users into a single dataset, then trains a model on that dataset. Federated learning reverses this: the model goes to the data rather than the data going to the model.
The standard FedAvg algorithm (McMahan et al., 2017, Communication-Efficient Learning of Deep Networks from Decentralized Data, arXiv:1602.05629) works as follows:
- A global model is initialized on the central server
- A subset of edge devices (wearables) receives the current global model
- Each device runs local SGD for several epochs on its own PPG data
- Devices send their updated model weights (not raw data) back to the server
- The server aggregates the updates, typically by averaging them weighted by each client's local dataset size
- The updated global model is redistributed
For PPG applications, "local data" means one person's wrist-PPG recordings from their smartwatch or fitness band. The device might have weeks of continuous recordings representing hundreds of hours of labeled or unlabeled physiological signal.
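The FedAvg loop above can be sketched in a few lines of NumPy. This is a minimal illustration: the linear model, client count, learning rate, and synthetic "PPG features" are stand-ins for a real on-device network, not any published implementation.

```python
import numpy as np

def local_sgd(w0, X, y, lr=0.1, epochs=3):
    """Local full-batch gradient descent on one device's data
    (a linear model stands in for the PPG network)."""
    w = w0.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(X)
        w -= lr * grad
    return w

def fedavg_round(global_w, clients):
    """One FedAvg round: train locally on each selected client, then
    average the returned weights, weighted by local dataset size."""
    local_ws = [local_sgd(global_w, X, y) for X, y in clients]
    sizes = np.array([len(X) for X, _ in clients], dtype=float)
    return np.average(np.stack(local_ws), axis=0, weights=sizes)

rng = np.random.default_rng(0)
true_w = np.array([0.5, -1.0, 2.0])          # hypothetical shared "true" model
clients = []
for _ in range(5):
    X = rng.normal(size=(40, 3))             # stand-in features per PPG window
    y = X @ true_w + 0.01 * rng.normal(size=40)
    clients.append((X, y))

w = np.zeros(3)
for _ in range(50):                          # 50 communication rounds
    w = fedavg_round(w, clients)
```

After enough rounds the averaged model recovers the shared structure across clients, even though no client's raw data ever leaves its tuple.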
Communication Efficiency in Wearable Deployments
Wearables operate on constrained radio budgets. A full PPG model might have 500K-2M parameters, making naive gradient transmission expensive. Several techniques reduce communication overhead:
Gradient compression: Top-K sparsification keeps only the largest-magnitude gradient entries (typically the top 0.1-1%). Sparsification schemes such as Deep Gradient Compression (Lin et al., 2018) have demonstrated 100x-plus compression ratios with less than 1% accuracy loss.
Quantized communication: Sending 8-bit or even 1-bit gradient representations instead of 32-bit floats. QSGD (Alistarh et al., 2017) and signSGD (Bernstein et al., 2018) showed that aggressively quantized gradients, especially when paired with error feedback, converge comparably to full-precision SGD.
More local computation per round: Running more local epochs before each communication round reduces the total number of rounds needed, directly cutting the number of transmission events.
In practice, federated PPG training can reduce per-round communication from ~2 MB to under 100 KB, making it viable over Bluetooth Low Energy links.
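A minimal sketch of such a compression pipeline, combining Top-K sparsification with 8-bit quantization. The 500K-parameter gradient and 1% keep-rate are illustrative choices consistent with the figures above, not a specific deployment:

```python
import numpy as np

def topk_sparsify(grad, k_frac=0.01):
    """Keep only the top k_frac largest-magnitude entries (indices + values)."""
    k = max(1, int(len(grad) * k_frac))
    idx = np.argsort(np.abs(grad))[-k:].astype(np.uint32)
    return idx, grad[idx]

def quantize_8bit(values):
    """Uniform 8-bit quantization of the surviving values plus one fp32 scale."""
    scale = float(np.max(np.abs(values))) or 1.0
    q = np.round(values / scale * 127).astype(np.int8)
    return q, scale

def dequantize_8bit(q, scale):
    """Server-side reconstruction of the quantized values."""
    return q.astype(np.float32) * scale / 127

grad = np.random.default_rng(1).normal(size=500_000).astype(np.float32)
idx, vals = topk_sparsify(grad, k_frac=0.01)
q, scale = quantize_8bit(vals)
payload_bytes = idx.nbytes + q.nbytes + 4   # uint32 indices + int8 values + scale
# grad.nbytes is 2,000,000 (~2 MB); the compressed payload is ~25 KB
```

In real systems the residual (dropped gradient mass) is typically accumulated locally and added back in later rounds, which is what makes aggressive sparsification converge.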
Why Federated Learning Fits PPG Specifically
Several properties of PPG data make federated learning particularly well-suited:
High inter-individual variability: PPG waveform morphology varies substantially across individuals due to vascular anatomy, skin tone, perfusion, and sensor placement. Centralized training often underrepresents minority demographic groups. Federated learning with demographic-stratified aggregation can weight updates to ensure equitable model performance across skin tones and body types.
Non-IID local distributions: Each user's PPG data reflects their personal physiology and behavior patterns. This is a challenge for federated learning (non-independent, non-identically distributed data) but also a feature: the model must generalize across truly diverse physiological phenotypes.
Longitudinal richness: A single user's device might contain 6-12 months of continuous PPG. This longitudinal depth enables learning of within-person dynamics (circadian rhythm, fitness adaptation, medication response) that would be lost if only cross-sectional snapshots were used.
Regulatory pressure: The EU AI Act classifies health monitoring systems as high-risk AI. GDPR Article 9 treats biometric and health data as special categories requiring explicit consent for processing. Federated learning substantially reduces the regulatory burden by keeping data on-device.
Federated Learning Architectures for PPG Tasks
Heart Rate Estimation
The most studied federated PPG task. Liu et al. (2021, FedHR: Federated Heart Rate Estimation from Photoplethysmography, IEEE EMBC) demonstrated that FedAvg achieves within 0.3 BPM of centralized training performance when 100+ devices participate, while completely eliminating the need to share raw waveform data.
The architecture typically used: a 1D-CNN backbone with 4-6 convolutional layers, trained on 8-second PPG windows sampled at 25-50 Hz. Local batch size is constrained to fit in wearable RAM (typically 32-64 samples).
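A sketch of the windowing step this setup assumes, slicing a continuous PPG stream into 8-second windows at 25 Hz. The 2-second hop and the synthetic sinusoidal "pulse" are illustrative assumptions, not from any cited paper:

```python
import numpy as np

FS = 25                  # sampling rate in Hz (lower end of the 25-50 Hz range above)
WINDOW_S = 8             # window length in seconds
WINDOW = FS * WINDOW_S   # 200 samples per training example

def make_windows(ppg, hop_s=2):
    """Slice a continuous PPG stream into overlapping 8 s windows (2 s hop)."""
    hop = FS * hop_s
    n = (len(ppg) - WINDOW) // hop + 1
    return np.stack([ppg[i * hop : i * hop + WINDOW] for i in range(n)])

# One minute of synthetic PPG: a 1.2 Hz sinusoid (~72 BPM pulse)
ppg = np.sin(2 * np.pi * 1.2 * np.arange(60 * FS) / FS)
windows = make_windows(ppg)
# windows.shape == (27, 200): 27 local training examples from one minute of signal
```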
Atrial Fibrillation Detection
AF detection is a high-stakes clinical application where federated learning offers particular advantages. Individuals with intermittent AF may have recordings spanning months with only occasional AF episodes. Centralized collection would require indefinite data retention, creating privacy risks disproportionate to the clinical benefit.
Federated AF detection using PPG inter-beat intervals (IBIs) has been demonstrated to achieve sensitivity >90% with specificity >95% across 200+ virtual clients in simulation studies (Zhang et al., 2022, Privacy-Preserving Atrial Fibrillation Detection from Wearable PPG, npj Digital Medicine).
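IBI-based screening of this kind usually starts from a beat-to-beat irregularity metric. The sketch below uses RMSSD, a standard variability measure, but the 100 ms threshold and the synthetic rhythms are purely illustrative, not clinically validated cutoffs:

```python
import numpy as np

def rmssd(ibis_ms):
    """Root mean square of successive IBI differences, a standard
    irregularity metric computed from inter-beat intervals in ms."""
    diffs = np.diff(ibis_ms)
    return float(np.sqrt(np.mean(diffs ** 2)))

def af_suspected(ibis_ms, threshold_ms=100.0):
    """Flag a window as AF-suspect when beat-to-beat variability is high.
    The threshold is illustrative, not a validated clinical cutoff."""
    return rmssd(ibis_ms) > threshold_ms

rng = np.random.default_rng(2)
regular = 800 + rng.normal(0, 15, size=60)     # sinus-like: ~75 BPM, low jitter
irregular = rng.uniform(500, 1100, size=60)    # AF-like: erratic intervals
```

In a federated setting, features like these are computed on-device; only the model update derived from them is ever transmitted.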
Sleep Stage Classification
Sleep staging from wrist PPG is a natural federated application: sleep occurs at home, the wearable collects data overnight, and users are particularly sensitive about sharing sleep health data with third parties.
Federated models for sleep staging face the challenge that labels (polysomnography ground truth) are expensive and rare. This has pushed research toward semi-supervised federated approaches where only a small fraction of participants have labeled data.
Challenges in Federated PPG Learning
Data Heterogeneity (Non-IID Problem)
When local datasets are highly heterogeneous, FedAvg convergence slows and the global model may underperform for outlier users. For PPG, users with unusual physiology (very high or low heart rate, atypical waveform morphology from medication effects) may be systematically underserved.
SCAFFOLD (Karimireddy et al., 2020) addresses this with variance reduction through control variates. FedProx adds a proximal regularization term to prevent local models from drifting too far from the global model. Both improve performance for heterogeneous PPG distributions.
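FedProx's proximal term can be sketched as an extra gradient component pulling the local model back toward the global weights. The linear model and all constants below are illustrative:

```python
import numpy as np

def fedprox_local_update(global_w, X, y, mu=0.1, lr=0.05, epochs=5):
    """Local GD with FedProx: the local loss gains (mu/2)*||w - w_global||^2,
    which limits how far a client can drift on its non-IID data."""
    w = global_w.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(X)   # least-squares gradient (toy model)
        grad += mu * (w - global_w)         # gradient of the proximal term
        w -= lr * grad
    return w

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, 2.0, -1.0])
w0 = np.zeros(3)
drift_weak = np.linalg.norm(fedprox_local_update(w0, X, y, mu=0.0) - w0)
drift_strong = np.linalg.norm(fedprox_local_update(w0, X, y, mu=1.0) - w0)
# larger mu keeps the local model closer to the global weights
```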
Differential Privacy Tradeoffs
Adding formal differential privacy guarantees (via Gaussian noise injection into gradients) protects against reconstruction attacks but degrades model quality. The privacy-utility tradeoff is steep for small datasets. A device with only 2 hours of labeled PPG may not achieve a useful privacy budget without substantial accuracy loss.
DP-FedAvg (Geyer et al., 2017) demonstrated acceptable accuracy for PPG heart rate estimation with epsilon = 8-10, which provides meaningful protection against reconstruction while maintaining 1.5-2 BPM MAE on MIMIC and PPG-DaLiA benchmarks.
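The clip-then-noise step at the core of DP-FedAvg can be sketched as follows. The clipping norm and noise multiplier are illustrative; the actual (epsilon, delta) they buy depends on the number of rounds and client sampling rate, and must be computed with a privacy accountant:

```python
import numpy as np

def dp_sanitize(update, clip_norm=1.0, noise_mult=1.1, rng=None):
    """Clip a client update to a fixed L2 norm, then add Gaussian noise with
    std = noise_mult * clip_norm (the DP-FedAvg recipe; constants illustrative)."""
    rng = rng or np.random.default_rng()
    norm = float(np.linalg.norm(update))
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))
    return clipped + rng.normal(0.0, noise_mult * clip_norm, size=update.shape)

# With noise disabled, the output is exactly the clipped update (L2 norm <= 1.0)
u = np.ones(100)                                   # L2 norm 10
clipped_only = dp_sanitize(u, clip_norm=1.0, noise_mult=0.0)
```

Clipping bounds any single client's influence on the aggregate; the noise then masks whatever influence remains.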
Model Poisoning and Byzantine Robustness
Malicious or miscalibrated devices can send adversarial gradient updates that degrade the global model. For health applications, this is not merely a theoretical concern. A poorly calibrated sensor with systematic signal distortion could corrupt the global model in ways that harm other users.
Robust aggregation methods like Krum, Trimmed Mean, and FLTrust filter outlier updates before aggregation. FLTrust (Cao et al., 2020) uses a small trusted server-side dataset to validate client updates, achieving robustness with minimal accuracy cost.
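A coordinate-wise trimmed mean, one of the robust aggregators named above, takes only a few lines. The client values below are synthetic:

```python
import numpy as np

def trimmed_mean(updates, trim_frac=0.2):
    """Coordinate-wise trimmed mean: for every weight coordinate, drop the
    highest and lowest trim_frac of client values, then average the rest.
    A single poisoned client cannot pull the aggregate outside the
    honest clients' range."""
    U = np.sort(np.stack(updates), axis=0)
    k = int(len(updates) * trim_frac)
    return U[k : len(updates) - k].mean(axis=0)

honest = [np.full(4, 1.0) + 0.01 * i for i in range(9)]
poisoned = honest + [np.full(4, 1e6)]   # one malicious client sends a huge update
agg = trimmed_mean(poisoned, trim_frac=0.2)
# agg stays near 1.0, while a plain mean would be on the order of 1e5
```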
Practical Deployment Considerations
On-device training feasibility: Current wearable processors (ARM Cortex-M55, Apple S-series, Qualcomm Snapdragon Wear) can run inference, but on-device training remains challenging. Typical approaches compute and buffer gradients opportunistically on-device, then upload them in compressed form when the device is charging and connected to Wi-Fi.
Asynchronous federation: Most theoretical work assumes synchronous rounds where all selected devices complete training before aggregation. Real deployments use asynchronous schemes where updates are applied as they arrive, which is more robust to device unavailability.
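A FedAsync-style staleness-weighted merge can be sketched as follows. The function name and the 1/(1+staleness) decay schedule are assumptions for illustration, not a specific library's API:

```python
import numpy as np

def async_apply(global_w, client_w, staleness, base_lr=0.5):
    """Mix an arriving client model into the global model; the mixing weight
    decays with staleness (rounds elapsed since the client pulled the model)."""
    alpha = base_lr / (1.0 + staleness)    # stale updates count for less
    return (1.0 - alpha) * global_w + alpha * client_w

fresh = async_apply(np.zeros(2), np.ones(2), staleness=0)   # moves 50% of the way
stale = async_apply(np.zeros(2), np.ones(2), staleness=9)   # moves only 5%
```

Downweighting stale contributions keeps a slow wearable that trained on an old global model from dragging the aggregate backward.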
Personalization via fine-tuning: The federated global model can be personalized for individual users by fine-tuning the top layers on local data. This is especially valuable for PPG where individual baseline differences are substantial. Per-FedAvg and MAML-based personalization consistently improve individual-level accuracy by 15-30% over the vanilla global model.
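A minimal stand-in for top-layer fine-tuning: freeze all but the last weight of a toy linear model and adapt it to one user's local data. All names and constants here are illustrative:

```python
import numpy as np

def personalize_head(global_w, X, y, head_dims=1, lr=0.1, steps=20):
    """Fine-tune only the last head_dims weights on a user's local data,
    freezing the rest -- a toy analogue of 'fine-tune the top layers'."""
    w = global_w.copy()
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(X)
        grad[:-head_dims] = 0.0            # freeze the "backbone" weights
        w -= lr * grad
    return w

rng = np.random.default_rng(4)
global_w = np.array([1.0, 1.0, 1.0])       # federated global model
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, 1.0, 2.5])          # this user's true head weight differs
w_personal = personalize_head(global_w, X, y)
```

The frozen backbone retains the population prior while the head absorbs the user's individual baseline.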
Internal Resources
For context on the PPG signal processing challenges that federated models must handle, see our guides on PPG signal quality assessment, motion artifact removal methods, and the complete machine learning pipeline for PPG analysis. For the clinical applications these models target, see PPG arrhythmia classification and PPG sleep staging algorithms.
Key Papers
- McMahan, B. et al. (2017). Communication-efficient learning of deep networks from decentralized data. AISTATS. arXiv:1602.05629
- Geyer, R.C. et al. (2017). Differentially private federated learning: A client level perspective. arXiv:1712.07557. https://doi.org/10.48550/arXiv.1712.07557
- Karimireddy, S.P. et al. (2020). SCAFFOLD: Stochastic controlled averaging for federated learning. ICML. https://doi.org/10.48550/arXiv.1910.06378
- Zhang, X. et al. (2022). Privacy-preserving federated learning for wearable-based atrial fibrillation detection. npj Digital Medicine, 5, 107. https://doi.org/10.1038/s41746-022-00652-7
FAQ
What data never leaves the device in federated PPG learning? Raw PPG waveforms, inter-beat interval sequences, and any derived physiological features stay entirely on the user's device. Only mathematical gradient updates (representing how the model weights should change) are transmitted, and these can be further protected with differential privacy noise injection.
How many devices are needed for federated PPG training to work well? Simulation studies suggest 50-100 participating devices per round are sufficient for FedAvg to converge to within 2-5% of centralized performance. In practice, large-scale deployments with thousands of devices achieve better generalization across diverse demographics, especially for tasks like skin-tone-equitable SpO2 estimation.
Is federated learning slower than centralized training? Yes, typically 2-10x more communication rounds are needed compared to equivalent centralized training, because local updates are noisier than mini-batches from a centralized dataset. However, wall-clock time depends on parallelism: with many devices training simultaneously, federated learning can be competitive in total time.
Can federated PPG models handle users with rare conditions? This is a known weakness. Users with unusual physiology (high-degree AV block, pacemaker rhythms, severe peripheral vascular disease) may be so different from the majority that their local updates are downweighted or filtered as outliers. Clustered federated learning, which groups users with similar physiological profiles, partially addresses this.
What's the difference between federated learning and on-device learning? On-device learning adapts a model purely to a single user without any collaboration. Federated learning collaborates across many users while keeping data local. Federated produces a better global prior; on-device personalization fine-tunes that prior for the individual. The two are complementary and often combined.
Does federated learning work for continuous PPG monitoring vs. episodic? Both work, but continuous monitoring generates much larger local datasets, which supports more local training epochs per round and better local gradient estimates. Episodic recording (e.g., 30-second spot checks) produces sparser data and typically needs more rounds to reach comparable model quality.
What regulatory frameworks cover federated PPG systems? In the US, if the federated model produces a clinical decision (e.g., AF alert), it falls under FDA Software as a Medical Device (SaMD) guidelines. In the EU, both the AI Act (high-risk classification) and MDR (medical device regulation) apply. The federated architecture itself reduces but does not eliminate compliance obligations, since the aggregated model still processes health-relevant information.