CNN (Convolutional Neural Networks) for PPG Analysis
Convolutional Neural Networks (CNNs) automatically learn hierarchical feature representations from raw photoplethysmography (PPG) waveforms through stacked convolutional layers with learned filters. For PPG, 1D CNNs applied to beat segments or fixed-length windows are among the strongest reported approaches for arrhythmia classification, signal quality assessment, and disease biomarker extraction.
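1D CNNs process PPG time series by sliding convolutional kernels along the time axis. They learn multi-scale features, from short morphological details of a single pulse to slower rhythmic and respiratory modulations spanning several beats. Typical architectures for PPG beat classification use 4–8 convolutional layers with kernel sizes of 3–15 samples, batch normalization, ReLU activations, and global average pooling before the classification head. Kernel size alone does not set the receptive field. The receptive field grows with depth, and much faster with pooling, striding, or dilation, so capturing multi-second respiratory modulation requires downsampling or dilation between layers.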
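To make the receptive-field point concrete, the sketch below computes how many input samples one output unit of a 1D conv/pool stack can "see". It uses the standard recurrence for stacked layers and includes a plain sliding-kernel function to show what a single convolutional filter does. The 125 Hz sampling rate and the six-layer kernel list are hypothetical values chosen within the 3–15 sample range; they are not taken from any specific published model.

```python
def receptive_field(layers):
    """Receptive field, in input samples, of a stack of 1D conv/pool layers.

    Each layer is a (kernel_size, stride, dilation) tuple. Uses the standard
    recurrence rf += (kernel - 1) * dilation * jump, where jump is the product
    of the strides of all earlier layers.
    """
    rf, jump = 1, 1
    for kernel, stride, dilation in layers:
        rf += (kernel - 1) * dilation * jump
        jump *= stride
    return rf


def conv1d_valid(x, kernel):
    """Slide one kernel along x with no padding (cross-correlation, as in CNN layers)."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) for i in range(len(x) - k + 1)]


FS_HZ = 125                      # hypothetical PPG sampling rate
KERNELS = [15, 11, 9, 7, 5, 3]   # hypothetical 6-layer stack, kernels within 3-15

# Convolutions only (stride 1, no dilation).
no_pool = [(k, 1, 1) for k in KERNELS]

# Same convolutions, each except the last followed by 2x max pooling (kernel 2, stride 2).
with_pool = []
for i, k in enumerate(KERNELS):
    with_pool.append((k, 1, 1))
    if i < len(KERNELS) - 1:
        with_pool.append((2, 2, 1))

for name, stack in [("no pooling", no_pool), ("with pooling", with_pool)]:
    rf = receptive_field(stack)
    print(f"{name}: {rf} samples = {rf / FS_HZ:.2f} s")
```

Under these assumptions, the convolution-only stack sees 45 samples (0.36 s), which is less than one heartbeat. Interleaved pooling raises the receptive field to 274 samples (2.19 s), covering a few beats. Reaching full respiratory cycles would take further pooling or dilated convolutions.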
The DeepHeart work from Cardiogram and UCSF (Ballinger et al., 2018) trained a deep sequence model on heart-rate and step-count data from Apple Watch users, rather than on raw PPG waveforms, to predict several health conditions simultaneously. It reported AUCs of 0.90 for sleep apnea, 0.82 for hypertension, and 0.80 for diabetes. The AF result (AUC 0.97) comes from a related Cardiogram/UCSF smartwatch study (Tison et al., 2018). Together, these results suggested that a single deep network can learn condition-specific rhythmic signatures from wrist-worn sensor data without explicit feature engineering.
ResNet-style skip connections are particularly effective for deep PPG CNNs because they mitigate vanishing gradients in networks with 10 or more layers. Squeeze-and-Excitation (SE) blocks add channel attention by reweighting feature channels. In a 1D CNN these channels often behave like frequency-selective filters, so the network can emphasize the most informative bands for each input, which can help it accommodate individual physiological variability. For motion artifact removal, U-Net architectures (originally developed for biomedical image segmentation) have been adapted as 1D encoder-decoder CNNs that reconstruct clean PPG from corrupted inputs, treating artifact removal as a signal-to-signal translation problem.
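The two building blocks above can be sketched in plain Python on a (channels × time) feature map. Squeeze-and-Excitation averages each channel over time, passes the result through a small bottleneck (FC, ReLU, FC, sigmoid) to get one gate per channel, and rescales the channels. The skip connection adds the block input back to its output. This is a minimal illustration only. The feature map and weights are hypothetical; real blocks learn the weights, and the residual branch F(x) would normally contain convolutions as well as the SE step.

```python
import math


def relu(v):
    return max(0.0, v)


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(w, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def squeeze_excite(x, w1, w2):
    """Squeeze-and-Excitation on a feature map x of shape (channels, time).

    squeeze: global average pool each channel over time -> one value per channel
    excite:  FC (C -> C/r) -> ReLU -> FC (C/r -> C) -> sigmoid -> one gate per channel
    scale:   multiply every time step of a channel by that channel's gate
    """
    z = [sum(ch) / len(ch) for ch in x]
    hidden = [relu(h) for h in matvec(w1, z)]
    gates = [sigmoid(g) for g in matvec(w2, hidden)]
    return [[g * v for v in ch] for g, ch in zip(gates, x)], gates


def residual(x, fx):
    """ResNet-style skip connection: ReLU(x + F(x)); x and F(x) must share a shape."""
    return [[relu(a + b) for a, b in zip(xc, fc)] for xc, fc in zip(x, fx)]


# Hypothetical 4-channel, 5-step feature map from an earlier conv layer.
x = [
    [0.1, 0.4, 0.9, 0.4, 0.1],
    [0.0, -0.2, 0.1, -0.2, 0.0],
    [1.0, 1.0, 1.0, 1.0, 1.0],
    [0.3, 0.2, 0.1, 0.2, 0.3],
]
# Hypothetical weights for C=4 and reduction ratio r=2 (learned during training in practice).
w1 = [[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]]
w2 = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0], [-1.0, 1.0]]

se_out, gates = squeeze_excite(x, w1, w2)
y = residual(x, se_out)  # here F(x) is just the SE recalibration; real blocks add convs
print([round(g, 3) for g in gates])
```

Because the gates lie strictly between 0 and 1, SE can only attenuate channels relative to one another. The skip connection ensures the original signal path survives even when a gate is small.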
Frequently Asked Questions
What input representation works best for PPG CNNs?
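Raw, normalized waveforms tend to work best for morphology- and timing-sensitive tasks such as blood pressure estimation and AF detection, where pulse shape and beat-to-beat irregularity are directly visible in the time domain. Scalogram (continuous wavelet transform, CWT) or spectrogram inputs tend to work better for frequency-sensitive tasks such as heart rate estimation during exercise and respiratory rate extraction.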
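The two input styles can be compared with a small sketch. Per-window z-scoring produces the normalized raw-waveform input, and a Hann-windowed short-time DFT produces a magnitude spectrogram. A 1.2 Hz sinusoid (72 bpm) sampled at a hypothetical 25 Hz stands in for a real PPG recording, and the 5 s frame length and 1 s hop are illustrative choices. In practice `scipy.signal.stft` or a CWT library would replace this naive DFT loop, which is written in the standard library only so it stays self-contained.

```python
import cmath
import math


def zscore(x):
    """Per-window z-score normalization: zero mean, unit variance."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    return [(v - mean) / std for v in x] if std > 0 else [0.0] * n


def stft_magnitude(x, frame_len, hop):
    """Naive magnitude spectrogram: Hann window + DFT per frame.

    Returns one list per frame, each with frame_len // 2 + 1 magnitudes
    (bins from 0 Hz up to the Nyquist frequency).
    """
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len) for i in range(frame_len)]
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        seg = [x[start + i] * hann[i] for i in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            s = sum(seg[i] * cmath.exp(-2j * math.pi * k * i / frame_len) for i in range(frame_len))
            mags.append(abs(s))
        frames.append(mags)
    return frames


FS_HZ = 25  # hypothetical (downsampled) sampling rate
# 10 s of a 1.2 Hz sinusoid as a stand-in for a PPG pulse train at 72 bpm.
ppg_like = [math.sin(2 * math.pi * 1.2 * n / FS_HZ) for n in range(10 * FS_HZ)]
spec = stft_magnitude(zscore(ppg_like), frame_len=125, hop=25)
peak_bins = [max(range(len(f)), key=f.__getitem__) for f in spec]
print(peak_bins[0] * FS_HZ / 125)  # dominant frequency in Hz: 1.2
```

The 5 s frame gives a bin spacing of 0.2 Hz (12 bpm). This illustrates the time-frequency trade-off behind choosing a spectrogram input: finer heart-rate resolution requires longer frames and therefore coarser timing.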
How do you handle class imbalance in PPG CNN training for rare conditions like AF?
AF prevalence in population datasets is typically 1–5%. Common strategies include focal loss, oversampling (for example with SMOTE), and training on class-balanced mini-batches. Class-weighted loss functions are the simplest to implement. Whether they underperform synthetic minority oversampling depends on the dataset and should be checked empirically.
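Three of these options can be sketched in plain Python: binary focal loss, inverse-frequency class weights, and a class-balanced batch sampler. The focal loss follows the usual form FL = −α_t (1 − p_t)^γ log(p_t), with α = 0.25 and γ = 2 as defaults (the values reported in the original focal-loss paper). The sampler here uses simple random oversampling with replacement and is not SMOTE, which instead synthesizes new minority examples by interpolation. The 95/5 label split is a hypothetical dataset at 5% AF prevalence.

```python
import math
import random


def focal_loss(probs, labels, alpha=0.25, gamma=2.0, eps=1e-7):
    """Mean binary focal loss.

    probs: predicted P(AF) per window; labels: 1 for AF, 0 otherwise.
    gamma > 0 down-weights easy, confidently correct examples.
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)          # avoid log(0)
        p_t = p if y == 1 else 1 - p           # probability assigned to the true class
        a_t = alpha if y == 1 else 1 - alpha   # class-balancing factor
        total += -a_t * (1 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


def inverse_frequency_weights(labels):
    """Class weights for a weighted loss: weight_c = N / (num_classes * count_c)."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def balanced_batches(labels, batch_size, num_batches, seed=0):
    """Yield index batches with an equal number of samples from each class.

    Indices are drawn with replacement, so minority windows are repeated
    (random oversampling, not SMOTE).
    """
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    per_class = batch_size // len(by_class)
    for _ in range(num_batches):
        batch = []
        for idx in by_class.values():
            batch.extend(rng.choices(idx, k=per_class))
        rng.shuffle(batch)
        yield batch


labels = [0] * 95 + [1] * 5  # hypothetical dataset at 5% AF prevalence
weights = inverse_frequency_weights(labels)
batch = next(balanced_batches(labels, batch_size=32, num_batches=1))
print(weights[1], sum(labels[i] for i in batch))  # 10.0 16
```

Balanced batches repeat the same few AF windows many times, so they are usually combined with augmentation or early stopping to limit overfitting to those examples.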
What is the minimum PPG window length for reliable CNN inference?
For beat-level (morphology) classification: 2–5 seconds, roughly 2–8 beats at resting heart rates of 60–100 bpm. For rhythm-based classification such as AF detection: 10–30 seconds, enough to capture inter-beat interval (IBI) variability. Longer windows generally improve accuracy but increase latency and memory requirements.
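The window-length trade-off can be made concrete with a small sketch that segments a recording into overlapping windows, estimates how many beats each window holds, and computes RMSSD (root mean square of successive IBI differences), one common irregularity measure. With fewer IBIs per window, such rhythm statistics are estimated from fewer differences and become noisier. The 64 Hz sampling rate, 30 s window, and 10 s hop are hypothetical choices, and RMSSD is shown as an illustration, not as the feature a CNN necessarily learns.

```python
import math


def sliding_windows(signal, fs_hz, window_s, hop_s):
    """Split a signal into fixed-length, possibly overlapping windows.

    Returns (start_time_s, samples) pairs; a trailing partial window is dropped.
    """
    win = int(round(window_s * fs_hz))
    hop = int(round(hop_s * fs_hz))
    return [
        (start / fs_hz, signal[start:start + win])
        for start in range(0, len(signal) - win + 1, hop)
    ]


def expected_beats(window_s, heart_rate_bpm):
    """Approximate number of beats that fit in a window at a given heart rate."""
    return window_s * heart_rate_bpm / 60.0


def rmssd(ibis_ms):
    """Root mean square of successive differences between inter-beat intervals (ms)."""
    diffs = [b - a for a, b in zip(ibis_ms, ibis_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


FS_HZ = 64                                    # hypothetical wrist-PPG sampling rate
recording = [0.0] * (60 * FS_HZ)              # placeholder for 60 s of PPG samples
windows = sliding_windows(recording, FS_HZ, window_s=30, hop_s=10)
print(len(windows), [t for t, _ in windows])  # 4 [0.0, 10.0, 20.0, 30.0]
print(expected_beats(2, 60), expected_beats(5, 100))
```

With a 30 s window, the first decision cannot be made until 30 s of signal has arrived. The hop then sets how often later decisions are refreshed, which is the latency cost mentioned above.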