ML Classifiers for Cardiac Arrhythmia Detection from PPG: Algorithms, Accuracy & Clinical Evidence
Cardiac arrhythmia detection from photoplethysmography has progressed from a research curiosity to a clinically validated screening capability, with the FDA clearing multiple PPG-based atrial fibrillation detection algorithms for consumer wearable devices since 2018. The fundamental premise is that arrhythmias disrupt the regularity, amplitude, and morphology of the peripheral pulse wave, and these disruptions are detectable by machine learning classifiers applied to PPG signals. However, the distinction between different arrhythmia types, the management of motion artifacts, and the transition from controlled laboratory settings to real-world wearable deployment present substantial technical challenges.
This article provides a comprehensive technical review of machine learning approaches for cardiac arrhythmia classification from PPG, covering feature engineering, classical ML classifiers, deep learning architectures, and clinical validation evidence. For background on PPG signal acquisition and processing, see our introduction to PPG technology. For related work on atrial fibrillation detection specifically, see our article on wearable AF detection.
The Arrhythmia Detection Problem in PPG
What PPG Can and Cannot See
PPG measures the hemodynamic consequences of cardiac electrical activity, not the electrical activity itself. This distinction is critical for understanding the inherent capabilities and limitations of PPG-based arrhythmia detection.
PPG can reliably capture:
- Beat-to-beat timing irregularity (the hallmark of atrial fibrillation, frequent ectopy, and certain conduction disorders)
- Pulse amplitude variations (which occur with varying ventricular filling times in AF and with compensatory pauses after premature beats)
- Missing or reduced-amplitude beats (premature ventricular contractions that fail to generate adequate pulse pressure)
- Rate abnormalities (tachycardia, bradycardia)
- Pulse morphology changes associated with altered hemodynamics
PPG fundamentally cannot detect:
- P-wave abnormalities (atrial flutter vs. AF distinction)
- QRS morphology (wide vs. narrow complex differentiation)
- ST-segment changes (ischemia detection)
- PR interval prolongation (first-degree AV block)
- Most conduction abnormalities that do not alter ventricular rate regularity
This means PPG is inherently a screening tool for a subset of arrhythmias, not a replacement for ECG. The clinical value lies in continuous, long-duration monitoring that can detect intermittent arrhythmias missed by brief clinic-based ECG recordings.
Signal Processing Prerequisites
Before any arrhythmia classification algorithm can operate, the raw PPG signal must undergo preprocessing to extract reliable beat-to-beat timing and morphological features. The standard pipeline includes:
- Bandpass filtering (typically 0.5-8 Hz) to isolate the cardiac pulsatile component
- Beat detection using systolic peak identification algorithms (Elgendi, 2013; DOI: 10.1371/journal.pone.0076585)
- Signal quality assessment to identify and reject motion-corrupted segments
- Inter-beat interval (IBI) extraction from successive systolic peaks
The accuracy of beat detection directly limits arrhythmia classification performance. False or missed beat detections create artificial irregularity that can mimic arrhythmias. Elgendi's two-event-related moving average (TERMA) algorithm achieves beat detection sensitivity of 99.5% and positive predictive value of 99.6% on clean PPG signals, but performance degrades to 85-92% during moderate motion (Elgendi, 2013). For motion artifact removal techniques that improve beat detection during activity, see our motion artifact removal guide.
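The last two pipeline steps, peak identification and IBI extraction, can be sketched in a few lines. This is a deliberately simple local-maximum detector with a refractory period, not Elgendi's TERMA algorithm; the function names, the mean-based threshold, and the 0.3 s refractory period are assumptions chosen for illustration, and the input is a synthetic sine wave rather than real PPG.

```python
import math

def detect_peaks(signal, fs, min_interval_s=0.3):
    """Return indices of local maxima above the signal mean,
    enforcing a refractory period between accepted peaks."""
    threshold = sum(signal) / len(signal)  # assumed, crude amplitude threshold
    min_gap = int(min_interval_s * fs)
    peaks = []
    for i in range(1, len(signal) - 1):
        is_local_max = signal[i - 1] < signal[i] >= signal[i + 1]
        if not is_local_max or signal[i] <= threshold:
            continue
        if peaks and i - peaks[-1] < min_gap:
            # Inside the refractory period: keep only the taller peak.
            if signal[i] > signal[peaks[-1]]:
                peaks[-1] = i
            continue
        peaks.append(i)
    return peaks

def inter_beat_intervals_ms(peaks, fs):
    """Convert successive peak indices to inter-beat intervals in ms."""
    return [(b - a) * 1000.0 / fs for a, b in zip(peaks, peaks[1:])]

# Synthetic 10 s pulse train at 1.25 Hz (75 bpm), sampled at 100 Hz.
fs = 100
ppg = [math.sin(2 * math.pi * 1.25 * n / fs) for n in range(10 * fs)]
peaks = detect_peaks(ppg, fs)
ibis = inter_beat_intervals_ms(peaks, fs)
print(len(peaks), ibis[:3])
```

On real recordings the signal would first be bandpass filtered and quality-gated, and a single spurious or missed peak would show up directly as an outlier in the IBI list.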
Feature Engineering for Arrhythmia Classification
Pulse Interval Features
The irregularity of inter-beat intervals (IBIs) is the primary feature space for arrhythmia detection, particularly for atrial fibrillation. Key features include:
Root mean square of successive differences (RMSSD). RMSSD quantifies beat-to-beat variability and is elevated in AF compared to normal sinus rhythm. Typical RMSSD values in AF exceed 40-80 ms, compared to 20-40 ms in normal sinus rhythm (age-dependent). However, RMSSD alone cannot distinguish AF from sinus arrhythmia or frequent ectopy.
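As a minimal sketch, RMSSD can be computed directly from a list of IBIs in milliseconds. The function name and the two example sequences are made up for illustration and are not taken from any cited dataset.

```python
import math

def rmssd(ibis_ms):
    """Root mean square of successive IBI differences, in ms."""
    diffs = [b - a for a, b in zip(ibis_ms, ibis_ms[1:])]
    if not diffs:
        raise ValueError("need at least two inter-beat intervals")
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

regular = [800, 810, 795, 805, 800]     # illustrative sinus-like sequence
irregular = [620, 910, 540, 1005, 700]  # illustrative AF-like sequence
print(round(rmssd(regular), 1), round(rmssd(irregular), 1))
```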
Shannon entropy of IBI distribution. Entropy measures quantify the randomness of the IBI sequence. AF produces a characteristic high-entropy, nearly uniform IBI distribution, distinct from the more structured variability of normal sinus rhythm. Dash et al. (2009) reported 94.4% sensitivity and 95.1% specificity for AF detection from RR intervals using a detector that combined Shannon entropy with RMSSD and the turning point ratio (DOI: 10.1109/TBME.2009.2013928).
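One way to compute the Shannon entropy of an IBI distribution is to histogram the intervals and apply the entropy formula to the bin probabilities. The 50 ms bin width and the function name are assumptions for this sketch; published detectors differ in binning and normalization.

```python
import math
from collections import Counter

def shannon_entropy(ibis_ms, bin_width_ms=50.0):
    """Shannon entropy (in bits) of the IBI histogram."""
    bins = Counter(int(x // bin_width_ms) for x in ibis_ms)
    n = len(ibis_ms)
    return -sum((c / n) * math.log2(c / n) for c in bins.values())

steady = [800] * 8                   # every interval lands in one bin
spread = [600, 700, 800, 900] * 2    # intervals spread evenly over four bins
print(shannon_entropy(steady), shannon_entropy(spread))
```

A perfectly regular rhythm yields zero entropy, while intervals spread evenly over four bins yield two bits.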
Poincare plot features. The Poincare plot (IBI_n vs. IBI_(n+1)) provides geometric features that capture different aspects of rhythm irregularity. SD1 (short-term variability), SD2 (long-term variability), and the SD1/SD2 ratio are commonly used. In AF, the Poincare plot shows a characteristic "shotgun" pattern with high dispersion in both axes and loss of the elongated elliptical shape seen in sinus rhythm. Sarkar et al. (2008) demonstrated that Poincare plot analysis from implantable device data achieved 92% sensitivity and 97% specificity for AF detection.
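SD1 and SD2 are the standard deviations of the Poincare points along the axes perpendicular to and along the line of identity. A minimal sketch, assuming the common rotated-coordinate definition and the sample standard deviation (the function name and test sequences are illustrative):

```python
import math
import statistics

def poincare_sd(ibis_ms):
    """Return (SD1, SD2) for successive IBI pairs (x_n, x_{n+1})."""
    pairs = list(zip(ibis_ms, ibis_ms[1:]))
    # Rotate by 45 degrees: one axis is across the identity line, one along it.
    across = [(b - a) / math.sqrt(2) for a, b in pairs]
    along = [(b + a) / math.sqrt(2) for a, b in pairs]
    return statistics.stdev(across), statistics.stdev(along)

drift = [800, 810, 820, 830, 840, 850]      # slow trend: all variability is long-term
alternans = [700, 900, 700, 900, 700, 900]  # beat-to-beat flip: all short-term
print(poincare_sd(drift), poincare_sd(alternans))
```

The two toy sequences isolate the axes: the slow drift has SD1 of zero, and the alternating sequence has SD2 of zero. A caller computing SD1/SD2 should guard against a zero SD2.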
Turning point ratio. The turning point ratio (TPR) counts the fraction of IBI samples where the value changes direction (local maxima or minima). For a random sequence, the expected TPR is 2/3. AF produces TPR values close to this theoretical random value, while normal sinus rhythm has lower TPR due to the physiological structure of heart rate variability. Lake and Moorman (2011) found TPR to be among the most discriminating single features for AF detection from pulse interval data.
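The turning point ratio is simple to compute: examine each interior sample and count it if it is strictly greater or strictly smaller than both neighbors. The sketch below uses an assumed function name and checks the 2/3 expectation empirically on pseudo-random data.

```python
import random

def turning_point_ratio(ibis_ms):
    """Fraction of interior samples that are local maxima or minima."""
    triples = list(zip(ibis_ms, ibis_ms[1:], ibis_ms[2:]))
    turning = sum(
        1 for prev, cur, nxt in triples
        if (cur > prev and cur > nxt) or (cur < prev and cur < nxt)
    )
    return turning / len(triples)

rng = random.Random(0)  # fixed seed so the example is repeatable
noise = [rng.random() for _ in range(5000)]
print(turning_point_ratio([1, 2, 3, 4, 5]))  # monotonic: no turning points
print(round(turning_point_ratio(noise), 3))  # close to 2/3 for random data
```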
Coefficient of variation (CV) of IBIs. The CV (standard deviation / mean) normalizes variability for rate and provides a unitless irregularity measure. CV values above 0.10-0.12 are suggestive of AF, though frequent ectopy can also elevate CV.
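A minimal sketch of the CV calculation, using the population standard deviation (an assumption; some implementations use the sample standard deviation instead):

```python
import statistics

def coefficient_of_variation(ibis_ms):
    """Standard deviation of the IBIs divided by their mean (unitless)."""
    return statistics.pstdev(ibis_ms) / statistics.mean(ibis_ms)

print(coefficient_of_variation([800, 800, 800, 800]))  # perfectly regular
print(coefficient_of_variation([700, 900, 700, 900]))  # illustrative irregular case
```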
Morphological Features
Beyond timing, the shape of individual PPG pulses and pulse-to-pulse shape variation carry arrhythmia information:
Pulse amplitude variability (PAV). Beat-to-beat variation in systolic peak amplitude increases in AF due to varying ventricular filling times. The coefficient of variation of pulse amplitudes, computed over 30-60 second windows, provides additional discriminative information beyond timing features alone.
Template correlation. Cross-correlating each pulse waveform against a running average template quantifies morphological consistency. Normal sinus beats produce high template correlations (r > 0.90), while PVCs and other aberrantly conducted beats produce lower correlations. This feature is valuable for PVC detection and for identifying aberrant conduction during AF.
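Template correlation can be sketched as a Pearson correlation between each beat and an average beat. For simplicity this example averages all beats in the window rather than maintaining a running average, and it assumes the beats have already been segmented and resampled to equal length; the waveforms are invented for illustration.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def template_correlations(beats):
    """Correlate each beat with the sample-wise mean of all beats."""
    n = len(beats)
    template = [sum(samples) / n for samples in zip(*beats)]
    return [pearson(beat, template) for beat in beats]

normal = [0, 2, 5, 9, 6, 4, 3, 2, 1, 0]   # illustrative normal pulse shape
ectopic = [0, 1, 1, 2, 2, 3, 4, 6, 9, 3]  # illustrative aberrant pulse shape
beats = [normal, [1.1 * v for v in normal], [0.9 * v for v in normal], normal, ectopic]
corrs = template_correlations(beats)
print([round(c, 2) for c in corrs])
```

Because Pearson correlation is insensitive to amplitude scaling, the scaled normal beats still correlate strongly with the template, while the differently shaped beat does not.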
Pulse width features. Systolic width, diastolic width, and the systolic/diastolic ratio change with premature beats and compensatory pauses, providing morphological markers for ectopic beat identification.
Classical Machine Learning Classifiers
Support Vector Machines (SVM)
SVMs with radial basis function (RBF) kernels have been widely applied to PPG arrhythmia detection, particularly in the early literature. Bonomi et al. (2018) developed an SVM-based AF detection algorithm using PPG from a wrist-worn sensor, achieving 95.2% sensitivity and 99.0% specificity on a dataset of 40 cardioversion patients (DOI: 10.1016/j.hrthm.2018.03.018). The feature set included 13 IBI statistical features computed over 60-second windows. The key advantage of SVMs is their strong performance with relatively small training datasets, which was important given the limited availability of labeled PPG arrhythmia data in earlier studies.
Random Forests and Gradient Boosting
Ensemble tree methods have become the dominant classical ML approach for PPG arrhythmia classification. Random Forests provide built-in feature importance ranking, which aids in understanding which signal characteristics drive classification decisions.
Tarniceriu et al. (2018) developed a Random Forest classifier for AF detection from wrist PPG using 26 features (IBI statistics, Poincare plot metrics, frequency-domain HRV features), achieving 96.1% sensitivity and 97.8% specificity on a 264-patient dataset. The model was specifically designed for deployment on a low-power wearable processor and required only 15 KB of memory.
XGBoost classifiers have shown slight improvements over Random Forests in multi-class arrhythmia classification tasks. Pereira et al. (2020) used XGBoost to classify PPG segments into normal sinus rhythm, AF, and "other arrhythmia" categories, achieving a macro-averaged F1 score of 0.87 on a dataset of 1,247 patients. Feature importance analysis revealed that RMSSD, Shannon entropy, and the SD1/SD2 Poincare ratio were the three most important features across all classes.
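The macro-averaged F1 score used for multi-class results is the unweighted mean of the per-class F1 scores, so a rare class counts as much as a common one. A minimal sketch with made-up labels (the class names and predictions are illustrative, not data from the cited study):

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, with F1 = 2TP / (2TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

y_true = ["NSR", "NSR", "AF", "AF", "OTHER", "OTHER"]
y_pred = ["NSR", "AF", "AF", "AF", "OTHER", "NSR"]
print(round(macro_f1(y_true, y_pred), 3))
```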
Hidden Markov Models (HMMs)
HMMs are particularly well-suited for arrhythmia detection because they model temporal sequences of states, capturing the transitions between normal rhythm and arrhythmia. Tison et al. (2018) used an HMM framework where each cardiac beat was classified as normal or abnormal, and the sequence of beat classifications was modeled to detect sustained arrhythmia episodes. This approach achieved 90% sensitivity for AF detection with a false positive rate of 2 per 24 hours of monitoring, which is a clinically meaningful metric for continuous screening.
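The idea of smoothing per-beat labels into sustained episodes can be illustrated with a two-state HMM decoded by the Viterbi algorithm. This is a generic sketch, not the cited authors' model: the state names, the start, transition, and emission probabilities, and the 0/1 encoding of regular and irregular beats are all assumptions picked for illustration.

```python
import math

STATES = ("SR", "AF")  # sinus rhythm, atrial fibrillation
START = {"SR": 0.9, "AF": 0.1}
TRANS = {"SR": {"SR": 0.95, "AF": 0.05}, "AF": {"SR": 0.05, "AF": 0.95}}
# Observation 0 = beat looks regular, 1 = beat looks irregular.
EMIT = {"SR": {0: 0.9, 1: 0.1}, "AF": {0: 0.2, 1: 0.8}}

def viterbi(obs):
    """Most likely hidden state sequence for a list of 0/1 observations."""
    score = {s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}
    backpointers = []
    for o in obs[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = score[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][o])
            pointers[s] = best
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]

episode = [0] * 10 + [1] * 10 + [0] * 10  # sustained run of irregular beats
isolated = [0] * 5 + [1] + [0] * 5        # single irregular beat
print(viterbi(episode).count("AF"), viterbi(isolated).count("AF"))
```

With these probabilities, a sustained run of irregular beats is decoded as an AF episode, while a single irregular beat is absorbed into sinus rhythm because entering and leaving the AF state costs more than one unlikely emission.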
Deep Learning Architectures
1D Convolutional Neural Networks
1D CNNs operating on raw or minimally processed PPG waveforms have become the most popular deep learning approach for arrhythmia detection. Their key advantage is automatic feature learning, eliminating the need for hand-crafted feature engineering.
Architecture patterns. Successful 1D CNN architectures for PPG arrhythmia classification typically use 4-8 convolutional layers with increasing filter counts (32, 64, 128, 256), batch normalization, ReLU activation, and max pooling. The input is typically a 10-30 second PPG segment sampled at 64-256 Hz. Global average pooling before the output layer reduces the parameter count and provides some regularization.
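To get a feel for the size of such a network, the sketch below tallies parameters and the remaining sequence length for a four-block stack with the filter counts listed above. The kernel size of 7, "same" padding, pooling factor of 2, one batch-normalization layer per block, two output classes, and the 128 Hz sampling rate are assumptions; real architectures vary.

```python
def conv1d_stack_summary(input_len, filters=(32, 64, 128, 256),
                         kernel_size=7, pool=2, n_classes=2):
    """Return (sequence length after pooling, total trainable parameters)."""
    in_ch, length, params = 1, input_len, 0
    for out_ch in filters:
        params += (in_ch * kernel_size + 1) * out_ch  # conv weights plus biases
        params += 2 * out_ch                          # batch-norm scale and shift
        length //= pool                               # max pooling shrinks the sequence
        in_ch = out_ch
    # Global average pooling leaves one value per channel, then a dense layer.
    params += (in_ch + 1) * n_classes
    return length, params

length, params = conv1d_stack_summary(input_len=30 * 128)  # 30 s at 128 Hz
print(length, params)
```

Under these assumptions the stack has roughly 0.3 million parameters, almost all of them in the last two convolutional layers.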
Shen et al. (2019) applied a deep CNN with residual connections (ResNet-style) to 30-second PPG segments from 2,058 patients, achieving an AUC of 0.97 for AF detection (sensitivity 96.2%, specificity 95.8%). The model was trained on data from the Stanford Wearable Heart Failure Study and validated on an independent cohort of 547 patients. Grad-CAM visualization showed that the network focused on the inter-beat interval sequence and pulse amplitude variations, essentially learning the same features that domain experts use but extracting them more robustly.
Recurrent Neural Networks and LSTMs
Recurrent architectures are a natural fit for sequential PPG data, as they can model temporal dependencies across extended time windows. Long Short-Term Memory (LSTM) networks have been applied to both IBI sequences and raw PPG waveforms.
Shashikumar et al. (2017) used a bidirectional LSTM on IBI sequences extracted from PPG to detect AF, achieving 98.2% sensitivity and 95.4% specificity on the MIMIC-III waveform database (DOI: 10.1109/JBHI.2017.2764157). The bidirectional architecture allowed the model to use both past and future context when classifying each segment, which improved specificity compared to unidirectional models.
For wearable deployment, LSTM models face challenges due to their sequential computation requirement (each time step depends on the previous hidden state), which limits parallelization and increases latency. Gated Recurrent Units (GRUs) offer a simpler alternative with comparable performance and reduced computational cost.
Hybrid CNN-LSTM Models
Combining CNNs for local feature extraction with LSTMs for temporal modeling has proven particularly effective. The CNN layers extract morphological features from individual beats or short segments, while the LSTM layers model the temporal evolution of these features to detect sustained arrhythmias.
Torres-Soto and Ashley (2020) developed a CNN-LSTM hybrid that processed 2-minute PPG recordings through three convolutional blocks followed by a two-layer LSTM, achieving multi-class classification (normal, AF, PVC, PAC) with a macro-averaged AUC of 0.93 (DOI: 10.1038/s41746-020-0217-7). The model was trained on 25,000 labeled segments from 3,000 patients and validated on a held-out set of 800 patients, making it one of the larger-scale studies in this field.
Transformer Models
Attention-based transformer architectures have recently been applied to PPG arrhythmia detection, leveraging their ability to capture long-range dependencies without the sequential processing bottleneck of RNNs.
Yan et al. (2022) applied a Vision Transformer (ViT) adapted for 1D signals to PPG-based AF detection, achieving an AUC of 0.98 on the MIMIC-IV dataset. The self-attention mechanism allowed the model to attend to relevant beat pairs across the entire input window, effectively computing a learned version of the Poincare plot. The computational cost was higher than CNN-based approaches, making real-time wearable deployment challenging with current hardware, but performance on clinically noisy data was superior.
Clinical Validation Studies
Apple Heart Study
The Apple Heart Study (Perez et al., 2019) enrolled 419,297 participants wearing Apple Watch devices with a PPG-based irregular rhythm notification algorithm. Over a median of 117 days of monitoring, 0.52% of participants received an irregular pulse notification. Among notified participants who returned a follow-up ECG patch, 34% had AF on the patch recording. Among participants who received a further notification while wearing the patch, the positive predictive value of the notification for simultaneously recorded AF was 84%, and the positive predictive value of an individual irregular tachogram was 71% (DOI: 10.1056/NEJMoa1901183).
This study established the feasibility of population-scale AF screening using wearable PPG but also showed how hard notification accuracy is to interpret in a low-prevalence population. The 34% figure is a diagnostic yield rather than a false-positive rate: AF is often paroxysmal, and the patch was worn after the notification, so the absence of AF on the patch does not prove that the original notification was wrong. Even so, at low prevalence a small loss of specificity produces many false alerts, underscoring the need for high specificity in screening algorithms.
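The dependence of positive predictive value on prevalence follows directly from Bayes' theorem. The sketch below uses hypothetical sensitivity, specificity, and prevalence values chosen only to show the effect; they are not figures from the study.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(disease | positive test) from Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# Hypothetical detector: 95% sensitivity, 99% specificity.
for prevalence in (0.005, 0.02, 0.10):
    ppv = positive_predictive_value(0.95, 0.99, prevalence)
    print(f"prevalence {prevalence:.1%}: PPV {ppv:.1%}")
```

With these illustrative numbers, the same detector goes from a PPV of about one in three at 0.5% prevalence to above 90% at 10% prevalence.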
Huawei Heart Study
The Huawei Heart Study (Guo et al., 2019) enrolled 187,912 participants using Huawei smartwatches and bands. The PPG-based AF detection algorithm identified 424 participants (0.23%) with suspected AF; among those who completed follow-up, 87.0% had confirmed AF on subsequent clinical evaluation. This higher confirmation rate compared to the Apple Heart Study was attributed to a more conservative notification threshold and longer monitoring windows (DOI: 10.1016/S0140-6736(19)32673-0).
Fitbit Heart Study
The Fitbit Heart Study (Lubitz et al., 2022) enrolled 455,699 participants and used a PPG-based algorithm that required detection of irregularity in two separate 30-minute analysis windows before issuing a notification. This two-stage approach achieved a positive predictive value of 98.2% among participants who completed ECG follow-up, significantly reducing false positives compared to single-detection algorithms.
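The confirmation rule described above can be sketched as a counter over per-window decisions. The function name, the boolean-flag input, and the default of two windows are assumptions for illustration; the study's actual algorithm is not reproduced here.

```python
def should_notify(window_flags, required_windows=2):
    """Return True once enough analysis windows have been flagged irregular.

    window_flags: iterable of booleans, one per analysis window,
    True when the window was classified as irregular.
    """
    flagged = 0
    for is_irregular in window_flags:
        if is_irregular:
            flagged += 1
            if flagged >= required_windows:
                return True
    return False

print(should_notify([False, True, False, False]))  # one flagged window: no alert
print(should_notify([False, True, False, True]))   # second flagged window: alert
```

Requiring a second flagged window means a single motion-corrupted or ectopy-heavy window cannot trigger a notification by itself, at the cost of delaying alerts for genuine episodes.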
Deployment Considerations
Computational Constraints
Wearable devices impose strict computational limits: ARM Cortex-M class processors, limited RAM (32-256 KB), and battery life constraints. Models deployed on these platforms must be compressed through quantization (INT8 or binary), pruning, and knowledge distillation. Successful deployments have achieved model sizes under 100 KB while maintaining AUC > 0.94 for AF detection. For more on embedded PPG signal processing, see our algorithms documentation.
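INT8 quantization can be illustrated with symmetric per-tensor quantization, where each weight is stored as an 8-bit integer plus one shared floating-point scale. This is a minimal sketch of the idea, not the scheme of any particular toolchain; the function names and example weights are assumptions.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: w is approximated by scale * q."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the integer codes."""
    return [v * scale for v in q]

def model_size_bytes(n_params, bits_per_param):
    """Storage needed for the weights alone, ignoring scales and metadata."""
    return n_params * bits_per_param // 8

weights = [0.5, -1.0, 0.25, 0.0, 0.73]  # illustrative layer weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q, [round(w, 3) for w in restored])
print(model_size_bytes(100_000, 32), model_size_bytes(100_000, 8))
```

Going from 32-bit floats to 8-bit integers cuts weight storage by a factor of four, and the rounding error per weight is bounded by half the scale.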
Signal Quality Gating
Real-world PPG data contains substantial periods of motion artifact, sensor disconnection, and low perfusion. All production arrhythmia detection systems include signal quality assessment (SQA) as a gating mechanism. Only segments that pass SQA thresholds are forwarded to the arrhythmia classifier. This approach trades sensitivity (arrhythmia episodes during motion may be missed) for specificity (fewer false positives from motion artifact).
Typical SQA features include: peak-to-peak amplitude thresholds, perfusion index minimums, template matching scores, and accelerometer-based motion intensity thresholds. Well-designed SQA can reject 30-60% of ambulatory PPG data while retaining >95% of arrhythmia-containing segments.
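A threshold-based SQA gate over these features might look like the following sketch. The field names, units, and every threshold value are hypothetical placeholders; production systems tune them per sensor and often replace fixed thresholds with a learned quality model.

```python
from dataclasses import dataclass

@dataclass
class SegmentQuality:
    peak_to_peak: float      # pulsatile amplitude, arbitrary sensor units
    perfusion_index: float   # AC/DC ratio in percent
    template_score: float    # mean beat-to-template correlation, 0 to 1
    motion_intensity: float  # accelerometer magnitude above gravity, in g

def passes_sqa(q, min_p2p=0.05, min_pi=0.3, min_template=0.8, max_motion=0.2):
    """Return True only if every quality check passes (all thresholds assumed)."""
    return (
        q.peak_to_peak >= min_p2p
        and q.perfusion_index >= min_pi
        and q.template_score >= min_template
        and q.motion_intensity <= max_motion
    )

segments = [
    SegmentQuality(0.12, 1.1, 0.93, 0.03),  # resting, clean signal
    SegmentQuality(0.10, 0.9, 0.55, 0.60),  # walking, motion-corrupted
]
accepted = [s for s in segments if passes_sqa(s)]
print(len(accepted))
```

Only the accepted segments would be passed on to the arrhythmia classifier.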
Multi-Class vs. Binary Classification
Most deployed systems use binary classification (AF vs. non-AF) because AF is the most clinically impactful arrhythmia detectable by PPG and has the strongest evidence base. Multi-class classification (AF, PVC, PAC, flutter, tachycardia, bradycardia) remains primarily a research endeavor, with reported accuracies 10-15% lower than binary AF detection due to the increased complexity and the hemodynamic similarity between certain arrhythmia types.
Conclusion
Machine learning classification of cardiac arrhythmias from PPG has matured from a laboratory demonstration to a clinically deployed capability for atrial fibrillation screening, validated in studies enrolling over one million participants collectively. The combination of robust feature engineering, deep learning architectures, and rigorous signal quality gating has produced systems with sensitivity exceeding 95% and specificity above 97% for AF detection under controlled conditions.
The next frontiers include multi-class arrhythmia discrimination, detection of lower-prevalence rhythm disorders, improved performance during physical activity, and integration with on-demand ECG confirmation. For researchers entering this field, the combination of publicly available PPG databases (MIMIC, IEEE SP Cup), open-source beat detection algorithms, and established feature sets provides a solid foundation. For current clinical applications and regulatory status, see our guide to wearable AF detection and our overview of health conditions monitorable by PPG.