PPG Skin Tone Bias: How Melanin Affects Accuracy in Pulse Oximeters and Wearables
Evidence-based analysis of skin pigmentation effects on PPG accuracy. Covers melanin-light interactions, SpO2 bias data, regulatory gaps, and engineering solutions.
Pulse oximeters and PPG-based wearables systematically overestimate blood oxygen saturation in individuals with darker skin pigmentation, a bias with measurable clinical consequences that has persisted for decades despite being well-documented in the scientific literature. This is not a theoretical concern: Sjoding et al. (2020) demonstrated in a landmark study of over 10,000 paired measurements that Black patients had nearly three times the rate of occult hypoxemia compared to White patients when relying on pulse oximeter readings. Understanding the optical physics behind this bias, its clinical magnitude, and the engineering approaches to mitigate it is essential for anyone designing, deploying, or relying on PPG-based physiological monitoring.
This article examines the biophysics of melanin-light interaction, quantifies the accuracy disparities from published clinical data, reviews the regulatory landscape, and discusses technical solutions for more equitable PPG device design. For background on how PPG technology works, see our introduction to PPG technology.
The Biophysics of Melanin and Light Absorption
Melanin is a broadband chromophore concentrated in the epidermis, the outermost layer of skin. Its primary biological function is photoprotection through ultraviolet light absorption, but it absorbs across the entire visible and near-infrared spectrum relevant to PPG measurements. Two forms are relevant: eumelanin (brown-black, dominant in darker skin) and pheomelanin (yellow-red, more prominent in lighter skin).
The absorption spectrum of melanin follows an approximately monotonic decrease with increasing wavelength, often modeled as:
mu_a(melanin) proportional to wavelength^(-3.33)
This means melanin absorbs substantially more at shorter wavelengths (green, 530nm) than at longer wavelengths (red, 660nm) or near-infrared (940nm). At 530nm (green LED wavelength), melanin absorption in Fitzpatrick type VI skin can be 3-5 times higher than in type I skin. At 940nm (infrared), this ratio decreases to approximately 1.3-1.8 times.
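The wavelength dependence of the power-law model above can be checked with a few lines of Python. This is a minimal sketch that assumes only the exponent quoted in the text; the function name and the choice of 940nm as the reference wavelength are illustrative. It describes how melanin's own absorption varies with wavelength, not the skin-type-to-skin-type ratios quoted above.

```python
# Relative melanin absorption under the power-law model
# mu_a proportional to wavelength^(-3.33).
# Only ratios are meaningful here; the proportionality constant cancels out.

MELANIN_EXPONENT = -3.33  # exponent quoted in the text


def relative_melanin_absorption(wavelength_nm: float, reference_nm: float = 940.0) -> float:
    """Return mu_a(wavelength) / mu_a(reference) for the power-law model."""
    return (wavelength_nm / reference_nm) ** MELANIN_EXPONENT


for name, wavelength in [("green", 530.0), ("red", 660.0), ("infrared", 940.0)]:
    ratio = relative_melanin_absorption(wavelength)
    print(f"{name:>8} ({wavelength:.0f} nm): {ratio:.2f}x the absorption at 940 nm")
```

Under this model, melanin absorption at 530nm comes out at roughly 6.7 times, and at 660nm at roughly 3.2 times, the value at 940nm.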
The practical consequence is that melanin acts as an optical filter that attenuates the light before it reaches the blood-containing dermis and again as it returns from tissue to the photodetector. In reflectance-mode PPG (used in wrist wearables), the light crosses the same patch of epidermis twice, on the way in and on the way out. In transmission-mode PPG (fingertip clip pulse oximeters), the path likewise crosses the epidermis twice, at entry and at exit, so in both configurations the melanin attenuation is applied on two passes.
Critically, melanin attenuation lowers both the DC (baseline) and AC (pulsatile) components of the detected PPG signal. The DC component, the steady light level that reaches the photodetector after absorption by tissue, non-pulsatile blood, and melanin, falls substantially as melanin content rises. The AC component, representing the cardiac-driven pulsatile blood volume change, is attenuated along with it, so the absolute pulsatile amplitude sits closer to the detector noise floor. In practice this degrades the measured AC/DC ratio (perfusion index), the fundamental quantity from which SpO2 is derived, and weakens the pulsatile waveform used for heart rate detection.
How Melanin Biases SpO2 Measurement
Pulse oximetry calculates blood oxygen saturation from the ratio of ratios (R):
R = (AC_red / DC_red) / (AC_IR / DC_IR)
This ratio is mapped to SpO2 through empirical calibration curves derived from controlled desaturation studies in human volunteers. The critical assumption is that R depends only on the relative concentrations of oxyhemoglobin and deoxyhemoglobin. Melanin violates this assumption.
Because melanin absorbs more at 660nm (red) than at 940nm (infrared), it disproportionately attenuates the red channel's DC component. This alters the R value in a direction that mimics higher oxygen saturation (lower R corresponds to higher SpO2 on the calibration curve). The effect is subtle but clinically significant: a true SaO2 of 88% might produce a pulse oximeter reading of 92-96% in a patient with high melanin content.
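The ratio-of-ratios calculation can be sketched in a few lines of Python. This is an illustrative sketch, not any device's algorithm: the peak-to-peak AC estimate is deliberately crude, the sample values are made up, and the linear mapping SpO2 = 110 - 25R is a commonly quoted textbook approximation used here as an assumption in place of a real empirical calibration curve.

```python
from statistics import mean


def ac_dc(samples: list[float]) -> tuple[float, float]:
    """Crude AC/DC split: DC is the mean level, AC is the peak-to-peak swing."""
    dc = mean(samples)
    ac = max(samples) - min(samples)
    return ac, dc


def ratio_of_ratios(red: list[float], infrared: list[float]) -> float:
    """R = (AC_red / DC_red) / (AC_IR / DC_IR)."""
    ac_red, dc_red = ac_dc(red)
    ac_ir, dc_ir = ac_dc(infrared)
    return (ac_red / dc_red) / (ac_ir / dc_ir)


def spo2_linear_approx(r: float) -> float:
    """Textbook linear approximation (an assumption, not a device calibration)."""
    return 110.0 - 25.0 * r


# Made-up detector samples covering one cardiac cycle per channel.
red = [1000.0, 1010.0, 1000.0, 990.0]
infrared = [2000.0, 2040.0, 2000.0, 1960.0]

r = ratio_of_ratios(red, infrared)
print(f"R = {r:.3f}, approximate SpO2 = {spo2_linear_approx(r):.1f}%")
```

Because the mapping has a negative slope, anything that pushes R downward raises the reported SpO2, which is the direction of the bias described above.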
Bickler et al. (2005) (DOI: 10.1097/00000542-200501000-00029) conducted controlled desaturation studies in 36 healthy volunteers across a range of skin pigmentations. They found that at true SaO2 values below 80%, commercially available pulse oximeters overestimated saturation by 4-8% in darkly pigmented subjects compared to 1-2% in lightly pigmented subjects. The bias increased as true oxygen saturation decreased, meaning the error is greatest precisely when accurate measurement matters most.
Feiner et al. (2007) (DOI: 10.1213/01.ane.0000267537.01234.55) extended this work with eight different pulse oximeter models tested on 281 subjects during controlled desaturation to SaO2 levels as low as 60%. They confirmed a consistent overestimation bias of 2-7% in darkly pigmented subjects across all devices tested, with no device achieving clinically acceptable accuracy (defined as bias within 2%) in dark skin at saturation levels below 80%.
Clinical Consequence: Occult Hypoxemia
The seminal study by Sjoding et al. (2020) (DOI: 10.1056/NEJMc2029240), published in the New England Journal of Medicine, brought the clinical impact of pulse oximetry bias to broad attention. Analyzing 10,789 paired pulse oximetry and arterial blood gas measurements from two academic medical centers, they found:
- Occult hypoxemia (SaO2 < 88% with SpO2 92-96%): 11.7% in Black patients vs. 3.6% in White patients (odds ratio 3.56)
- The absolute difference was 8.1 percentage points; the 11.7% rate means that roughly 1 in 9 Black patients with SpO2 readings in the 92-96% range were actually hypoxemic
- Among patients receiving supplemental oxygen, the disparity was even more pronounced
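The arithmetic behind the rates in the list above can be reproduced directly. The sketch below uses only the two rounded percentages quoted in the text; the variable and function names are illustrative.

```python
# Occult hypoxemia rates quoted in the text (rounded percentages).
black_rate = 0.117
white_rate = 0.036


def odds(p: float) -> float:
    """Convert a probability to odds."""
    return p / (1.0 - p)


absolute_difference_pp = (black_rate - white_rate) * 100.0  # percentage points
risk_ratio = black_rate / white_rate
odds_ratio = odds(black_rate) / odds(white_rate)
one_in_n = 1.0 / black_rate

print(f"absolute difference: {absolute_difference_pp:.1f} percentage points")
print(f"risk ratio: {risk_ratio:.2f}")
print(f"odds ratio: {odds_ratio:.2f}")
print(f"about 1 in {one_in_n:.1f} affected")
```

The risk ratio comes out a little above 3 and the odds ratio near 3.5. The small gap from the quoted 3.56 is plausibly because the inputs here are rounded percentages rather than the underlying counts.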
These findings have direct treatment implications. Clinical guidelines use SpO2 thresholds of 94% (general wards) and 88-92% (COPD patients) to guide oxygen therapy decisions. If a pulse oximeter reads 94% but the true SaO2 is 87%, the patient may not receive necessary oxygen supplementation or escalation of care.
Subsequent studies have confirmed and extended these findings. Valbuena et al. (2022) (DOI: 10.1001/jamainternmed.2021.7674) analyzed 87,971 paired measurements and found that occult hypoxemia rates were 7.1% in Asian patients, 5.7% in Hispanic patients, and 3.2% in White patients, demonstrating a gradient of bias across skin pigmentation levels. Fawzy et al. (2022) linked pulse oximetry bias to delayed recognition of COVID-19 hypoxemia in Black patients during the pandemic, demonstrating that the bias had concrete consequences for treatment timing.
Impact on Wearable Heart Rate Accuracy
Skin tone bias in PPG extends beyond SpO2 to affect heart rate accuracy, particularly during motion. Bent et al. (2020) (DOI: 10.1038/s41746-020-0226-6) evaluated six commercial wrist-worn PPG devices in 53 participants across diverse skin tones during rest and activity. They found that heart rate mean absolute error was 1.5-5.0 BPM higher in the darkest skin tone group compared to the lightest group, with the disparity widening during exercise.
The mechanism differs from SpO2 bias. For heart rate, the issue is signal-to-noise ratio. Melanin attenuation reduces the PPG signal amplitude, lowering the SNR and making the pulsatile signal more susceptible to corruption by motion artifacts. During exercise, when motion artifact power can exceed the pulsatile signal by 10-100 times, the lower baseline SNR in darker skin means the cardiac signal is lost more frequently, leading to higher error rates and more dropout periods.
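The SNR argument can be put in decibel terms with a short sketch. It assumes, as a simplification, that motion artifact power stays fixed while melanin scales only the pulsatile amplitude; the attenuation factor of 0.5 is a hypothetical value chosen for illustration, not a measured one.

```python
import math


def snr_db(signal_power: float, noise_power: float) -> float:
    """Signal-to-noise ratio in decibels from two power values."""
    return 10.0 * math.log10(signal_power / noise_power)


def snr_after_attenuation(snr_before_db: float, amplitude_factor: float) -> float:
    """SNR after scaling signal amplitude, with noise power held constant.

    Power scales with amplitude squared, hence the factor of 20.
    """
    return snr_before_db + 20.0 * math.log10(amplitude_factor)


# Motion artifact power 10x and 100x the pulsatile power, as in the text.
for artifact_multiple in (10.0, 100.0):
    before = snr_db(1.0, artifact_multiple)
    after = snr_after_attenuation(before, amplitude_factor=0.5)  # hypothetical factor
    print(f"artifact {artifact_multiple:>5.0f}x: {before:.1f} dB -> {after:.1f} dB")
```

Under these assumptions, halving the pulsatile amplitude costs about 6 dB on top of an SNR that is already between -10 dB and -20 dB during exercise.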
Colvonen et al. (2020) evaluated consumer wearable heart rate accuracy during sleep across skin tones and found smaller but still significant differences (0.5-1.5 BPM additional error for darker skin). The sleep context minimizes motion artifacts, isolating the pure signal amplitude effect and demonstrating that the bias exists even under favorable measurement conditions.
For a detailed comparison of how different PPG wavelengths interact with skin pigmentation, see our guide on green vs red vs infrared PPG.
Regulatory and Standards Landscape
Historical Context
The ISO 80601-2-61 standard governing pulse oximeter testing has historically required accuracy validation across a specified SpO2 range (70-100%) but did not mandate specific skin pigmentation diversity in test populations. The consequence was that many pulse oximeters received regulatory clearance based on clinical studies conducted predominantly on light-skinned subjects.
A 2005 survey of pulse oximeter 510(k) submissions found that fewer than 15% of subjects in validation studies had darkly pigmented skin, and some studies included no darkly pigmented subjects at all. This created a systematic validation gap where devices could meet regulatory requirements while failing to perform adequately for a substantial portion of the population.
FDA Response
In February 2023, the FDA issued updated guidance titled "Pulse Oximeters - Premarket Notification Submissions [510(k)]" that addressed skin tone bias directly. Key recommendations included requiring clinical study populations to include at least 2 subjects (and ideally at least 15% of total enrollment) with Fitzpatrick skin types V or VI, reporting accuracy metrics stratified by skin pigmentation category, and providing labeling that communicates the limitations of pulse oximetry in diverse skin tones.
In November 2024, the FDA convened an advisory committee that discussed potentially requiring post-market surveillance studies for currently cleared devices to evaluate skin tone performance. As of early 2026, this remains under discussion and no mandatory post-market requirement has been enacted.
International Standards
The ISO technical committee is revising ISO 80601-2-61 to incorporate skin pigmentation requirements. The proposed revision includes mandatory inclusion of subjects across a defined skin pigmentation spectrum (using objective spectrophotometric measurement rather than subjective Fitzpatrick typing), performance criteria that must be met independently for each pigmentation subgroup, and specific reporting of root mean square error (RMSE or Arms) and mean bias stratified by skin pigmentation.
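Stratified reporting of mean bias and Arms is straightforward to compute from paired readings. The sketch below assumes each record is a (group, SpO2, SaO2) tuple; the group labels and the readings are made-up illustrative values, not study data.

```python
import math
from collections import defaultdict


def stratified_accuracy(records):
    """Compute per-group mean bias and Arms from (group, spo2, sao2) records.

    Bias is the mean of (SpO2 - SaO2); Arms is the root mean square of the
    same differences.
    """
    errors = defaultdict(list)
    for group, spo2, sao2 in records:
        errors[group].append(spo2 - sao2)

    results = {}
    for group, errs in errors.items():
        n = len(errs)
        results[group] = {
            "n": n,
            "mean_bias": sum(errs) / n,
            "arms": math.sqrt(sum(e * e for e in errs) / n),
        }
    return results


# Made-up paired readings for illustration only.
records = [
    ("light", 95.0, 94.0), ("light", 90.0, 90.0), ("light", 85.0, 86.0),
    ("dark", 96.0, 93.0), ("dark", 91.0, 88.0), ("dark", 86.0, 82.0),
]

for group, stats in stratified_accuracy(records).items():
    print(f"{group}: n={stats['n']}, bias={stats['mean_bias']:+.2f}, Arms={stats['arms']:.2f}")
```

Reporting both numbers per subgroup matters because a pooled Arms can look acceptable while one subgroup carries a consistent positive bias.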
Engineering Solutions and Mitigation Strategies
Multi-Wavelength Approaches
Adding wavelengths beyond the standard red (660nm) and infrared (940nm) pair enables explicit estimation of melanin absorption. A third wavelength at a point where melanin absorbs but hemoglobin species do not differentiate (such as 730nm or an isosbestic wavelength) can provide an independent melanin correction factor.
Masimo's co-oximetry platform uses 7+ wavelengths to separately quantify oxyhemoglobin, deoxyhemoglobin, carboxyhemoglobin, methemoglobin, and total hemoglobin. The additional wavelengths inherently provide melanin correction because the system solves a multi-variable optical model rather than relying on a single R-ratio. Clinical studies have shown reduced skin tone bias with multi-wavelength systems, though the devices are substantially more expensive.
For wearable applications using green-light PPG, adding an infrared channel specifically for melanin estimation has been proposed. The ratio of green to infrared DC levels correlates with epidermal melanin content (r = 0.85-0.92 in controlled studies), enabling adaptive signal processing that adjusts filtering parameters and detection thresholds based on estimated melanin level.
Adaptive Calibration
Skin-tone-adaptive calibration modifies the R-ratio-to-SpO2 lookup table based on an estimate of the subject's melanin content. This can be derived from the DC signal level at one or more wavelengths, from a user-input skin tone setting, or from a brief initial calibration period. Bickler et al. (2005) demonstrated that custom calibration curves for darkly pigmented subjects reduced SpO2 bias from approximately 4-5% to 1-2%. The challenge is implementing this adaptively without requiring manual input.
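One way to picture adaptive calibration without manual input is to blend two calibration curves according to a melanin index estimated from DC levels. Everything in the sketch below is hypothetical: the linear curves stand in for a real lookup table, the coefficients and DC ratio bounds are placeholders rather than measured values, and the blending rule is just one simple option.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearCalibration:
    """SpO2 = intercept + slope * R (a simplified stand-in for a lookup table)."""
    intercept: float
    slope: float

    def spo2(self, r: float) -> float:
        return self.intercept + self.slope * r


# Placeholder coefficients for illustration only, not measured calibrations.
LOW_MELANIN_CURVE = LinearCalibration(intercept=110.0, slope=-25.0)
HIGH_MELANIN_CURVE = LinearCalibration(intercept=108.0, slope=-25.0)


def melanin_index(dc_short: float, dc_ir: float,
                  ratio_low_melanin: float = 0.8,
                  ratio_high_melanin: float = 0.4) -> float:
    """Map a short-wavelength/infrared DC ratio to an index clamped to [0, 1].

    More melanin attenuates the shorter wavelength more strongly, so a lower
    ratio maps to a higher index. The two bounds are hypothetical.
    """
    ratio = dc_short / dc_ir
    index = (ratio_low_melanin - ratio) / (ratio_low_melanin - ratio_high_melanin)
    return min(1.0, max(0.0, index))


def adaptive_spo2(r: float, index: float) -> float:
    """Blend the two calibration curves according to the melanin index."""
    low = LOW_MELANIN_CURVE.spo2(r)
    high = HIGH_MELANIN_CURVE.spo2(r)
    return (1.0 - index) * low + index * high


index = melanin_index(dc_short=600.0, dc_ir=1000.0)
print(f"melanin index = {index:.2f}, SpO2 = {adaptive_spo2(0.8, index):.1f}%")
```

A real implementation would need the index and both curves validated against arterial blood gas data across the full pigmentation range.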
Hardware Design Optimization
Increasing LED optical power can partially compensate for melanin attenuation by driving higher photocurrent at the detector. However, this increases power consumption and thermal dissipation. For wearable applications, LED power is typically constrained to 5-25 mW for green and 1-5 mW for infrared to maintain acceptable battery life.
Optimizing the LED-photodetector geometry can help. Smaller source-detector separations favor shallow tissue sampling, which reduces the relative melanin path length. However, this also reduces pulsatile signal depth and sensitivity to deeper arterial pulsations. The optimal geometry represents a compromise that varies with skin pigmentation.
Increasing photodetector area or using avalanche photodiodes (APDs) with internal gain can improve SNR without increasing LED power. This approach is more practical for clinical devices than for consumer wearables, where component cost and power constraints are tighter. For more on how sensor design affects signal quality, see our algorithms and signal processing resources.
Algorithmic Approaches
Signal processing algorithms can be designed to be more robust to low-SNR conditions. Adaptive filtering with automatically adjusting thresholds, ensemble averaging over more cardiac cycles (reducing random noise at the expense of temporal resolution), and machine learning models trained on diverse populations all contribute to reduced skin tone bias.
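The trade-off in ensemble averaging can be illustrated with synthetic data. This sketch assumes the beats are already segmented and time-aligned and that the noise is uncorrelated Gaussian noise, both simplifications; the sine-wave template and the noise level are arbitrary choices for illustration.

```python
import math
import random


def synth_beat(template, noise_sigma, rng):
    """Return one noisy copy of the beat template."""
    return [s + rng.gauss(0.0, noise_sigma) for s in template]


def ensemble_average(beats):
    """Average aligned beats sample by sample."""
    n = len(beats)
    return [sum(column) / n for column in zip(*beats)]


def rms_error(estimate, template):
    """Root mean square difference between an estimate and the clean template."""
    squared = sum((e - t) ** 2 for e, t in zip(estimate, template))
    return math.sqrt(squared / len(template))


rng = random.Random(0)  # fixed seed so the run is repeatable
template = [math.sin(2.0 * math.pi * i / 50) for i in range(50)]

for n_beats in (1, 16, 64):
    beats = [synth_beat(template, 1.0, rng) for _ in range(n_beats)]
    error = rms_error(ensemble_average(beats), template)
    print(f"{n_beats:>2} beats averaged: residual RMS noise = {error:.3f}")
```

For uncorrelated noise the residual falls roughly as one over the square root of the number of beats, so averaging 64 beats buys about an eightfold reduction at the cost of a much slower response to real changes.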
Importantly, algorithms trained predominantly on data from light-skinned subjects will inherently perform worse on darker skin because the noise characteristics and signal morphology differ. Training data diversity is as critical as algorithmic sophistication. The MIMIC-III waveform database, widely used for PPG algorithm development, has limited skin tone metadata, making it difficult to ensure balanced training.
Path Forward for Equitable PPG Technology
Addressing PPG skin tone bias requires coordinated efforts across hardware design, algorithm development, clinical validation, and regulatory requirements. The technical solutions exist: multi-wavelength sensing, adaptive calibration, and diverse training data can substantially reduce the bias. The barriers are primarily economic (multi-wavelength sensors cost more), historical (existing calibration curves were built on biased datasets), and regulatory (mandatory skin tone diversity requirements are only now being implemented).
For researchers and engineers working with PPG, the immediate priorities should include reporting accuracy metrics stratified by skin pigmentation in all validation studies, ensuring training and test datasets include adequate representation across skin tones, designing sensor hardware with infrared channels that can serve dual roles in SpO2 measurement and melanin estimation, and advocating for updated standards that mandate equitable device performance.
The clinical stakes are high. PPG technology is expanding from hospital pulse oximeters to billions of consumer wearable devices used for health monitoring, sleep tracking, and chronic disease management. If these devices systematically underperform for people with darker skin, they risk widening rather than narrowing health disparities. Getting the engineering right is both a technical challenge and an ethical imperative.
For additional context on how different PPG conditions affect measurement accuracy, explore our conditions reference.
References
- Bickler et al. (2005). DOI: 10.1097/00000542-200501000-00029
- Feiner et al. (2007). DOI: 10.1213/01.ane.0000267537.01234.55
- Sjoding et al. (2020). New England Journal of Medicine. DOI: 10.1056/NEJMc2029240
- Valbuena et al. (2022). DOI: 10.1001/jamainternmed.2021.7674
- Bent et al. (2020). DOI: 10.1038/s41746-020-0226-6
Frequently Asked Questions
- Are pulse oximeters less accurate on darker skin?
- Yes, substantial clinical evidence demonstrates that conventional pulse oximeters overestimate blood oxygen saturation (SpO2) in individuals with darker skin pigmentation. Sjoding et al. (2020) analyzed 10,789 paired arterial blood gas and pulse oximetry measurements and found that Black patients had nearly three times the rate of occult hypoxemia (arterial oxygen saturation below 88% despite pulse oximeter readings of 92-96%) compared to White patients (11.7% vs 3.6%). This bias results from melanin absorbing light at the red and infrared wavelengths used for SpO2 calculation, altering the ratio of ratios (R-value) that calibration curves are built upon. The clinical consequence is that truly hypoxic patients with darker skin may not receive supplemental oxygen or escalated care in a timely manner.
- Which PPG wavelength is least affected by skin tone?
- Infrared light (wavelengths above 850nm, typically 940nm in pulse oximeters) is least affected by melanin absorption. Melanin absorption follows an inverse power law with wavelength, decreasing approximately as wavelength to the power of negative 3.33 across the visible and near-infrared spectrum. Under that model, melanin absorption at 940nm is roughly 3 times lower than at 660nm (red) and roughly 7 times lower than at 530nm (green). This is why infrared-based PPG sensors show smaller signal amplitude differences across skin tones compared to green-light sensors. However, even at 940nm, melanin is not negligible, and the SpO2 calibration ratio depends on both red and infrared channels, so melanin effects on the red channel still propagate into the SpO2 calculation.
- Has the FDA addressed pulse oximeter skin tone bias?
- The FDA issued updated guidance in February 2023 recommending that pulse oximeter manufacturers test devices across a diverse range of skin pigmentations, including a recommendation that clinical studies enroll at least 2 subjects, and ideally at least 15% of total enrollment, with dark skin pigmentation (Fitzpatrick skin types V or VI). Prior to this, the FDA's 510(k) clearance pathway did not mandate specific skin tone diversity in validation studies, and many cleared devices were tested predominantly on light-skinned subjects. The updated guidance also recommends that manufacturers report accuracy metrics stratified by skin pigmentation. However, these are guidance recommendations and not binding regulations, and many currently marketed devices received clearance under the older, less stringent requirements.
- Can software calibration fix the skin tone bias in pulse oximeters?
- Software calibration can partially mitigate but not fully eliminate skin tone bias. The bias arises from melanin altering the optical path in ways that violate the assumptions of standard R-ratio calibration curves. Skin-tone-adaptive algorithms can adjust calibration curves based on DC signal levels (which correlate with melanin content) or additional wavelength channels that estimate melanin absorption independently. Bickler et al. (2005) showed that custom calibration curves for darkly pigmented subjects reduced SpO2 bias from approximately 4-5% to 1-2% in controlled desaturation studies. However, this requires either explicit skin tone input, a multi-wavelength sensor capable of melanin estimation, or personalized calibration against arterial blood gas, none of which are standard in current commercial devices.