Medical

Disease Classification

Diagnosing diseases from medical images or data.

9 datasets · 57 results

Disease classification uses ML to diagnose medical conditions from images (radiology, pathology, dermatology), lab results, and clinical text. Models such as CheXNet (chest X-rays) and Med-PaLM (medical question answering) report specialist-level accuracy on narrow tasks, but clinical deployment requires FDA clearance, bias auditing, and integration with existing workflows.

History

2017

CheXNet (Rajpurkar et al.) matches radiologist performance on pneumonia detection from chest X-rays

2017

Esteva et al. demonstrate dermatologist-level skin cancer classification from images

2019

Google detects diabetic retinopathy from fundus images at specialist level

2020

COVID-19 accelerates deployment of AI diagnostic tools for chest CT classification

2022

REMEDIS applies self-supervised pretraining to improve medical image classification

2023

BiomedCLIP aligns medical images with clinical text for zero-shot disease classification

2023

Med-PaLM 2 achieves expert-level performance on medical question answering

2024

The FDA's cumulative list of authorized AI-enabled medical devices passes 900, most of them in radiology

2024

Foundation models (BiomedGPT, Med-Gemini) show broad medical classification capabilities

2025

Multimodal medical AI combines imaging, labs, clinical notes, and genomics for diagnosis

How Disease Classification Works

Data Acquisition

Medical images (X-ray, CT, MRI, pathology slides), lab results, or clinical notes are collected and de-identified.

Preprocessing

Images are normalized, augmented, and standardized. Clinical text is tokenized and structured. Missing values are imputed or flagged.
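As a rough illustration of two of these steps, the sketch below z-score normalizes pixel intensities and fills missing lab values with the median of the observed ones. The function names and the toy values are assumptions made for this example; real pipelines typically work on arrays and use modality-specific normalization.

```python
from statistics import mean, median, pstdev

def zscore_normalize(pixels):
    """Scale a flat list of pixel intensities to zero mean and unit variance."""
    mu = mean(pixels)
    sigma = pstdev(pixels)
    if sigma == 0:  # constant image: avoid division by zero
        return [0.0 for _ in pixels]
    return [(p - mu) / sigma for p in pixels]

def impute_median(values):
    """Replace missing lab values (None) with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = median(observed)
    return [fill if v is None else v for v in values]

# Toy inputs (hypothetical): five pixel intensities and five lab readings.
pixels = [0, 64, 128, 192, 255]
normalized = zscore_normalize(pixels)

labs = [4.1, None, 5.3, 4.7, None]
imputed = impute_median(labs)
print(imputed)  # [4.1, 4.7, 5.3, 4.7, 4.7]
```

Median imputation is only one option; keeping a separate "was missing" indicator alongside the filled value is a common alternative when missingness itself carries clinical signal.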

Feature Extraction

A pretrained backbone (ResNet, ViT, BioClinicalBERT) extracts discriminative features from the input modality.

Classification

Extracted features are mapped to disease categories through classification heads, often with multi-label output for comorbidities.
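A minimal sketch of a multi-label head, assuming one linear unit and one sigmoid per disease: because each label gets its own probability, several conditions can be positive at once, which is what comorbidities require. The label names, weights, and the 0.5 threshold are hypothetical values chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def multilabel_head(features, weights, biases):
    """One linear unit plus sigmoid per label; outputs are independent probabilities."""
    probs = []
    for w_row, b in zip(weights, biases):
        logit = sum(w * x for w, x in zip(w_row, features)) + b
        probs.append(sigmoid(logit))
    return probs

# Hypothetical label set and parameters, for illustration only.
LABELS = ["pneumonia", "cardiomegaly", "effusion"]
features = [0.8, -0.2, 1.5]          # output of the feature extractor
weights = [[1.0, 0.0, 1.0],
           [0.0, 2.0, 0.0],
           [0.5, 0.5, 1.0]]
biases = [-1.0, 0.0, -0.5]

probs = multilabel_head(features, weights, biases)
predicted = [name for name, p in zip(LABELS, probs) if p >= 0.5]
print(predicted)  # ['pneumonia', 'effusion']
```

Unlike a softmax, these probabilities do not sum to 1, so a single study can be flagged for two findings at the same time.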

Calibration and Uncertainty

Prediction probabilities are calibrated, and uncertainty estimates flag cases for human review — critical for clinical safety.
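One common way to do this is temperature scaling followed by a review band, sketched below under stated assumptions. The temperature of 2.0 and the (0.2, 0.8) band are hypothetical; in practice the temperature would be fitted on a held-out validation set and the band chosen with clinicians.

```python
import math

def calibrated_probability(logit, temperature):
    """Temperature scaling: divide the logit by T before applying the sigmoid."""
    return 1.0 / (1.0 + math.exp(-logit / temperature))

def triage(logit, temperature=2.0, review_band=(0.2, 0.8)):
    """Return (probability, decision); uncertain cases are routed to a human."""
    p = calibrated_probability(logit, temperature)
    low, high = review_band
    if low < p < high:
        return p, "human review"
    return p, "positive" if p >= high else "negative"

for logit in (4.0, 1.0, -4.0):
    p, decision = triage(logit)
    print(f"logit={logit:+.1f}  p={p:.3f}  ->  {decision}")
```

A temperature above 1 pulls overconfident probabilities toward 0.5, so more borderline cases fall inside the review band instead of being auto-reported.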

Current Landscape

Disease classification in 2025 is the most commercially mature area of medical AI, with 900+ FDA-cleared devices. Radiology leads (chest X-ray, mammography, CT triage), followed by pathology (cancer grading) and dermatology (skin lesion classification). Foundation models are beginning to enable zero-shot classification of conditions not seen in training. The key tension is between research benchmarks (where models match specialists) and real-world deployment (where distribution shift, workflow integration, and regulatory requirements create significant barriers).

Key Challenges

Data imbalance — rare diseases have very few labeled examples, leading to poor sensitivity on important classes

Distribution shift — models trained at one hospital often perform poorly at others due to equipment and population differences

Regulatory burden — FDA/CE clearance requires extensive clinical validation, adding years to deployment timelines

Demographic bias — models may perform worse on underrepresented populations (race, age, sex) in training data

Clinical integration — fitting AI predictions into physician workflows without disrupting care is a UX and systems challenge
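Two of the challenges above can be checked with very little code. The sketch below computes inverse-frequency class weights (one simple response to data imbalance) and sensitivity per group, where a group can be a hospital site or a demographic subgroup. The function names and toy records are assumptions for illustration, not a standard audit procedure.

```python
from collections import Counter, defaultdict

def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}

def sensitivity_by_group(records):
    """records: iterable of (group, true_label, predicted_label), 1 = disease."""
    true_pos = defaultdict(int)
    positives = defaultdict(int)
    for group, y_true, y_pred in records:
        if y_true == 1:
            positives[group] += 1
            true_pos[group] += int(y_pred == 1)
    return {g: true_pos[g] / positives[g] for g in positives}

# Toy data (hypothetical): 9 healthy cases and 1 rare-disease case.
weights = inverse_frequency_weights([0] * 9 + [1])
print(weights[1])  # 5.0

# Toy predictions from two hospital sites.
records = [
    ("site_A", 1, 1), ("site_A", 1, 1), ("site_A", 1, 1), ("site_A", 1, 0),
    ("site_B", 1, 1), ("site_B", 1, 0), ("site_B", 0, 0),
]
print(sensitivity_by_group(records))  # {'site_A': 0.75, 'site_B': 0.5}
```

A gap like the one between the two sites here is the kind of signal that would prompt a closer look at equipment, population, or labeling differences before deployment.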

Quick Recommendations

Chest X-ray classification

CheXNet / TorchXRayVision

Well-validated models for pneumonia, cardiomegaly, and the other ChestX-ray14 conditions (14 labels in total)

General medical image classification

BiomedCLIP / Med-Gemini

Foundation models with broad medical image understanding

Clinical NLP classification

Med-PaLM 2 / BioClinicalBERT

Widely used models for classifying diseases from clinical text and notes

Production deployment

FDA-cleared tools (Aidoc, Viz.ai)

Regulatory clearance required for clinical use — commercial tools have it

What's Next

The frontier is multimodal disease classification — combining imaging, genomics, lab results, clinical history, and social determinants into unified diagnostic models. Expect federated learning to enable training across hospitals without sharing patient data, and increasingly automated clinical trial matching based on AI-classified patient characteristics.
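To make the federated learning idea concrete, here is a minimal sketch of federated averaging: each hospital trains locally and shares only model weights, which a coordinator averages in proportion to local sample counts. The two hospitals, their sample counts, and the two-parameter "model" are hypothetical, and real systems add secure aggregation and multiple training rounds.

```python
def federated_average(client_updates):
    """client_updates: list of (num_samples, weights) pairs, one per hospital."""
    total = sum(n for n, _ in client_updates)
    dim = len(client_updates[0][1])
    averaged = [0.0] * dim
    for n, weights in client_updates:
        for i, w in enumerate(weights):
            averaged[i] += (n / total) * w  # larger sites contribute more
    return averaged

# Hypothetical round: only weights and counts leave each hospital, never patient data.
hospital_updates = [
    (100, [0.2, 1.0]),   # smaller hospital
    (300, [0.6, 0.0]),   # larger hospital
]
global_weights = federated_average(hospital_updates)
print(global_weights)  # approximately [0.5, 0.25]
```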

Benchmarks & SOTA

ABIDE I

Autism Brain Imaging Data Exchange I

2012 · 33 results

1,112 resting-state fMRI datasets from 539 individuals with autism spectrum disorder (ASD) and 573 typically developing controls across 17 international sites. Multi-site neuroimaging data for autism classification and biomarker discovery.

State of the Art

SSAE + Softmax (Explainable ASD)

Academic

98.2

accuracy

CheXpert

CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels

2019 · 7 results

224,316 chest radiographs from 65,240 patients with 14 pathology labels. Includes uncertainty labels, plus expert radiologist annotations for the validation set. A widely used reference benchmark for chest X-ray classification.

State of the Art

CheXpert AUC Maximizer

Stanford

auroc

NIH ChestX-ray14

NIH Clinical Center Chest X-ray Dataset

2017 · 4 results

112,120 frontal-view chest X-ray images from 30,805 unique patients with 14 disease labels extracted using NLP from radiology reports. Foundational benchmark for chest X-ray AI.

State of the Art

TorchXRayVision

Cohen Lab

85.8

auroc

MIMIC-CXR

MIMIC-CXR: Medical Information Mart for Intensive Care - Chest X-ray

2019 · 3 results

377,110 chest X-ray images from 227,835 studies of 65,379 patients with free-text radiology reports. Largest publicly available chest X-ray dataset with paired image-text data.

State of the Art

CheXzero

Harvard/MIT

89.2

auroc

RSNA Pneumonia Detection

RSNA Pneumonia Detection Challenge

2018 · 3 results

30,000 frontal chest radiographs with bounding boxes for pneumonia detection, from the 2018 RSNA Kaggle competition. Tests both classification and localization.

State of the Art

DenseNet-121 (Chest X-ray)

Research

88.5

auroc

ABIDE II

Autism Brain Imaging Data Exchange II

2017 · 2 results

1,114 datasets from 521 individuals with autism spectrum disorder (ASD) and 593 typically developing controls across 19 sites. Second large-scale release complementing ABIDE I with additional multi-site neuroimaging data.

State of the Art

DeepASD

Research

auc

VinDr-CXR

VinDr-CXR: Vietnamese Dataset for Chest Radiograph

2022 · 2 results

18,000 chest X-ray scans with radiologist annotations for 22 local labels and 6 global labels. Each training image was annotated independently by 3 radiologists, with bounding box localization.

State of the Art

RAD-DINO

Microsoft

91.2

auroc

COVID-19 Image Data Collection

2020 · 2 results

Curated dataset of COVID-19 chest X-ray and CT images with clinical metadata. Critical resource during the pandemic for developing AI diagnostic tools.

State of the Art

DenseNet-121 (Chest X-ray)

Research

94.7

auroc

PadChest

PadChest: A Large Chest X-ray Image Dataset

2020 · 1 result

160,868 images from 67,625 patients with 174 radiographic findings, 19 diagnoses, and 104 anatomic locations. Multi-label classification with hierarchical taxonomy.

State of the Art

TorchXRayVision

Cohen Lab

84.6

auroc
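Most scores listed above are AUROC values, which appear to be reported as percentages. As a reference for reading them, the sketch below computes AUROC directly from its definition: the probability that a randomly chosen positive case is scored higher than a randomly chosen negative one, with ties counted as half. The function name and toy scores are assumptions for illustration; leaderboards for multi-label datasets usually average this value across labels.

```python
def auroc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example (hypothetical scores): 3 of the 4 positive/negative pairs are ordered correctly.
y_true = [1, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.2]
print(auroc(y_true, scores))  # 0.75
```

This pairwise form is quadratic in the number of cases, which is fine for a sanity check but too slow for large test sets, where a rank-based implementation is preferable.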

Related Tasks

Medical Image Segmentation

Clinical NLP

Drug Discovery
