Adversarial

Need to test model robustness? Benchmark resilience against adversarial attacks and evaluate defense mechanisms.

2 tasks · 0 datasets · 0 results

Adversarial robustness studies how ML models behave under intentionally crafted perturbations. The field has matured from curiosity-driven attacks to certified defenses and standardized benchmarks, driven by real-world deployment risks in safety-critical systems.

State of the Field (2025)

  • The RobustBench leaderboard standardizes evaluation: top models exceed 70% robust accuracy on CIFAR-10 under AutoAttack (Linf, eps=8/255), while ImageNet robust accuracy remains below 50% for most defenses
  • Adversarial training with generated data (diffusion-augmented AT) has pushed CIFAR-10 robust accuracy past 73%, narrowing the gap between clean and adversarial performance
  • Certified defenses (randomized smoothing, interval bound propagation) provide provable guarantees, but at a 15-30% clean accuracy cost that limits practical adoption to high-stakes domains
  • Foundation models show mixed robustness: CLIP-based classifiers exhibit surprising adversarial transferability, while vision-language models remain vulnerable to typographic and multimodal attacks

Quick Recommendations

Evaluating model robustness (standard benchmark)

AutoAttack + RobustBench

Parameter-free ensemble of attacks provides reliable robustness evaluation. RobustBench leaderboard enables apples-to-apples comparison across defenses.
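AutoAttack bundles several attacks behind a parameter-free interface, but its workhorse is projected gradient descent under an Linf budget. The sketch below shows that core PGD step on a hypothetical two-feature logistic model (the weights, inputs, and function names are illustrative, not part of the AutoAttack API):

```python
import math

# Toy 2-feature logistic model; W and B are hypothetical stand-ins
# for a real classifier's parameters.
W = [2.0, -1.0]
B = 0.5

def logit(x):
    return W[0] * x[0] + W[1] * x[1] + B

def input_grad(x, y):
    """Gradient of the binary cross-entropy loss w.r.t. the input x."""
    p = 1.0 / (1.0 + math.exp(-logit(x)))
    return [(p - y) * w for w in W]

def pgd_linf(x, y, eps=8/255, alpha=2/255, steps=10):
    """Ascend the loss along the gradient sign, projecting back into
    the L-infinity eps-ball around the clean input after every step."""
    x_adv = list(x)
    for _ in range(steps):
        g = input_grad(x_adv, y)
        x_adv = [xa + alpha * (1 if gi > 0 else -1) for xa, gi in zip(x_adv, g)]
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv

x_clean, y = [0.3, 0.7], 1
x_adv = pgd_linf(x_clean, y)  # model confidence in the true class drops
```

The projection step is what makes the eps=8/255 threat model meaningful: the adversarial input is guaranteed to stay visually indistinguishable from the clean one under the Linf metric.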

Training robust image classifiers

Adversarial training with diffusion-augmented data

Synthetic data from diffusion models improves adversarial robustness without sacrificing clean accuracy. Current SOTA on CIFAR-10 and ImageNet robust benchmarks.
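Whatever the data source, the training loop is the Madry-style min-max: craft a worst-case perturbation for each example, then descend on the loss at that point. A minimal sketch on a hypothetical 1-D logistic task, using single-step FGSM as the inner attack (all data and hyperparameters here are toy assumptions, not the SOTA recipe):

```python
import math
import random

random.seed(0)

# Hypothetical toy task: classify 1-D points as positive (y=1) or negative.
data = [random.uniform(-1.0, 1.0) for _ in range(200)]
labels = [1 if x > 0 else 0 for x in data]

w, b = 0.0, 0.0     # logistic regression parameters
EPS, LR = 0.1, 0.5  # attack budget and learning rate

def prob(x):
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

for epoch in range(50):
    for x, y in zip(data, labels):
        # Inner maximization: single-step FGSM under an Linf budget of EPS.
        gx = (prob(x) - y) * w
        x_adv = x + EPS * (1 if gx > 0 else -1 if gx < 0 else 0)
        # Outer minimization: gradient step on the adversarial point.
        err = prob(x_adv) - y
        w -= LR * err * x_adv
        b -= LR * err
```

Diffusion-augmented AT keeps this loop and changes only the data: extra synthetic samples give the inner attacker less room to find blind spots.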

Certified robustness for safety-critical deployments

Randomized smoothing (Cohen et al.) or SmoothAdv

Provable L2 robustness guarantees that hold against any attack within the certified radius. Accept the clean accuracy tradeoff when formal guarantees are non-negotiable.
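The mechanics fit in a few lines: classify many Gaussian-noised copies of the input, and if the top class wins with probability pA, the smoothed classifier is certifiably constant within an L2 radius of sigma * Phi^{-1}(pA). A simplified sketch (the base classifier is a hypothetical toy; the real Cohen et al. procedure replaces the raw vote frequency with a high-confidence binomial lower bound):

```python
import random
from collections import Counter
from statistics import NormalDist

random.seed(0)
SIGMA = 0.25  # noise level; also sets the scale of certifiable radii

def base_classifier(x):
    """Hypothetical base classifier: sign of the feature sum."""
    return 1 if sum(x) > 0 else 0

def certify(x, n=1000):
    """Majority vote under Gaussian noise, plus a certified L2 radius."""
    votes = Counter()
    for _ in range(n):
        noisy = [xi + random.gauss(0.0, SIGMA) for xi in x]
        votes[base_classifier(noisy)] += 1
    top_class, count = votes.most_common(1)[0]
    p_a = min(count / n, 1 - 1e-6)  # clamp so inv_cdf stays finite
    if p_a <= 0.5:
        return top_class, 0.0        # abstain: no certificate
    return top_class, SIGMA * NormalDist().inv_cdf(p_a)

cls, radius = certify([0.4, 0.3])
```

The radius is independent of how the attacker searches: any perturbation with L2 norm below it provably cannot change the smoothed prediction.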

Red-teaming LLMs and multimodal models

GCG attacks + manual jailbreak auditing

Greedy Coordinate Gradient generates universal adversarial suffixes. Combine with human red-teaming for comprehensive safety evaluation before deployment.
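The search itself is coordinate-wise and greedy: repeatedly swap one suffix token for the candidate that most reduces the loss. The toy below keeps only that loop; the loss is a stand-in character-mismatch score, and the full-vocabulary scan replaces GCG's gradient-guided top-k candidate selection (everything here, including TARGET, is an illustrative assumption):

```python
import random

random.seed(0)
VOCAB = list("abcdefghijklmnopqrstuvwxyz!")
TARGET = "sure"  # hypothetical string the attacker wants the model to emit

def loss(suffix):
    """Stand-in objective. Real GCG scores the LLM's log-probability of a
    target completion; this toy counts character mismatches so the search
    loop has something to minimize."""
    return sum(a != b for a, b in zip(suffix, TARGET))

def greedy_coordinate_search(length=4, sweeps=3):
    """Coordinate-wise greedy descent over suffix tokens. Real GCG draws
    candidate swaps from the top-k of a token-embedding gradient instead
    of scoring the whole vocabulary at every position."""
    suffix = [random.choice(VOCAB) for _ in range(length)]
    for _ in range(sweeps):
        for pos in range(length):
            suffix[pos] = min(
                VOCAB,
                key=lambda t: loss(suffix[:pos] + [t] + suffix[pos + 1:]),
            )
    return "".join(suffix)

adv_suffix = greedy_coordinate_search()
```

Because the objective is a smooth surrogate (log-probability) over a discrete space, this greedy-swap structure is what makes GCG suffixes transferable across prompts and, often, across models.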

Tasks & Benchmarks

Adversarial Attacks

No datasets indexed yet. Contribute on GitHub.

Adversarial Robustness

No datasets indexed yet. Contribute on GitHub.

Honest Takes

Robustness-accuracy tradeoff is real but shrinking

For years, improving adversarial robustness meant sacrificing 10-20% clean accuracy. Diffusion-augmented training and better architectures have cut this gap to under 5% on CIFAR-10. The tradeoff still exists at ImageNet scale, but it's no longer a dealbreaker.

Most production systems have zero adversarial robustness

Despite a decade of research, almost no deployed ML systems use adversarial training or certified defenses. The field publishes papers while production models remain trivially attackable. If you're deploying a classifier in a security-sensitive context, you're likely exposed.

LLM jailbreaks are the new adversarial examples

The adversarial robustness community spent years on image perturbations. Now the same fundamental problem has exploded in LLMs, where adversarial prompts bypass safety filters. The techniques are different but the lesson is the same: models are more fragile than benchmarks suggest.