🧬🛡️ BioScreen

Function-aware DNA synthesis screening against AI-designed biological threats. Detects dangerous proteins by mechanism of harm, not sequence similarity.

AIxBio Hackathon 2026 · Track 1: CBAI · ESM-2 3B · Contrastive Learning · Multi-Task · Certified Robust
  • 99.3% detection at <40% sequence identity
  • 0.998 AUROC (production)
  • 44.4 sequences / second
  • 8 classes (7 threat mechanisms + benign)

⚠️ The Problem

The Screening Gap

Current DNA synthesis screening (SecureDNA, IBBIS Common Mechanism) relies on sequence homology: matching orders against databases of known threats. AI protein design tools (ProteinMPNN, RFdiffusion, Evo2) can generate functional variants with <30% sequence identity to known pathogens, low enough to evade homology-based screening.


Microsoft et al. (Science, Oct 2025) demonstrated AI-designed variants slipping through screening undetected.

Our Solution

BioScreen learns a functional embedding space where proteins cluster by mechanism of harm, not sequence similarity. A pore-forming toxin redesigned by AI still clusters with other pore-forming toxins — and gets flagged.


Three outputs per sequence:

  • Binary decision: PASS / REVIEW / BLOCK
  • Mechanism of harm: 7 threat classes + benign
  • Risk score with calibrated uncertainty
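How a risk score and its uncertainty could map onto the three decisions can be sketched as below. This is a minimal illustration, not BioScreen's actual decision rule: the function name `decide`, the thresholds (0.5 and 0.9), and the choice to escalate on the pessimistic estimate (score plus uncertainty) are all assumptions.

```python
def decide(risk_score: float, risk_uncertainty: float,
           review_threshold: float = 0.5, block_threshold: float = 0.9) -> str:
    """Map a risk score in [0, 1] to PASS / REVIEW / BLOCK.

    Hypothetical rule (thresholds are illustrative, not from BioScreen):
    - BLOCK when the score itself is at or above block_threshold.
    - REVIEW when score + uncertainty reaches review_threshold, so an
      uncertain borderline sequence is escalated rather than passed.
    - PASS otherwise.
    """
    upper = min(1.0, risk_score + risk_uncertainty)
    if risk_score >= block_threshold:
        return "BLOCK"
    if upper >= review_threshold:
        return "REVIEW"
    return "PASS"


if __name__ == "__main__":
    for score, sigma in [(0.95, 0.02), (0.45, 0.10), (0.10, 0.05)]:
        print(f"{score:.2f} ± {sigma:.2f} -> {decide(score, sigma)}")
```

Using the upper bound for REVIEW but the point estimate for BLOCK means uncertainty can only send more sequences to a human, never block or pass them on its own.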

🔬 Threat Mechanisms Detected

BioScreen classifies threats into 7 biological mechanisms of harm:

🔴 Neurotoxicity
Proteins targeting the nervous system (botulinum, tetanus, conotoxins)
🔴 Viral Entry
Viral fusion proteins mediating host cell entry and membrane fusion
🟠 Enzymatic Disruption
Toxins with enzymatic activity (ADP-ribosylation, proteases)
🟠 Immune Evasion
Proteins helping pathogens evade host immune responses
🟠 Membrane Disruption
Pore-forming toxins that lyse cell membranes
🟠 Hemolysis
Hemolysins and cytolysins that destroy red blood cells
🟡 Host Adhesion
Pathogen proteins enabling attachment to host cells

📊 Results

Production Model — Binary Screening

Method                          | AUROC | Det. @ <40% ID | Recall @ 5% FPR
k-mer Similarity (BLAST proxy)  | 0.785 | 65.8%          | 55.2%
Raw ESM-2 Cosine Similarity     | 0.967 | 86.9%          | 84.1%
🛡️ BioScreen (Production)       | 0.998 | 99.3%          | 98.7%
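For readers unfamiliar with the "Recall @ 5% FPR" column, the sketch below shows one common way to compute it from raw scores: pick a threshold that flags at most 5% of benign sequences, then measure the fraction of threats above it. The function name and the toy scores are illustrative assumptions; this is not the project's evaluation code.

```python
def recall_at_fpr(threat_scores, benign_scores, max_fpr=0.05):
    """Fraction of threats flagged at a threshold chosen so that at most
    max_fpr of benign sequences are flagged.

    A sequence counts as flagged when its score is strictly greater
    than the threshold.
    """
    benign = sorted(benign_scores, reverse=True)
    allowed_fp = int(max_fpr * len(benign))  # benign sequences we may flag
    # Threshold = score of the first benign sequence we must NOT flag.
    threshold = benign[allowed_fp] if allowed_fp < len(benign) else float("-inf")
    flagged = sum(1 for s in threat_scores if s > threshold)
    return flagged / len(threat_scores)


if __name__ == "__main__":
    benign = [i / 100 for i in range(20)]   # toy scores 0.00 .. 0.19
    threats = [0.9, 0.5, 0.185, 0.1]        # toy scores
    print(recall_at_fpr(threats, benign))
```

With 20 benign scores and a 5% budget, exactly one benign sequence may be flagged, so the threshold sits at the second-highest benign score.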

Ablation Study (Contrastive-Only Training)

Variant                            | AUROC | Det. @ <40% ID | Recall @ 5% FPR
Contrastive + PGD Adversarial      | 0.958 | 72.6%          | 76.2%
Contrastive Only (Best Ablation)   | 0.980 | 96.2%          | 95.0%
No Contrastive (Adversarial Only)  | 0.848 | 48.7%          | 56.1%
Frozen ESM-2 + Linear Head         | 0.781 | 25.7%          | 31.8%

⚠️ Key Finding: Adversarial Training in Embedding Space Hurts

Adding PGD adversarial training to contrastive training drops detection at <40% sequence identity from 96.2% to 72.6%. Representation-space perturbations are poor proxies for actual AI-designed biological variants, which navigate the structured manifold of functional proteins. This is a cautionary finding for the field: naive adversarial robustness techniques from computer vision do not transfer directly to biosecurity screening.

Mechanism Classification (Production Model)

Metric                | Value
Overall Accuracy      | 90.8%
Macro F1              | 0.891
Weighted F1           | 0.907
Number of Classes     | 8 (7 threats + benign)
Training Sequences    | 4,981
Adversarial Variants  | 2,500
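The two F1 figures above differ only in how per-class scores are averaged: macro F1 weights every class equally, weighted F1 weights each class by its number of true examples. A weighted F1 (0.907) above macro F1 (0.891) would be consistent with smaller classes scoring somewhat lower, though the per-class breakdown is not given here. A minimal sketch of both averages, with toy labels that are purely illustrative:

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (macro_f1, weighted_f1) for multi-class labels."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true examples per class
    per_class = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        per_class[c] = 2 * tp / denom if denom else 0.0
    macro = sum(per_class.values()) / len(labels)
    weighted = sum(per_class[c] * support[c] for c in labels) / len(y_true)
    return macro, weighted


if __name__ == "__main__":
    y_true = ["benign"] * 4 + ["hemolysis"] * 2
    y_pred = ["benign"] * 4 + ["hemolysis", "benign"]
    macro, weighted = f1_scores(y_true, y_pred)
    print(f"macro={macro:.3f} weighted={weighted:.3f}")
```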

📈 Figures

Main Results
Figure 1. ROC curves and detection rates by sequence identity. BioScreen maintains >95% detection where baselines collapse.
UMAP Embeddings
Figure 2. UMAP of embedding spaces. Functional embeddings (right) cluster adversarial variants with their parent threats.
Certified Robustness
Figure 3. Certified robustness via randomized smoothing. 100% of threats receive certified predictions across all noise levels.
Runtime
Figure 4. Throughput on NVIDIA H200. 44.4 seq/s — 370× headroom above real-time screening requirements.
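To make "certified predictions" in Figure 3 concrete, here is a minimal sketch of randomized-smoothing certification in the style of Cohen et al. (2019): classify many Gaussian-perturbed copies of an input, lower-bound the top-class probability, and convert that bound into an L2 radius within which the smoothed prediction cannot change. Several things here are assumptions rather than BioScreen's implementation: that noise is added in embedding space, the names `certify` and `classify`, the default `sigma`, `n0`, `n`, and `alpha`, and the use of a simple Hoeffding bound in place of the tighter Clopper-Pearson interval used in the original paper.

```python
import math
import random
from collections import Counter
from statistics import NormalDist


def _noisy_votes(classify, embedding, sigma, n, rng):
    """Classify n Gaussian-perturbed copies of the embedding and count votes."""
    votes = Counter()
    for _ in range(n):
        noisy = [x + rng.gauss(0.0, sigma) for x in embedding]
        votes[classify(noisy)] += 1
    return votes


def certify(classify, embedding, sigma=0.25, n0=100, n=1000, alpha=0.001, seed=0):
    """Return (predicted_class, certified_L2_radius), or (None, 0.0) to abstain.

    alpha must lie strictly between 0 and 1; the certificate holds with
    probability at least 1 - alpha over the sampled noise.
    """
    rng = random.Random(seed)
    # Step 1: guess the top class from a small selection sample.
    top_class = _noisy_votes(classify, embedding, sigma, n0, rng).most_common(1)[0][0]
    # Step 2: estimate its probability on a fresh, larger sample.
    count = _noisy_votes(classify, embedding, sigma, n, rng)[top_class]
    # Hoeffding lower confidence bound (looser than Clopper-Pearson).
    p_lower = count / n - math.sqrt(math.log(1.0 / alpha) / (2 * n))
    if p_lower <= 0.5:
        return None, 0.0  # not confident enough to certify
    return top_class, sigma * NormalDist().inv_cdf(p_lower)


if __name__ == "__main__":
    # Toy classifier: "threat" when the first coordinate is positive.
    toy = lambda v: "threat" if v[0] > 0 else "benign"
    print(certify(toy, [1.0, 0.0]))
```

In the toy example the true distance to the decision boundary is 1.0, and the certified radius comes out smaller than that, as it should: the certificate is a conservative guarantee, not an estimate of the true margin.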

🏗️ Architecture

Input Protein Sequence
         │
         ▼
┌───────────────────┐
│     ESM-2 3B      │  2.84B params, 478M trainable
│  (30/36 frozen)   │  fp16 autocast
└────────┬──────────┘
         ▼
┌───────────────────┐
│    Functional     │  Supervised contrastive learning
│    Embedding      │  L2-normalized, 256-dim
│  + Residual MLP   │  Clusters by mechanism
└────────┬──────────┘
         │
    ┌────┼────────────┬─────────┐
    ▼    ▼            ▼         ▼
 Binary  Mechanism    Risk      Contrastive
 Head    Classifier   Scorer    Projection
    │    │            │         │
    ▼    ▼            ▼         ▼
 PASS/   7 threats    [0,1]     128-dim
 REVIEW/ + benign     ± σ       (train only)
 BLOCK
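The "supervised contrastive learning" step in the diagram can be illustrated with a small, dependency-free sketch of the SupCon loss (Khosla et al., 2020) on L2-normalized vectors: samples sharing a mechanism label are pulled together, all others pushed apart. The exact loss variant, the temperature of 0.1, and the toy 2-dimensional embeddings are assumptions for illustration; the real model trains on a 128-dim projection of the 256-dim embedding.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss, averaged over anchors that have at
    least one positive (another sample with the same label)."""
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    # Temperature-scaled cosine similarity between every pair.
    sim = [[sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)] for i in range(n)]
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Numerically stable log-sum-exp over all other samples.
        others = [sim[i][j] for j in range(n) if j != i]
        m = max(others)
        log_denom = m + math.log(sum(math.exp(s - m) for s in others))
        total += -sum(sim[i][j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


if __name__ == "__main__":
    points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    by_mechanism = ["pore", "pore", "benign", "benign"]
    shuffled = ["pore", "benign", "pore", "benign"]
    print(supcon_loss(points, by_mechanism), supcon_loss(points, shuffled))
```

The loss is near zero when labels match the geometric clusters and large when they do not, which is the pressure that makes embeddings cluster by mechanism rather than by sequence.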

🔬 Interactive Demo

Enter a protein sequence to screen. This runs a deterministic client-side heuristic for demonstration purposes. For production screening, deploy the API server with the trained ESM-2 model.

🚀 Quick Start

Python API

from bioscreen import BioScreener

screener = BioScreener.from_pretrained("bioscreen-v1")
results  = screener.screen(["MKFLVLLF..."])

for r in results:
    print(f"{r.decision}: {r.predicted_mechanism}")
    print(f"  Risk: {r.risk_score:.2f} ± {r.risk_uncertainty:.2f}")

REST API

# Start server
docker compose -f docker/docker-compose.yml up

# Screen a sequence
curl -X POST http://localhost:8080/v1/screen \
  -H "Content-Type: application/json" \
  -d '{"sequences": ["MKFLVLLF..."]}'
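The same request can be sent from Python using only the standard library. This sketch assumes the server from the `docker compose` step is listening on `localhost:8080` and accepts the JSON body shown in the `curl` example; the helper names `build_screen_request` and `screen` are hypothetical, and because the response schema is not documented here, the response is returned as decoded JSON without interpreting any fields.

```python
import json
import urllib.request


def build_screen_request(sequences, base_url="http://localhost:8080"):
    """Build the POST request for /v1/screen with a {"sequences": [...]} body."""
    body = json.dumps({"sequences": list(sequences)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/screen",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def screen(sequences, base_url="http://localhost:8080", timeout=30):
    """Send the request and return the decoded JSON response as-is."""
    request = build_screen_request(sequences, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting request construction from sending keeps the payload easy to inspect or unit-test without a running server.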

Install from Source

git clone https://github.com/YOUR_USERNAME/bioscreen.git
cd bioscreen
pip install -e .

🔗 Integration with Existing Infrastructure

SecureDNA Integration

BioScreen runs as a second-stage filter. Sequences passing SecureDNA's homology screening are re-evaluated by the functional embedding model.

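# Pipeline: SecureDNA → BioScreen
# (securedna and flag_for_review are placeholders for your own integration code;
#  bioscreen is a BioScreener instance as in Quick Start)
if securedna.passes(seq):
    # screen() takes a list of sequences and returns one result per sequence
    result = bioscreen.screen([seq])[0]
    if result.decision != "PASS":  # REVIEW or BLOCK
        flag_for_review(seq)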

IBBIS Common Mechanism

BioScreen is complementary to IBBIS HMM-based screening. It targets novel functional variants at low sequence identity, where homology signals weaken (99.3% detection below 40% sequence identity; see Results).

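# Parallel screening
# (commec and escalate are placeholders for your own integration code)
ibbis_result = commec.screen(seq)
bio_result   = bioscreen.screen([seq])[0]  # screen() takes a list, returns a list

# Flag if either detects
if ibbis_result.hit or bio_result.decision != "PASS":
    escalate(seq)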

💰 Coefficient Giving RFP Alignment

BioScreen maps to multiple priority areas in Coefficient Giving's biosecurity RFP (closes May 11, 2026):

RFP Priority                         | BioScreen Contribution
AI-accelerated biosecurity defenses  | Function-aware screening that catches AI-designed evasion variants
LLM classifiers for biology misuse   | Multi-task mechanism-of-harm classifier (8 classes, 90.8% accuracy)
Resilient AI safeguards              | Certified robustness via randomized smoothing
Open-weight catch-up analysis        | Adversarial variant generation benchmarking evasion rates

Get Started

BioScreen is open source (MIT License) and designed for drop-in integration with existing screening infrastructure.

⭐ GitHub Repository 🤗 HuggingFace Demo