BioScreen — Function-Aware DNA Synthesis Screening

⚠️ The Problem

The Screening Gap

Current DNA synthesis screening (SecureDNA, IBBIS Common Mechanism) relies on sequence homology: orders are matched against databases of known threats. AI protein design tools (ProteinMPNN, RFdiffusion, Evo2) can generate functional variants with <30% sequence identity to known threat proteins, low enough to evade homology-based screening.

Microsoft et al. (Science, Oct 2025) demonstrated AI-designed variants slipping through screening undetected.

Our Solution

BioScreen learns a functional embedding space where proteins cluster by mechanism of harm, not sequence similarity. A pore-forming toxin redesigned by AI still clusters with other pore-forming toxins — and gets flagged.

Three outputs per sequence:

Screening decision: PASS / REVIEW / BLOCK
Mechanism of harm: 7 threat classes + benign
Risk score with calibrated uncertainty
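
One way the three outputs could fit together is sketched below. The field names mirror those used in the Quick Start example; the two thresholds and the rule of escalating to REVIEW when the uncertainty band crosses the lower threshold are illustrative assumptions, not BioScreen's documented policy.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration.
REVIEW_THRESHOLD = 0.5
BLOCK_THRESHOLD = 0.9


@dataclass
class ScreenResult:
    decision: str             # "PASS", "REVIEW", or "BLOCK"
    predicted_mechanism: str  # one of the 7 threat classes, or "benign"
    risk_score: float         # in [0, 1]
    risk_uncertainty: float   # one standard deviation around risk_score


def decide(risk_score: float, risk_uncertainty: float) -> str:
    """Map a risk score and its uncertainty to a three-way decision."""
    if risk_score >= BLOCK_THRESHOLD:
        return "BLOCK"
    # Escalate to human review when the score, or its upper bound, is elevated.
    upper = min(1.0, risk_score + risk_uncertainty)
    if upper >= REVIEW_THRESHOLD:
        return "REVIEW"
    return "PASS"
```

Under these assumed thresholds, a sequence scored 0.45 ± 0.10 is sent to REVIEW rather than passed, because its upper bound crosses 0.5.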

🔬 Threat Mechanisms Detected

BioScreen classifies threats into 7 biological mechanisms of harm:

🔴 Neurotoxicity

Proteins targeting the nervous system (botulinum, tetanus, conotoxins)

🔴 Viral Entry

Viral fusion proteins mediating host cell entry and membrane fusion

🟠 Enzymatic Disruption

Toxins with enzymatic activity (ADP-ribosylation, proteases)

🟠 Immune Evasion

Proteins helping pathogens evade host immune responses

🟠 Membrane Disruption

Pore-forming toxins that lyse cell membranes

🟠 Hemolysis

Hemolysins and cytolysins that destroy red blood cells

🟡 Host Adhesion

Pathogen proteins enabling attachment to host cells

📊 Results

Production Model — Binary Screening

Method	AUROC	Detection @ <40% sequence identity	Recall @ 5% FPR
k-mer Similarity (BLAST proxy)	0.785	65.8%	55.2%
Raw ESM-2 Cosine Similarity	0.967	86.9%	84.1%
🛡️ BioScreen (Production)	0.998	99.3%	98.7%
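
For readers who want to reproduce the column definitions, the sketch below computes AUROC and recall at a fixed false-positive rate from raw scores. It is a plain-Python illustration on made-up toy scores, not the project's evaluation script; the function names are hypothetical.

```python
def auroc(scores, labels):
    """Probability that a random threat outranks a random benign sequence (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def recall_at_fpr(scores, labels, max_fpr=0.05):
    """Best recall over all score thresholds whose false-positive rate is <= max_fpr."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    best = 0.0
    for t in sorted(set(scores)):
        fpr = sum(n >= t for n in neg) / len(neg)
        if fpr <= max_fpr:
            best = max(best, sum(p >= t for p in pos) / len(pos))
    return best


# Toy data: 1 = threat, 0 = benign.
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.1, 0.05]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(auroc(scores, labels))          # 0.9375
print(recall_at_fpr(scores, labels))  # 0.75
```

The detection column at <40% identity would be the same recall computation restricted to threat sequences below that identity cutoff.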

Ablation Study (Contrastive-Only Training)

Variant	AUROC	Detection @ <40% sequence identity	Recall @ 5% FPR
Contrastive + PGD Adversarial	0.958	72.6%	76.2%
Contrastive Only (Best Ablation)	0.980	96.2%	95.0%
No Contrastive (Adversarial Only)	0.848	48.7%	56.1%
Frozen ESM-2 + Linear Head	0.781	25.7%	31.8%

⚠️ Key Finding: Adversarial Training in Embedding Space Hurts

Adding PGD adversarial training to contrastive training lowers detection at <40% sequence identity from 96.2% to 72.6%. Representation-space perturbations are poor proxies for actual AI-designed biological variants, which stay on the structured manifold of functional proteins. This is a cautionary finding for the field: naive adversarial robustness techniques from computer vision do not transfer directly to biosecurity screening.
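
For readers unfamiliar with the technique being ablated, here is a generic sketch of an L-infinity PGD attack applied to an embedding rather than to a sequence. A fixed logistic-regression head stands in for the real model; the radius, step size, and random weights are arbitrary assumptions, and this is not BioScreen's training code.

```python
import numpy as np


def pgd_embedding_attack(z, y, w, b, eps=0.1, alpha=0.02, steps=10):
    """L-infinity PGD against a logistic head p = sigmoid(w . z + b).

    z: clean embedding, y: true label (0 or 1). Returns a perturbed
    embedding within eps of z that increases the cross-entropy loss.
    """
    z_adv = z.copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(w @ z_adv + b)))
        grad = (p - y) * w                         # d(cross-entropy) / dz
        z_adv = z_adv + alpha * np.sign(grad)      # ascend the loss
        z_adv = z + np.clip(z_adv - z, -eps, eps)  # project back into the eps-ball
    return z_adv


rng = np.random.default_rng(0)
w, b = rng.normal(size=8), 0.0
z = rng.normal(size=8)
z_adv = pgd_embedding_attack(z, y=1, w=w, b=b)
```

Nothing in this procedure requires `z_adv` to be the embedding of any real protein sequence, which is the mismatch the finding above points to.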

Mechanism Classification (Production Model)

Metric	Value
Overall Accuracy	90.8%
Macro F1	0.891
Weighted F1	0.907
Number of Classes	8 (7 threats + benign)
Training Sequences	4,981
Adversarial Variants	2,500
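
Macro F1 averages per-class F1 equally, while weighted F1 weights each class by its support, so the gap between the two hints at weaker performance on smaller classes. The sketch below shows the difference on a made-up two-class toy example; it is an illustration, not the project's evaluation code.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (macro_f1, weighted_f1) over the classes present in y_true."""
    support = Counter(y_true)
    per_class = {}
    for c in support:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        per_class[c] = 2 * tp / denom if denom else 0.0
    macro = sum(per_class.values()) / len(per_class)
    weighted = sum(per_class[c] * n for c, n in support.items()) / len(y_true)
    return macro, weighted


# Toy data: the minority class is the one that gets misclassified.
y_true = ["benign"] * 4 + ["neurotoxicity"] * 2
y_pred = ["benign"] * 4 + ["neurotoxicity", "benign"]
macro, weighted = f1_scores(y_true, y_pred)
print(f"macro={macro:.3f} weighted={weighted:.3f}")  # macro=0.778 weighted=0.815
```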

📈 Figures

Figure 1. ROC curves and detection rates by sequence identity. BioScreen maintains >95% detection where baselines collapse.

Figure 2. UMAP of embedding spaces. Functional embeddings (right) cluster adversarial variants with their parent threats.

Figure 3. Certified robustness via randomized smoothing. 100% of threats receive certified predictions across all noise levels.
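
As background for Figure 3, the sketch below shows how a randomized-smoothing certificate is typically computed: classify many Gaussian-noised copies of the input, lower-bound the majority-class probability, and convert that bound to an L2 radius via the inverse normal CDF. The toy classifier, sample count, and the use of a Hoeffding bound (a simpler, more conservative stand-in for the Clopper-Pearson interval usually used) are assumptions; this is not BioScreen's certification code.

```python
import math
import random
from statistics import NormalDist


def certify(classify, x, sigma, n=1000, alpha=0.001, seed=0):
    """Randomized-smoothing certificate for a binary classifier on vectors.

    Returns (predicted_label, certified_l2_radius). A radius of 0.0 means
    the vote was too close to certify and the caller should abstain.
    """
    rng = random.Random(seed)
    votes = sum(classify([xi + rng.gauss(0.0, sigma) for xi in x]) for _ in range(n))
    label = 1 if votes * 2 >= n else 0
    p_hat = max(votes, n - votes) / n
    # Two-sided Hoeffding bound, so it holds whichever class wins the vote.
    p_low = p_hat - math.sqrt(math.log(2.0 / alpha) / (2.0 * n))
    if p_low <= 0.5:
        return label, 0.0
    return label, sigma * NormalDist().inv_cdf(p_low)


def toy_classifier(v):
    """Hypothetical stand-in model: class 1 if the coordinates sum to a positive value."""
    return int(sum(v) > 0)


label, radius = certify(toy_classifier, [2.0, 2.0], sigma=0.25)
print(label, round(radius, 3))
```

An input sitting on the decision boundary, such as `[0.0, 0.0]` here, splits the vote and receives a radius of 0.0.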

Figure 4. Throughput on NVIDIA H200. 44.4 seq/s — 370× headroom above real-time screening requirements.
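
The caption does not state the real-time requirement itself, but it can be back-computed from the two numbers given. The arithmetic below does that and also converts the throughput to a daily figure, assuming one GPU running continuously.

```python
THROUGHPUT_SEQ_PER_S = 44.4  # from Figure 4
HEADROOM_FACTOR = 370        # from Figure 4

implied_requirement = THROUGHPUT_SEQ_PER_S / HEADROOM_FACTOR  # seq/s
daily_capacity = THROUGHPUT_SEQ_PER_S * 86_400                # seq/day, single GPU

print(f"Implied requirement: {implied_requirement:.2f} seq/s")  # 0.12 seq/s
print(f"Daily capacity: {daily_capacity:,.0f} sequences")       # 3,836,160 sequences
```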

🏗️ Architecture


Input Protein Sequence
        │
        ▼
┌───────────────────┐
│    ESM-2 3B       │  2.84B params, 478M trainable
│    (30/36 frozen) │  fp16 autocast
└────────┬──────────┘
         ▼
┌───────────────────┐
│   Functional      │  Supervised contrastive learning
│   Embedding       │  L2-normalized, 256-dim
│   + Residual MLP  │  Clusters by mechanism
└────────┬──────────┘
         │
    ┌────┼────────────┬─────────┐
    ▼    ▼            ▼         ▼
  Binary  Mechanism    Risk       Contrastive
  Head    Classifier   Scorer     Projection
    │       │            │         │
    ▼       ▼            ▼         ▼
  PASS/   7 threats   [0,1]     128-dim
  REVIEW/ + benign    ± σ     (train only)
  BLOCK
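
The contrastive projection branch in the diagram is trained with a supervised contrastive objective. A minimal NumPy sketch of that loss in its standard form (each anchor is pulled toward same-label samples and pushed from all others) is given below; the temperature of 0.1 and the toy 2-dimensional embeddings are assumptions, and the real model would compute this on the 128-dim projections in its training framework.

```python
import numpy as np


def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss on a batch of embeddings (one row per sample)."""
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)  # L2-normalize
    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)
    # Scaled cosine similarities, with self-similarity masked out.
    sim = np.where(not_self, z @ z.T / temperature, -np.inf)
    # Log-softmax over all other samples in the batch.
    log_prob = sim - np.logaddexp.reduce(sim, axis=1, keepdims=True)
    labels = np.asarray(labels)
    positives = (labels[:, None] == labels[None, :]) & not_self
    n_pos = positives.sum(axis=1)
    has_pos = n_pos > 0  # anchors with no same-class partner are skipped
    per_anchor = -np.where(positives, log_prob, 0.0).sum(axis=1)
    return (per_anchor[has_pos] / n_pos[has_pos]).mean()


emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
loss_clustered = supcon_loss(emb, [0, 0, 1, 1])  # labels agree with geometry
loss_scrambled = supcon_loss(emb, [0, 1, 0, 1])  # labels cut across clusters
print(loss_clustered < loss_scrambled)  # True
```

The loss is low when same-mechanism sequences already sit together, which is the property the functional embedding is meant to have.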

🔬 Interactive Demo

The interactive demo screens a protein sequence with a deterministic client-side heuristic, for demonstration purposes only. For production screening, deploy the API server with the trained ESM-2 model.

🚀 Quick Start

Python API

from bioscreen import BioScreener

screener = BioScreener.from_pretrained("bioscreen-v1")
results  = screener.screen(["MKFLVLLF..."])

for r in results:
    print(f"{r.decision}: {r.predicted_mechanism}")
    print(f"  Risk: {r.risk_score:.2f} ± {r.risk_uncertainty:.2f}")

REST API

# Start server
docker compose -f docker/docker-compose.yml up

# Screen a sequence
curl -X POST http://localhost:8080/v1/screen \
  -H "Content-Type: application/json" \
  -d '{"sequences": ["MKFLVLLF..."]}'
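
The same call can be made from Python with only the standard library, as sketched below. The endpoint, port, and request body are taken from the curl example; the response schema is not documented here, so the sketch returns the decoded JSON without assuming its fields. The helper names and the sample sequence are hypothetical.

```python
import json
import urllib.request


def build_screen_request(sequences, base_url="http://localhost:8080"):
    """Build the same POST request as the curl example."""
    payload = json.dumps({"sequences": list(sequences)}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/screen",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def screen(sequences, base_url="http://localhost:8080", timeout=30):
    """Send the request and return the decoded JSON body, whatever its shape."""
    req = build_screen_request(sequences, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building the request needs no running server; screen() does.
req = build_screen_request(["MKT"])  # "MKT" is a placeholder sequence
print(req.get_method(), req.full_url)
```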

Install from Source

git clone https://github.com/YOUR_USERNAME/bioscreen.git
cd bioscreen
pip install -e .

🔗 Integration with Existing Infrastructure

SecureDNA Integration

BioScreen runs as a second-stage filter. Sequences passing SecureDNA's homology screening are re-evaluated by the functional embedding model.

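# Pipeline: SecureDNA → BioScreen
# `securedna` and `flag_for_review` are placeholders for your own integration code.
if securedna.passes(seq):
    # screen() takes a list of sequences, as in the Quick Start example
    result = screener.screen([seq])[0]
    if result.decision != "PASS":  # catch REVIEW as well as BLOCK
        flag_for_review(seq)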

IBBIS Common Mechanism

BioScreen complements IBBIS's HMM-based screening and is designed to detect novel functional variants below 30% sequence identity.

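# Parallel screening
# `commec` and `escalate` are placeholders for your own integration code.
ibbis_result = commec.screen(seq)
# screen() takes a list of sequences, as in the Quick Start example
bio_result   = screener.screen([seq])[0]

# Flag if either detects
if ibbis_result.hit or bio_result.decision != "PASS":
    escalate(seq)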

💰 Coefficient Giving RFP Alignment

BioScreen maps to multiple priority areas in Coefficient Giving's biosecurity RFP (closes May 11, 2026):

RFP Priority	BioScreen Contribution
AI-accelerated biosecurity defenses	Function-aware screening that catches AI-designed evasion variants
LLM classifiers for biology misuse	Multi-task mechanism-of-harm classifier (8 classes, 90.8% accuracy)
Resilient AI safeguards	Certified robustness via randomized smoothing
Open-weight catch-up analysis	Adversarial variant generation benchmarking evasion rates

Get Started

BioScreen is open source (MIT License) and designed for drop-in integration with existing screening infrastructure.

⭐ GitHub Repository 🤗 HuggingFace Demo
