Subramanyam Sahoo

Independent AI Safety Researcher

Cuttack, Odisha, India  ·  sahoo2vec@gmail.com

AI Safety Researcher specializing in alignment science and governance, with 2.5+ years of academic research experience and a proven publication record. I work on building AI systems that remain reliably aligned under adversarial conditions — through mechanistic interpretability, adversarial self-play, and governance frameworks that address institutional failure, not just model failure. NIT Hamirpur gold medalist. MARS 4.0 fellow, Cambridge AI Safety Hub.

20+ Publications · 7 Hackathons · $16.5K Research Funding · Gold Medal, NIT Hamirpur
Fellowships & Positions
Apr 2026
CORDA Democracy Fellowship — Open Democracy Institute
Ongoing research on Integrity Disclosures for Generative AI in Democratic Information Environments
Feb 2025–
AI Policy Fellow (Remote) — UC Berkeley (BASIS Fellowship)
Conducting AI governance research for the Berkeley AI Safety Initiative
Dec 2025–Feb 2026
MARS 4.0 — Cambridge AI Safety Hub
Mentorship for Alignment Research Students · Submitted one paper on RL agents to RLC 2026
Aug–Dec 2025
E-SOAR — EleutherAI
Summer of Open AI Research · Prompt Optimization for Verifiable Hallucination Reduction
Apr 2025–
Independent Contractor — Outlier AI
Designing synthetic datasets for RL-style post-training and evaluation under controlled task distributions
Summer 2025
Harvard Technical AI Safety & Harvard AI Policy Fellowships
Dual fellowships awarded for Summer 2025
Jul–Oct 2025
Mentor (Remote) — Paragon Policy Fellowship
AI Policy and Technical AI Governance (TAIG) research
Research Funding
AIM Intelligence AI Safety Compute Grant
South Korea · Apr–Jun 2026 · PI · Long-horizon stability & regression risk
USD 10,000
Martian — Research Grant
Nov 2025–Feb 2026 · PI · Mechanistic interpretability, model analysis, and dissemination
USD 6,000
Apart Research — Research Grant
Oct 2025 · Pilot experiments and preliminary analyses
USD 500
Recent Highlights
  • ICLR 2026 — AI for Peace Workshop — Oral Presentation · Dial E for Ethical Enforcement
  • Accepted: AIM Intelligence AI Safety Compute Grant, South Korea — PI (USD 10,000)
  • Harvard Technical AI Safety & Harvard AI Policy Fellowships — awarded Summer 2025
  • Apart Lab Studio Internship — accepted following Martian Mechanistic Interpretability Hackathon project
  • CBRN AI Risk Research Sprint — 3rd Prize · Molecules Under Watch
  • Featured in Bloomberg
  • Intelligence Symbiosis Manifesto — signatory
  • Y Combinator Startup School 2026 — accepted, Bangalore, India (April 18, 2026)
Under Review — 2026
ICML 2026
Subramanyam Sahoo
Feb 2026
Accepted — ICLR 2026
ICLR 2026
Subramanyam Sahoo
AI for Peace Workshop · Feb 2026
ICLR 2026
Subramanyam Sahoo
Post-AGI Science and Society Workshop · Feb 2026
ICLR 2026
Subramanyam Sahoo
Agents in the Wild: Safety, Security, and Beyond · Feb 2026
ICLR 2026
Subramanyam Sahoo
AI with Recursive Self-Improvement Workshop · Feb 2026
ICLR 2026
Subramanyam Sahoo
I Can't Believe It's Not Better Workshop · Feb 2026
ICLR 2026
Subramanyam Sahoo
Latent & Implicit Thinking Workshop · Feb 2026
Accepted — AAAI & NeurIPS 2025–2026
AAAI 2026
Subramanyam Sahoo
Logical and Symbolic Reasoning in Language Models · Nov 2025
NeurIPS 2025
Subramanyam Sahoo
Embodied and Safe-Assured Robotic Systems Workshop · Nov 2025
NeurIPS 2025
Subramanyam Sahoo
Socially Responsible and Trustworthy Foundation Models · Nov 2025
NeurIPS 2025
Subramanyam Sahoo
Algorithmic Collective Action Workshop · Sep 2025
NeurIPS 2025
Subramanyam Sahoo
ARLET Workshop · Sep 2025
ICVGIP 2025
Subramanyam Sahoo
Indian Conference on Computer Vision, Graphics and Image Processing · Oct 2025
Thesis
M.Tech 2024
NIT Hamirpur · Advised by Dr. Kamlesh Dutta
XAI and TreeSHAP · O(n²) → O(n log n) · EEG datasets · Computational Neuroscience
B.Tech 2020
Parala Maharaja Engineering College · Advised by Dr. Debasis Mohapatra
KNN, Decision Trees, SVMs · Competitive accuracy
Open-Source Projects
GitHub
Safeguarded Alignment for High-Order Optimization Objectives in Recursive Self-Improvement
Python · PyTorch · FOMAML meta-learning · Differentiable mortality operator · EWC baselines
Framework
AdverSplay-GRPO
Adversarial self-play framework for sycophancy reduction in LLMs
Dual LoRA adapters · Frozen Qwen3-32B base · Group Relative Policy Optimization · Lambda Labs GH200 / A100 · 89.3% sycophancy reduction
HuggingFace
Model checkpoints, datasets, and evaluation artifacts
Alignment experiments · LoRA adapters · Reproducible benchmarks
Apart Research
Geometric Fingerprints of Deceptive Alignment in Code Language Models
AI Control Hackathon 2026 · Deceptive alignment detection · Backdoored LLMs
Apart Research
Task-specific vulnerabilities and exploitable failure modes
Apart x Martian Hackathon · Led to Apart Lab Studio internship acceptance
Technologies
  • Programming & ML: Python, JAX, PyTorch, NumPy, Pandas, scikit-learn, OpenAI Gym
  • Systems & Tools: CUDA, Docker, Git, LaTeX, VS Code
  • Compute: Lambda Labs GH200 and A100, Modal cloud compute
  • Models: Qwen3-32B, Qwen3-14B, GPT-2 Large, Llama-2-13B, RoBERTa
Research Hackathons & Competitions
Mar 2026
AI Control Hackathon 2026 — Apart Research
Remote · Led to Apart Lab Studio internship acceptance
Feb 2026
The Technical AI Governance Challenge — Apart Research
Remote
Jan 2026
AI Manipulation Hackathon — Apart Research
Remote
Nov 2025
Defensive Acceleration Hackathon — Apart Research
Remote · Honeypots, Sparse Autoencoders, Adversarial Probes
Nov 2025
The AI Forecasting Hackathon — Apart Research
Remote
Sep 2025
CBRN AI Risks Research Sprint — Apart Research
Remote · Biosecurity · Multi-modal AI
Jun 2025
Apart x Martian Mechanistic Router Interpretability Hackathon
Remote · Led to USD 6,000 Martian research grant and Apart Lab Studio internship
Invited Talks & Presentations
Feb 2026
Dial E for Ethical Enforcement: Institutional Veto Power as a Governance Primitive
Oct 2024
Odisha AI Conference 2024 — Invited Speaker on AI Safety and Alignment
Virtual event hosted in the USA · 5 October 2024
2024
ACM India Summer School — Interpretable AI, IIT Madras
Led group project on mechanistic interpretability · Responsible and Safe AI track · Selected from 2,000+ applicants
2024
Live Q&A with Prof. David Krueger — University of Cambridge
Asked two questions on AI safety during the public session
2024
Live Q&A with Dr. Sudarsan Padmanabhan — IIT Madras
Asked a question on AI alignment and governance
2023
Orientation for B.Tech AI/ML Students — NIT Kurukshetra
Invited talk on AI safety and research directions
2023
Seminar on Recent Trends in AI — Dept. of CSE, NIT Hamirpur
LLM advances and safety implications
Collaborators
Sussex
Prof. Fernando Rosas — University of Sussex
Belief geometry in deep RL · MARS 4.0, Cambridge AI Safety Hub · RLC 2026 joint submission
Industry
Amirali Abdullah — Thoughtworks
Mechanistic interpretability · Specification gaming · ACL 2026 joint submission
EleutherAI
E-SOAR Research Mentee — EleutherAI
Open-source alignment research · Aug–Dec 2025
Coalition
EVALEVAL Coalition
Invited by Irene Solaiman (Hugging Face) and Anka Reuel (Stanford) · Science of evaluations
Apart
Accepted following Martian Mechanistic Interpretability Hackathon project
Camp
AI Safety Camp 2026 — 11th Edition
AI Control track · Jan–Apr 2026 · Invited by Justin Shenk
Open to Collaboration

Seeking research contractor roles and collaborations in:

LLM Post-training · Reinforcement Learning · Mechanistic Interpretability · AI Control · Adversarial Robustness · Cooperative AI · Science of Evaluations · AI Policy & Governance · Autonomous AI Agents
Teaching Assistant & Lab Supervisor — NIT Hamirpur (Aug 2022 – Jul 2024)
Sem 4
CS-661 Deep Learning — Teaching Assistant & Project Supervisor
Dual Degree CS · 4th Year · 8th Semester
Sem 4
CS-664 Deep Learning & Data Analytics Lab
Lab management and supervision
Sem 4
CS-326 Computer Networks Lab — Teaching & Lab Assistant
B.Tech CS · 3rd Year · 6th Semester
Sem 4
CS-429 Major Project Stage 2 — Assistant Supervisor
B.Tech CS · 4th Year · Guided 10+ students
Sem 3
CS-652 Machine Learning — Teaching Assistant & Project Supervisor
Dual Degree CS · 4th Year · 7th Semester
Sem 3
CS-651 Artificial Intelligence — Curriculum & Assessment Development
Curriculum content and assessment materials
Sem 3
CS-315 Database Management Systems Lab — Teaching & Lab Assistant
B.Tech CS · 3rd Year · 5th Semester
Sem 3
CS-419 Major Project Stage 1 — Assistant Supervisor
B.Tech CS · 4th Year
Sem 2
CS-101 Computer Programming — Teaching Assistant
B.Tech EE · 1st Year · 2nd Semester
Sem 1
CS-102 Computer Programming Lab — Teaching & Lab Assistant
B.Tech CS · 1st Year · 1st Semester
2023
Organized AMRIT-2023 and MINDS-2023 Conferences — NIT Hamirpur
Core organizing committee · Mentored students on project planning and career development
Certificates
Mar–May 2026
Cooperative AI Fundamentals — Cooperative AI Foundation
Jan–Feb 2026
Technical AI Safety Fundamentals — BlueDot Impact
Nov–Dec 2025
Biosecurity Fundamentals — BlueDot Impact
May–Sep 2025
AI Agents and Law — Vista Institute for AI Policy
Jan–May 2025
Advanced Large Language Model Agents (MOOC) — UC Berkeley, Google DeepMind
Feb–May 2025
AI Safety, Ethics, and Society — Center for AI Safety (CAIS)
Peer Review
  • ACL 2026 — EvalEval Workshop — Reviewer
  • ACL 2026 — TrustNLP Workshop — Reviewer
  • ICLR 2026 — AIWILD Workshop — Reviewer
  • ICLR 2026 — P-AGI Workshop — Reviewer
  • ICLR 2026 — RSI Workshop — Reviewer
  • ICLR 2026 — SPOT Workshop — Reviewer
  • ICLR 2026 — Sci4DL Workshop — Reviewer
  • ICML 2026 — EIML Workshop — Reviewer
  • ICML 2026 — TAIGR Workshop — Reviewer
  • COLM 2026 Conference — Reviewer
  • EVALEVAL Coalition — Science of Evaluations — Active Member · Invited by Irene Solaiman (Hugging Face) & Anka Reuel (Stanford)
Volunteer
Jan–Apr 2026
AI Safety Camp — 11th Edition
AI Control track
Jul–Oct 2025
Mentor — Paragon AI Policy Fellowship
AI Policy and Technical AI Governance
May–Jun 2025
Mentor — LatinX AI Club, 2025 Edition
Academic Achievements
  • Gold Medalist — M.Tech CSE (AI), NIT Hamirpur (Oct 2024) · Summa Cum Laude · Batch Topper · CGPA 9.38/10
  • B.Tech with Honours — Computer Science and Engineering · Ranked Top 5 · CGPA 8.67/10
  • Harvard Technical AI Safety Fellowship — awarded Summer 2025
  • Harvard AI Policy Fellowship — awarded Summer 2025
  • Berkeley AI Safety Initiative (BASIS) Fellowship — awarded for AI Governance research
  • ICLR 2026 — Oral Presentation, AI for Peace Workshop
  • CBRN AI Risk Research Sprint — 3rd Prize, Apart Research (Sep 2025)
  • Apart Lab Studio Internship — accepted, following Martian Hackathon project
  • Y Combinator Startup School 2026 — accepted, Bangalore, India (April 18, 2026)
  • Full fee waiver — "Harms and Risks of AI in the Military" workshop, Mila - Quebec AI Institute, Montreal, Canada (2024)
  • Climate Change AI Summer School — Mila, Quebec (2024)
  • ACM India Summer School 2024 — "Responsible and Safe AI" at IIT Madras, selected from 2,000+ applicants
  • ACM India Summer School 2024 — "Generative AI for Text" at IIT Gandhinagar, selected from 1,700+ applicants
Notable Interactions
  • Discussed "Weak to Strong Generalization" with Stephen Casper — Algorithmic Alignment Group, MIT EECS (IIT Madras)
  • Live Q&A with Prof. David Krueger, University of Cambridge — asked two questions on AI safety
  • Live Q&A with Prof. Mausam, IIT Delhi — IIT Gandhinagar
  • Live Q&A with Dr. Sudarsan Padmanabhan — IIT Madras
Languages & Interests
  • English — Full professional proficiency
  • Odia — Native proficiency
  • Hindi — Limited working proficiency
  • Sanskrit — Limited working proficiency
  • Research interests: Large Language Models, AI Governance, AI Safety & Alignment
  • Hobby: Listening to critically acclaimed podcasts