Yugandhar Reddy G
Selected Work · 2024 — 2025

Projects

AI infrastructure, safety systems, and applied research — from KV-cache management for long-context inference, to runtime safety layers for agent tool invocations, to state-of-the-art models for vision and retrieval.

2025 · Systems

IceCache: Semantic Paging for 1M-Token Context

A KV-cache management layer that clusters semantically related tokens and integrates with vLLM's PagedAttention — enabling 128K-context inference on a single 24GB RTX 4090 with <3% accuracy loss.
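As a toy illustration of the paging idea (a hypothetical `assign_semantic_pages` helper, not IceCache's actual implementation): per-token key vectors are clustered by cosine similarity, and each cluster's tokens are packed into fixed-size pages.

```python
import numpy as np

def assign_semantic_pages(keys: np.ndarray, page_size: int, n_clusters: int, seed: int = 0):
    """Toy sketch: cluster per-token key vectors (cosine k-means),
    then pack each cluster's token indices into fixed-size pages."""
    rng = np.random.default_rng(seed)
    # L2-normalise so dot product == cosine similarity
    x = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    centroids = x[rng.choice(len(x), n_clusters, replace=False)]
    for _ in range(10):  # a few Lloyd iterations are enough for a sketch
        labels = np.argmax(x @ centroids.T, axis=1)
        for c in range(n_clusters):
            members = x[labels == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
                centroids[c] /= np.linalg.norm(centroids[c])
    # pack each cluster's token indices into pages of `page_size`
    pages = []
    for c in range(n_clusters):
        idx = np.flatnonzero(labels == c)
        pages += [idx[i:i + page_size].tolist() for i in range(0, len(idx), page_size)]
    return pages
```

Grouping semantically similar tokens onto the same page means an eviction or prefetch decision touches a coherent unit of context rather than an arbitrary positional window.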

2025 · Infrastructure

Token-Budget-Aware Pool Router

An autoscaler for vLLM fleets that estimates each request's token budget and dispatches to appropriately sized pools — eliminating OOM preemptions and cutting GPU cost by 61%.
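A minimal sketch of the routing logic, using hypothetical pool capacities and a crude characters-per-token heuristic (not the gateway's actual estimator):

```python
import math

# Hypothetical pool sizes: max total tokens (prompt + generation)
# a pool's KV cache can hold per request.
POOLS = {"small": 4_096, "medium": 16_384, "large": 131_072}

def estimate_budget(prompt: str, max_new_tokens: int, chars_per_token: float = 4.0) -> int:
    """Crude token-budget estimate: prompt-length heuristic plus the
    requested generation cap, with a ~10% safety margin."""
    prompt_tokens = math.ceil(len(prompt) / chars_per_token)
    return math.ceil((prompt_tokens + max_new_tokens) * 1.10)

def route(prompt: str, max_new_tokens: int) -> str:
    """Dispatch to the smallest pool whose capacity covers the budget,
    so long requests never land on pools that would OOM-preempt them."""
    budget = estimate_budget(prompt, max_new_tokens)
    for name, capacity in sorted(POOLS.items(), key=lambda kv: kv[1]):
        if budget <= capacity:
            return name
    raise ValueError(f"request budget {budget} exceeds largest pool")
```

Picking the smallest pool that fits is what recovers the cost savings: short requests never occupy capacity sized for 128K-token contexts.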

2025 · Safety

STARS: Runtime Skill Firewall for Agents

A low-latency MCP middleware that scores every tool invocation against the user request and runtime context, blocking policy-violating calls at a p99 latency under 8 ms.
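To make the firewall pattern concrete, here is a toy rule-based scorer with hypothetical deny lists; STARS itself scores calls against the user request and runtime context rather than static rules.

```python
from dataclasses import dataclass

@dataclass
class ToolCall:
    tool: str
    args: dict

# Hypothetical policy: deny-listed tools and sensitive argument markers.
DENY_TOOLS = {"shell.exec", "fs.delete"}
SENSITIVE_MARKERS = ("/etc/passwd", "id_rsa", "DROP TABLE")

def risk_score(call: ToolCall) -> float:
    """Return a risk score in [0, 1]; calls scoring >= 0.5 are blocked."""
    if call.tool in DENY_TOOLS:
        return 1.0
    blob = " ".join(str(v) for v in call.args.values())
    hits = sum(marker in blob for marker in SENSITIVE_MARKERS)
    return min(1.0, 0.5 * hits)

def allow(call: ToolCall) -> bool:
    """Firewall decision point: sits between the agent and the tool runtime."""
    return risk_score(call) < 0.5
```

The key design constraint is that this check sits on the hot path of every tool call, which is why the p99 latency budget matters.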

2025 · Interpretability

World Model Lens

Building tools to inspect how language models construct internal world representations. Current research at Elemental Research Lab investigating mechanistic interpretability of world models.

2024 · Computer Vision

Pyramid Hierarchical Spatial-Spectral Transformer

State-of-the-art 99.21% accuracy on Kennedy Space Center hyperspectral data, outperforming 10+ baselines. 12+ citations. Top 3% on the HSI leaderboard.

2024 · NLP / RAG

High-Performance PDF Retrieval with RAG

End-to-end RAG pipeline achieving sub-second query response at 99.1% availability with 500+ concurrent queries. Reduced hallucination by 35%.

Yugandhar Reddy G
Mechanistic Interpretability · World Models · AI Safety
Location: Los Angeles, CA

About Me

Hi, I'm Yugandhar! I'm a Master's student in Computer Science at the University of Southern California, and a research member at Elemental Research Lab where I'm building World Model Lens — tools for inspecting how language models construct internal world representations.

I see the main goal of my work as understanding how neural networks represent and process information internally, and using that understanding to make AI systems safer and more reliable. My core research interests are mechanistic interpretability, world models, and AI safety.

I also build high-performance AI infrastructure — from KV-cache management systems that enable long-context inference on consumer GPUs, to token-budget-aware routing gateways for production LLM fleets, to runtime safety layers for agent tool invocations.

Before USC, I graduated in the top 2% of my class from SRM Institute of Science & Technology with a B.Tech in Computer Science (AI & ML specialization) and a perfect 4.0 GPA. I've published research across NLP, computer vision, and document retrieval — 78 citations, h-index of 5. You can see my papers here.

Previously, I worked as an AI Research Intern at the University of St Andrews (Transformer architectures for historical image classification) and as a Computer Vision Research Intern at Trinity College Dublin (maritime detection systems processing 10TB+ satellite imagery). I also worked as a Software Engineer at HydroMind, building developer tools and scaling testing infrastructure.

What I'm Working On Now

I'm currently focused on mechanistic interpretability research — specifically on understanding how language models build internal world representations and what tools we need to inspect them. I'm also building open-source AI infrastructure: IceCache (semantic KV-cache paging for vLLM), a token-budget-aware routing gateway, and STARS (a runtime safety firewall for MCP tool invocations).

I'm targeting AI Engineer and AI Researcher roles. If you're working on interpretability, AI safety, or high-performance ML systems, I'd love to connect — reach out.

Technical Expertise

AI / ML & Research
PyTorch · JAX · Hugging Face Transformers · FAISS · vLLM · ONNX Runtime · LangChain · RAG systems · fine-tuning · mechanistic interpretability tooling.
Computer Vision
OpenCV · YOLO · ResNet · HRNet · Vision Transformers · hyperspectral image classification.
Systems & Infrastructure
Rust · Go · CUDA / Triton kernels · Kubernetes · Terraform · Redis · Kafka · Prometheus · AWS · Docker · CI/CD.
Languages
Python (primary) · C++ · Rust · Go · SQL · JavaScript.
· · ·
Get in touch

Working on interpretability, safety, or ML systems? Let's talk.

I'm targeting AI Engineer and AI Researcher roles. Reach me at gogiredd@usc.edu.

Research

Areas & ongoing work

Mechanistic interpretability of language models and parameter-efficient deep learning across NLP and computer vision.

Current · Elemental Research Lab

World Model Lens

Investigating how language models build internal world representations. Building inspection tools for mechanistic interpretability of world models — understanding what concepts LLMs encode, how they compose them, and how to make this process transparent.

2024 · Published · 12+ citations

Pyramid Hierarchical Spatial-Spectral Transformer

Novel Transformer architecture for hyperspectral image classification achieving state-of-the-art 99.21% accuracy. Reduced parameters by 30% (2.1M to 1.47M) while maintaining accuracy, enabling edge deployment.

2024 · Published · 15 citations

Graph-Based Sentence Selection with Transformer Fusion for Text Summarization

Combined graph-based extractive methods with Transformer fusion for text summarization, demonstrating synergistic improvements over standalone approaches.

2024 · Published · 11 citations

Sustainable NLP: Parameter Efficiency for Resource-Constrained Environments

Systematic exploration of parameter-efficient methods for deploying NLP models in resource-constrained settings — reducing compute requirements while preserving performance.

Publications · 78 citations · h-index 5

Papers

Seven peer-reviewed papers and preprints across NLP, computer vision, and document retrieval — including parameter-efficient deep learning, hyperspectral image classification, and long-document language modeling.

2024 · Computer Vision · Cited 28

Efficient CAPTCHA Image Recognition Using CNNs and LSTM

A hybrid CNN-LSTM architecture for solving variable-length character CAPTCHAs — the CNN extracts local visual features, the LSTM models sequential character dependencies.

2024 · NLP · Cited 15

Graph-Based Sentence Selection with Transformer Fusion for Text Summarization

A hybrid framework pairing graph-based extractive sentence selection with Transformer-based abstractive fusion. Synergistic gains over either approach standalone.

2024 · NLP / RAG · Cited 13

End-to-End Neural Embedding Pipeline for Large-Scale PDF Retrieval

A retrieval pipeline using Sentence Transformers and distributed FAISS over large PDF corpora, with a chunking strategy designed to preserve semantic coherence.
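A minimal sketch of one coherence-preserving chunking approach (sentence-boundary splits with a one-sentence overlap; illustrative only, not the paper's exact strategy):

```python
import re

def chunk_sentences(text: str, max_chars: int = 500, overlap: int = 1):
    """Split on sentence boundaries and carry `overlap` trailing
    sentences into the next chunk, so no chunk starts mid-thought."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], []
    for s in sentences:
        if current and sum(len(t) + 1 for t in current) + len(s) > max_chars:
            chunks.append(" ".join(current))
            current = current[-overlap:]  # overlap preserves context across chunks
        current.append(s)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Compared with fixed-width character windows, boundary-aware chunks keep each embedded unit semantically self-contained, which is what the retrieval quality depends on.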

2024 · NLP · Cited 11

Sustainable NLP: Parameter Efficiency for Resource-Constrained Environments

A systematic study of LoRA, adapters, prefix tuning, distillation, and quantization for deploying NLP models under tight compute budgets.
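For reference, the LoRA update central to these methods can be sketched in a few lines (a generic formulation, not the paper's code): the frozen weight W is augmented with trainable low-rank factors A and B.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16.0):
    """LoRA forward pass: y = x W^T + (alpha / r) * x A^T B^T.

    W (d_out x d_in) stays frozen; only A (r x d_in) and B (d_out x r)
    are trained, shrinking trainable parameters from d_out * d_in
    down to r * (d_in + d_out).
    """
    r = A.shape[0]
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T
```

With B initialised to zero (standard LoRA practice), the adapted model starts out exactly equal to the base model, which is what makes the method safe to bolt onto a pretrained network.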

2024 · Computer Vision · Cited 9

Advanced Underwater Image Quality Enhancement

A two-stage pipeline combining Super-Resolution CNNs with multi-scale Retinex defogging for underwater image enhancement — clearer imagery for marine vision tasks.

2024 · NLP · Cited 2

Systematic Exploration of Dialogue Summarization Approaches

A reproducibility study comparing extractive, abstractive, and hybrid dialogue summarization methods on SAMSum and DialogSum, with proposed methodological refinements.

2025 · LLMs · Preprint

Never Lost in the Middle Again: Teaching LLMs to Care About the Center of Long Documents

Empirically diagnoses and mitigates the "lost-in-the-middle" failure mode — LLMs underweighting information located in the center of long documents.


View full profile on Google Scholar →

Career

Experience

Research, internships, industry, and education.

Current

Research Member
Mar 2026 — Present
Elemental Research Lab — New York
Building World Model Lens — tools for inspecting how language models construct internal world representations. Research in mechanistic interpretability and AI safety.
Graduate Student Researcher
Apr 2026 — Present
USC Viterbi School of Engineering — Los Angeles
Researcher at USC InfoLab on W4H (Wearables for Health), an NIH-funded initiative building an open-source toolkit that transforms wearable sensor data into clinical intelligence.

Research Internships

AI Research Intern
Jun 2024 — Oct 2024
University of St Andrews — Scotland
Developed a Transformer architecture for historical image classification, raising accuracy from 78% to 94% (a 16-point improvement) on 50,000+ museum artifacts. Built an end-to-end deep learning pipeline integrating attention mechanisms with spatial feature fusion, enabling accurate classification of previously unidentifiable specimens.
Computer Vision Research Intern
Jul 2024 — Nov 2024
Trinity College Dublin — Ireland
Built a maritime detection system (HRNet + ResNet101) processing 10TB+ of satellite imagery, achieving a 12% accuracy gain and 98.3% precision. Optimized inference by 30%, enabling real-time monitoring. Implemented augmentation strategies that reduced critical miss rates by 20% in adverse weather conditions.

Industry

Software Engineer Intern
Sep 2024 — Mar 2025
HydroMind — Chennai, India
Built an intelligent Command Palette (React + Node.js) that improved developer velocity by 40%, eliminating 15+ hours of manual work per week. Scaled testing infrastructure from 60% to 85% coverage (340 tests added), reducing post-deployment bugs by 45% and production incidents by 30%.

Education

M.S. Computer Science
Expected May 2027 · GPA 3.65/4.0
University of Southern California — Los Angeles
Coursework: Analysis of Algorithms, Artificial Intelligence, Advanced Machine Learning.
B.Tech Computer Science (AI & ML Specialization)
Sep 2021 — Jun 2025 · GPA 4.0/4.0 · Top 2%
SRM Institute of Science & Technology — Chennai
Research focus: deep learning, NLP, and computer vision. Published 7 papers. Grew a 200+ member organization by 45% as President and mentored 4 junior researchers in deep learning methodology.