AI infrastructure, safety systems, and applied research — from KV-cache management for long-context inference, to runtime safety layers for agent tool invocations, to state-of-the-art models for vision and retrieval.
A KV-cache management layer that clusters semantically related tokens and integrates with vLLM's PagedAttention — enabling 128K-context inference on a single 24GB RTX 4090 with <3% accuracy loss.
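The clustering idea can be sketched as a greedy cosine-similarity pass over per-token key vectors. This is an illustrative toy (the function names and the 0.8 threshold are hypothetical); the production system operates on vLLM's paged KV blocks rather than raw Python lists:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def cluster_tokens(keys, threshold=0.8):
    """Greedy single-pass clustering: each token key joins the first
    existing cluster whose running centroid is within `threshold`
    cosine similarity, otherwise it starts a new cluster.
    Returns one cluster id per token."""
    centroids, counts, labels = [], [], []
    for k in keys:
        best, best_sim = None, threshold
        for i, c in enumerate(centroids):
            sim = cosine(k, c)
            if sim >= best_sim:
                best, best_sim = i, sim
        if best is None:
            centroids.append(list(k))
            counts.append(1)
            labels.append(len(centroids) - 1)
        else:
            n = counts[best]
            centroids[best] = [(c * n + x) / (n + 1)
                               for c, x in zip(centroids[best], k)]
            counts[best] += 1
            labels.append(best)
    return labels
```

Tokens whose keys cluster together can then be paged in and out as a unit, which is what makes semantic grouping compatible with block-granular paging.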
An autoscaler for vLLM fleets that estimates each request's token budget and dispatches to appropriately sized pools — eliminating OOM preemptions and cutting GPU cost by 61%.
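The routing logic can be sketched as follows. The 4-characters-per-token heuristic, pool names, and capacities are illustrative assumptions, not the gateway's actual estimator:

```python
def estimate_budget(prompt: str, max_new_tokens: int) -> int:
    # Rough token estimate: ~4 characters per token for English text,
    # plus the generation budget the caller requested.
    return len(prompt) // 4 + max_new_tokens

def route(prompt: str, max_new_tokens: int, pools: dict) -> str:
    """Dispatch to the smallest pool whose capacity covers the estimate.
    `pools` maps pool name -> max tokens it can serve without OOM."""
    need = estimate_budget(prompt, max_new_tokens)
    for name, capacity in sorted(pools.items(), key=lambda kv: kv[1]):
        if capacity >= need:
            return name
    return "reject"  # no pool is large enough; shed rather than preempt

# Example: a short request lands on the small pool.
pools = {"small": 4096, "large": 32768}
print(route("hi there", 256, pools))  # -> "small"
```

Rejecting up front is what eliminates mid-generation OOM preemptions: a request that would overflow every pool never starts generating.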
A low-latency MCP middleware that scores every tool invocation against user request + runtime context, blocking policy-violating calls at p99 <8ms.
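A toy rule-based version of the scoring step, purely to show the shape of the check. The blocked-tool list, argument markers, and 0.5 cutoff are hypothetical; the real middleware scores each call against the full user request and runtime context:

```python
# Illustrative deny-list and sensitive-path markers (hypothetical).
BLOCKED_TOOLS = {"shell_exec", "delete_file"}
SENSITIVE_ARG_MARKERS = ("~/.ssh", "/etc/passwd", "api_key")

def score_call(tool: str, args: dict) -> float:
    """Return a risk score in [0, 1]; scores >= 0.5 are blocked."""
    if tool in BLOCKED_TOOLS:
        return 1.0
    risk = 0.0
    for v in args.values():
        if any(m in str(v) for m in SENSITIVE_ARG_MARKERS):
            risk = max(risk, 0.8)
    return risk

def allow(tool: str, args: dict) -> bool:
    # The firewall sits between the agent and the MCP server:
    # only calls that clear the threshold are forwarded.
    return score_call(tool, args) < 0.5
```

Keeping the hot path to set lookups and substring checks like these is what makes a sub-10ms p99 plausible; heavier semantic scoring can run on the slow path.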
Building tools to inspect how language models construct internal world representations. Current research at Elemental Research Lab investigating mechanistic interpretability of world models.
State-of-the-art 99.21% accuracy on Kennedy Space Center hyperspectral data, outperforming 10+ baselines. 12+ citations. Top 3% on the HSI leaderboard.
End-to-end RAG pipeline achieving sub-second query response at 99.1% availability with 500+ concurrent queries. Reduced hallucination by 35%.
Hi, I'm Yugandhar! I'm a Master's student in Computer Science at the University of Southern California, and a research member at Elemental Research Lab where I'm building World Model Lens — tools for inspecting how language models construct internal world representations.
I see the main goal of my work as understanding how neural networks represent and process information internally, and using that understanding to make AI systems safer and more reliable. My core research interests are mechanistic interpretability, world models, and AI safety.
I also build high-performance AI infrastructure — from KV-cache management systems that enable long-context inference on consumer GPUs, to token-budget-aware routing gateways for production LLM fleets, to runtime safety layers for agent tool invocations.
Before USC, I graduated in the top 2% of my class from SRM Institute of Science & Technology with a B.Tech in Computer Science (AI & ML specialization) and a perfect 4.0 GPA. I've published research across NLP, computer vision, and document retrieval — 78 citations, h-index of 5. You can see my papers here.
Previously, I worked as an AI Research Intern at the University of St Andrews (Transformer architectures for historical image classification) and as a Computer Vision Research Intern at Trinity College Dublin (maritime detection systems processing 10TB+ satellite imagery). I also worked as a Software Engineer at HydroMind, building developer tools and scaling testing infrastructure.
I'm currently focused on mechanistic interpretability research — specifically on understanding how language models build internal world representations and what tools we need to inspect them. I'm also building open-source AI infrastructure: IceCache (semantic KV-cache paging for vLLM), a token-budget-aware routing gateway, and STARS (a runtime safety firewall for MCP tool invocations).
I'm targeting AI Engineer and AI Researcher roles. If you're working on interpretability, AI safety, or high-performance ML systems, I'd love to connect — reach out at gogiredd@usc.edu.
Mechanistic interpretability of language models and parameter-efficient deep learning across NLP and computer vision.
Investigating how language models build internal world representations. Building inspection tools for mechanistic interpretability of world models — understanding what concepts LLMs encode, how they compose them, and how to make this process transparent.
Novel Transformer architecture for hyperspectral image classification achieving state-of-the-art 99.21% accuracy. Reduced parameters by 30% (2.1M to 1.47M) while maintaining accuracy, enabling edge deployment.
Combined graph-based extractive methods with Transformer fusion for text summarization, demonstrating synergistic improvements over standalone approaches.
Systematic exploration of parameter-efficient methods for deploying NLP models in resource-constrained settings — reducing compute requirements while preserving performance.
Seven peer-reviewed papers and preprints across NLP, computer vision, and document retrieval — including parameter-efficient deep learning, hyperspectral image classification, and long-document language modeling.
A hybrid CNN-LSTM architecture for solving variable-length character CAPTCHAs — the CNN extracts local visual features, the LSTM models sequential character dependencies.
A hybrid framework pairing graph-based extractive sentence selection with Transformer-based abstractive fusion. Synergistic gains over either approach standalone.
A retrieval pipeline using Sentence Transformers and distributed FAISS over large PDF corpora, with a chunking strategy designed to preserve semantic coherence.
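The coherence-preserving chunking can be illustrated with a minimal sentence-boundary packer with one-sentence overlap. The character budget and overlap count here are illustrative defaults, not the pipeline's tuned values:

```python
import re

def chunk_text(text: str, max_chars: int = 500, overlap_sents: int = 1):
    """Split on sentence boundaries, pack sentences into chunks of
    roughly `max_chars`, and carry the last sentence(s) of each chunk
    into the next one so no claim is cut mid-sentence and local
    context survives the split."""
    sents = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, cur = [], []
    for s in sents:
        if cur and sum(len(x) for x in cur) + len(s) > max_chars:
            chunks.append(" ".join(cur))
            cur = cur[-overlap_sents:]  # overlap preserves local context
        cur.append(s)
    if cur:
        chunks.append(" ".join(cur))
    return chunks
```

Each chunk is then embedded with a Sentence Transformer and indexed in FAISS; because chunks never split a sentence, each embedding describes a coherent span of meaning.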
A systematic study of LoRA, adapters, prefix tuning, distillation, and quantization for deploying NLP models under tight compute budgets.
A two-stage pipeline combining Super-Resolution CNNs with multi-scale Retinex defogging for underwater image enhancement — clearer imagery for marine vision tasks.
A reproducibility study comparing extractive, abstractive, and hybrid dialogue summarization methods on SAMSum and DialogSum, with proposed methodological refinements.
Empirically diagnoses and mitigates the "lost-in-the-middle" failure mode — LLMs underweighting information located in the center of long documents.
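One standard mitigation, shown here as a sketch rather than the paper's exact method, is to reorder retrieved passages so the strongest evidence sits at the edges of the context window, where models attend most:

```python
def edge_reorder(docs_with_scores):
    """Given (doc, relevance_score) pairs, place the highest-scoring
    documents at the start and end of the context and push the weakest
    into the middle, where models attend least (the
    'lost-in-the-middle' effect)."""
    ranked = sorted(docs_with_scores, key=lambda d: d[1], reverse=True)
    front, back = [], []
    for i, d in enumerate(ranked):
        (front if i % 2 == 0 else back).append(d)
    return front + back[::-1]
```

For four passages ranked a > b > c > d, this yields the order a, c, d, b: the two strongest end up at the two positions the model weights most heavily.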
Read MoreResearch, internships, industry, and education.