Multimodal AI researcher obsessed with how machines perceive, remember, and generate the world. Based in Mountain View, CA.
PhD from UMD, focused on diffusion model memorization: built memorization evals for diffusion models and CSD, a widely used style-similarity metric. Also built evals for video understanding: CinePile (a long-video QA benchmark, Best Paper at CVPR 2024 SynCV) and ARGUS (a hallucination/omission eval for dense captions). (Friends call me the "Evals Shill" for a reason.)
Before academia: did SGD in industry for a while in India. IIT Madras alum; founded a Fashion AI startup that was way too early to the party.
Open to collabs on generative modeling (evals + post-training). Hit me up: gowthami [dot] somepalli [at] gmail.com
// featured writing
Latent Scaffolding: Z-Image Is Secretly an I2I Model
A simple architectural splice unlocks zero-shot image-to-image variations with no training.
Latent Scaffolding, Part 2: Token Dropout for Diverse Image Variations
Vision-only token dropout solves mode collapse: hunting attention sinks and finding two orthogonal knobs for diversity.