We build foundation models that understand, reason, and create.
StarVox Labs is an AI-native research company pushing the boundaries of language, vision, and multimodal intelligence.
What is StarVox Labs?
India's first AI-native research lab building foundation models from the ground up. We're a team of researchers and engineers who grew up training models — not adapting legacy systems.
Our work spans language, vision, speech, and safety — always with an emphasis on publishing openly, building responsibly, and shipping systems that work in production.
Six verticals. One mission — push what AI can do.
Large Language Models
Novel architectures and training paradigms that push the boundaries of natural language understanding. Our work on efficient attention mechanisms reduces compute by 3x without sacrificing benchmark performance.
Computer Vision
Self-supervised visual representation learning and next-generation perception systems. From object detection to scene understanding, we build models that see the world with clarity.
Multimodal Intelligence
Unified models that reason across language, vision, and audio. We're bridging the gap between modalities to create AI that understands context the way humans do.
Reinforcement Learning
Training agents that learn optimal strategies through interaction — from simulated environments to real-world robotics and autonomous decision-making systems.
AI Safety & Alignment
Ensuring AI systems remain beneficial, interpretable, and aligned with human values. We publish formal frameworks for bounded autonomy and interpretability.
Speech & Audio
State-of-the-art models for speech recognition, synthesis, and audio understanding. Real-time transcription and natural voice generation across 40+ languages.
From research to production.
VoxLM-7B
Released
Our flagship 7-billion-parameter language model. Optimized for reasoning and instruction following with industry-leading efficiency on standard benchmarks.
VoxVision-3B
Beta
A compact yet powerful vision-language model capable of understanding complex visual scenes, interpreting charts, and answering visual questions.
VoxAudio-1B
In Research
Next-generation speech model with real-time transcription, translation, and natural voice synthesis. Targeting sub-200ms latency for production deployments.
VoxAgent-12B
In Research
An agentic model designed for autonomous task completion, tool use, and multi-step reasoning. Built to operate safely in open-ended environments.
Built by people who live and breathe AI.
Dr. Arjun Mehta
Chief Research Officer
15+ years in deep learning. Former lead at a top-3 AI lab. Published 80+ papers on NLP and representation learning.
Priya Sharma
Head of Engineering
Systems architect specializing in distributed training infrastructure. Scaled systems to 10,000+ GPU clusters.
Dr. Kai Chen
Research Lead — Vision
Pioneer in self-supervised visual learning. 50+ publications. Co-author of two widely adopted open-source vision libraries.
Nadia Volkov
Research Lead — Safety
AI governance and alignment researcher. Previously built interpretability tools used by 500+ research teams worldwide.
We publish openly. Always.
Scaling Laws for Efficient Language Model Pre-training
Novel scaling laws that enable training language models with 3x less compute while maintaining benchmark performance across reasoning and generation tasks.
VoxLM: A New Architecture for Instruction-Following Models
Introducing VoxLM — our transformer variant achieving state-of-the-art results on reasoning benchmarks with 40% higher inference throughput.
Towards Safe Agentic AI: A Framework for Bounded Autonomy
A formal framework for constraining AI agent behavior while preserving task completion capabilities in complex, open-ended environments.