What I'm building
Open-source projects spanning multimodal LLMs, model serving, computer vision, and AI tooling — mostly built with LitServe and PyTorch.
Chatterbox TTS API
Production-ready Text-to-Speech API built on Resemble AI's Chatterbox model with LitServe. Supports zero-shot voice cloning, emotion intensity control, and base64 audio I/O.
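As a rough illustration of the base64 audio I/O, a client might pack a reference clip into JSON and unpack the synthesized audio as sketched below. The field names (`text`, `audio_prompt`, `exaggeration`, `audio`) are assumptions made for this example, not the project's documented schema.

```python
import base64
import json


def build_tts_request(text, reference_wav=None, exaggeration=0.5):
    """Build a JSON request body for a hypothetical TTS endpoint.

    All field names here are illustrative assumptions.
    """
    payload = {"text": text, "exaggeration": exaggeration}
    if reference_wav is not None:
        # Binary audio cannot go into JSON directly, so the reference clip
        # used for voice cloning is sent as base64-encoded text.
        payload["audio_prompt"] = base64.b64encode(reference_wav).decode("ascii")
    return json.dumps(payload)


def decode_tts_response(body):
    """Return raw audio bytes from a JSON body with a base64 "audio" field."""
    data = json.loads(body)
    return base64.b64decode(data["audio"])
```

The returned bytes can be written straight to a file, for example `open("out.wav", "wb").write(decode_tts_response(body))`, assuming the server responds with WAV data.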
RF-DETR Object Detection API
Real-time object detection API using RF-DETR, a state-of-the-art transformer-based detector, deployed with LitServe. Inference is end-to-end: the model predicts boxes directly, with no region proposals or anchor boxes.
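Because predictions come out of the model directly, post-processing can be as small as a score filter and a coordinate conversion. The sketch below assumes DETR-style outputs (normalized centre-format boxes with one score and label each); the function name and output layout are hypothetical and not taken from the project.

```python
def postprocess(boxes, scores, labels, width, height, threshold=0.5):
    """Keep confident predictions and convert boxes to pixel corners.

    boxes: iterable of (cx, cy, w, h), each normalized to [0, 1] (assumed).
    Returns a list of dicts with "label", "score", and "box" = [x0, y0, x1, y1].
    """
    detections = []
    for (cx, cy, w, h), score, label in zip(boxes, scores, labels):
        if score < threshold:
            continue  # drop low-confidence predictions
        detections.append({
            "label": label,
            "score": score,
            "box": [
                (cx - w / 2) * width,   # left
                (cy - h / 2) * height,  # top
                (cx + w / 2) * width,   # right
                (cy + h / 2) * height,  # bottom
            ],
        })
    return detections
```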
LitServe Examples
A curated collection of production-grade AI serving examples built on LitServe — Lightning AI's high-performance inference engine. Covers speech, vision, LLMs, embeddings, and object detection.
Chat with Llama 3.2 Vision
Deploy Meta's Llama 3.2 Vision multimodal LLM with LitServe for low-latency inference. Supports image understanding and visual question answering via a clean REST API.
Chat with Qwen2-VL
Deploy and chat with Alibaba's Qwen2-VL multimodal large language model using LitServe. Supports image understanding, document parsing, and visual reasoning tasks.
Chat with MiniCPM-V 2.6
Deploy MiniCPM-V 2.6, a GPT-4V-level multimodal LLM designed for edge devices, using LitServe. Handles single-image, multi-image, and video inputs.
Chat with Phi 3.5 Vision
Deploy and chat with Microsoft's Phi-3.5-vision multimodal LLM. LitServe handles high-performance inference while a Streamlit frontend gives you multi-image chat, image comparison, and video summarization.
Receipt OCR Engine
An efficient open-source OCR engine for receipt image processing. Combines Tesseract OCR for raw text extraction with LLM-powered structured data parsing — available as a CLI tool and FastAPI service.
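The two-stage flow (OCR for raw text, then an LLM for structure) can be sketched roughly as follows. This is a minimal illustration, not the project's code: `parse_receipt`, `build_prompt`, and the requested fields are hypothetical, and the OCR and LLM steps are passed in as plain callables so the sketch runs without either dependency.

```python
import json


def build_prompt(raw_text):
    """Ask for a small, fixed set of fields (chosen here for illustration)."""
    return (
        "Extract merchant, date, and total from this receipt as a JSON object.\n\n"
        + raw_text
    )


def parse_receipt(image_path, ocr, llm):
    """Run OCR, then ask an LLM to structure the text.

    ocr: callable taking an image path and returning raw text.
    llm: callable taking a prompt and returning a JSON string.
    """
    raw_text = ocr(image_path)
    reply = llm(build_prompt(raw_text))
    try:
        fields = json.loads(reply)
    except json.JSONDecodeError:
        fields = {}  # the model did not return valid JSON
    if not isinstance(fields, dict):
        fields = {}  # valid JSON, but not the object we asked for
    return {"raw_text": raw_text, "fields": fields}
```

In practice `ocr` could wrap a Tesseract binding and `llm` any chat-completion client; keeping both injectable lets the CLI and the API service share the same function.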
3D Lung Tumour Segmentation
3D semantic segmentation of lung tumours from CT scans using PyTorch Lightning and MONAI. Trained on the Medical Segmentation Decathlon lung dataset with a U-Net-based architecture.