All projects
LitServe
Multimodal
LLM
Streamlit

Chat with Phi 3.5 Vision

·
Chat with Phi 3.5 Vision

Overview

Phi-3.5-vision is Microsoft's lightweight state-of-the-art open multimodal model, capable of multi-frame image understanding, image comparison, and video summarization.

This project wraps it in a production-ready stack:

  • LitServe for fast, scalable inference serving
  • Streamlit for an interactive chat UI
  • Flash Attention for optimized GPU throughput

Stack

  • microsoft/Phi-3.5-vision-instruct via HuggingFace Transformers
  • LitServe inference server
  • Streamlit chat interface

Get Started

pip install -r requirements.txt
python server.py        # start LitServe API
streamlit run app.py   # launch UI