All projects
LitServe
Multimodal
LLM
Streamlit
Chat with Phi 3.5 Vision
Overview
Phi-3.5-vision is Microsoft's lightweight state-of-the-art open multimodal model, capable of multi-frame image understanding, image comparison, and video summarization.
This project wraps it in a production-ready stack:
- LitServe for fast, scalable inference serving
- Streamlit for an interactive chat UI
- Flash Attention for optimized GPU throughput
Stack
microsoft/Phi-3.5-vision-instructvia HuggingFace Transformers- LitServe inference server
- Streamlit chat interface
Get Started
pip install -r requirements.txt
python server.py # start LitServe API
streamlit run app.py # launch UI