This project is a Streamlit-based web application that enables chat-based text generation and image summarization using Google’s Gemini models (gemini-1.5-flash for vision tasks and gemini-pro for text generation).
✅ Chat & Image Summarization – Generate responses based on text prompts and images.
✅ Text-to-Image – Expandable multimodal capability (future implementation).
✅ Real-time Streaming – Get responses word-by-word using a simulated streaming effect.
✅ User-friendly Interface – Simple and interactive UI with image upload support.
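
The simulated streaming effect mentioned above can be approximated with a small generator that yields a finished response one word at a time. This is a minimal sketch, not necessarily how the app implements it: the function name `stream_words` and the `delay` parameter are hypothetical.

```python
import time
from typing import Iterator


def stream_words(text: str, delay: float = 0.05) -> Iterator[str]:
    """Yield `text` one word at a time, pausing `delay` seconds after each word.

    Hypothetical helper: the model has already returned the full response,
    and this only simulates a word-by-word typing effect.
    """
    for word in text.split():
        # A trailing space keeps the words separated when the chunks are joined.
        yield word + " "
        time.sleep(delay)


if __name__ == "__main__":
    # Print the chunks as they arrive, without line breaks between them.
    for chunk in stream_words("Gemini responses can be shown word by word."):
        print(chunk, end="", flush=True)
    print()
```

In a Streamlit app, a generator like this can be passed to `st.write_stream` (available in recent Streamlit releases) so the text appears incrementally in the chat window.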
- Upload an image (optional).
- Enter a text prompt in the chatbox.
- The model will process the input and generate text responses.
- If an image is uploaded, the model will generate a multimodal response combining vision and text.
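
The steps above can be sketched as follows. This is an illustration under stated assumptions rather than the app's actual code: the helper names `build_request` and `generate` are hypothetical, and the API key is assumed to live in an environment variable called `GOOGLE_API_KEY`. The model names come from this README.

```python
import os
from typing import Any, List, Optional, Tuple

VISION_MODEL = "gemini-1.5-flash"  # used when an image is uploaded
TEXT_MODEL = "gemini-pro"          # used for text-only prompts


def build_request(prompt: str, image: Optional[Any] = None) -> Tuple[str, List[Any]]:
    """Pick the model and assemble the content parts (hypothetical helper).

    `image` is expected to be a PIL image, e.g. Image.open(uploaded_file).
    """
    if image is not None:
        # Multimodal request: the prompt and the image are sent together.
        return VISION_MODEL, [prompt, image]
    return TEXT_MODEL, [prompt]


def generate(prompt: str, image: Optional[Any] = None) -> str:
    """Send the request to Gemini and return the response text."""
    # Imported lazily so build_request can be used without the SDK installed.
    import google.generativeai as genai

    # Assumption: the key is stored under GOOGLE_API_KEY (e.g. loaded from .env).
    genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))
    model_name, parts = build_request(prompt, image)
    model = genai.GenerativeModel(model_name)
    return model.generate_content(parts).text
```

Keeping the model selection in a separate pure function makes the routing rule easy to test without calling the API.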
- Streamlit – Interactive Web UI
- Google Generative AI – gemini-1.5-flash, gemini-pro
- PIL (Pillow) – Image Processing
- python-dotenv – Environment Variable Management
📄 To-Do List
🔹 Enhance Text-to-Image (future expansion).
🔹 Improve Chat History Persistence.
🔹 Add More Model Options for users.
Contributions are welcome! If you’d like to improve this project, feel free to:
- Fork the repo.
- Create a new branch (feature-branch).
- Commit your changes.
- Open a Pull Request.
This project is open-source and available under the MIT License.
Thanks to Google AI for providing Gemini models and Streamlit for making UI development seamless.