Multimodal Chat & Image Summarization using Google’s Gemini models. Streamlit-based UI for generating responses from text prompts and uploaded images.

hs094/Image-Summarization

📌 Image-Summarization

🔥 Multimodal Content Generation using Google’s Gemini Models

This project is a Streamlit-based web application that enables chat-based text generation and image summarization using Google’s Gemini models (gemini-1.5-flash for vision tasks and gemini-pro for text generation).


🚀 Features

✅ Chat & Image Summarization – Generate responses based on text prompts and images.
✅ Text-to-Image – Expandable multimodal capability (future implementation).
✅ Real-time Streaming – Get responses word-by-word using a simulated streaming effect.
✅ User-friendly Interface – Simple and interactive UI with image upload support.
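The simulated streaming effect can be sketched as a small generator. This is an illustration rather than the project's actual code: the function name `stream_words` and the `delay` parameter are assumptions, and the idea is simply to replay an already complete response one word at a time.

```python
import time
from typing import Iterator


def stream_words(text: str, delay: float = 0.05) -> Iterator[str]:
    """Yield `text` one word at a time, pausing `delay` seconds after each word.

    Hypothetical helper: the full response already exists, so the
    "streaming" is only a display effect.
    """
    for word in text.split():
        yield word + " "
        time.sleep(delay)


if __name__ == "__main__":
    # Print the words as they arrive, as a chat UI would render them.
    for chunk in stream_words("Gemini returned this full response.", delay=0.1):
        print(chunk, end="", flush=True)
    print()
```

In a Streamlit app, a generator like this could be passed to `st.write_stream` (available in recent Streamlit releases) or used to update an `st.empty()` placeholder in a loop.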


🖼️ Usage

  1. Upload an image (optional).
  2. Enter a text prompt in the chatbox.
  3. The model processes the input and generates a text response.
  4. If an image is uploaded, the model generates a multimodal response that draws on both the image and the prompt.
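The flow above can be sketched as follows. The model names come from this README; the helper names (`choose_model`, `build_contents`, `generate_reply`) are hypothetical, and the sketch assumes the `google-generativeai` SDK is installed and already configured with an API key.

```python
from typing import Any, List, Optional

VISION_MODEL = "gemini-1.5-flash"  # named in this README for vision tasks
TEXT_MODEL = "gemini-pro"          # named in this README for text generation


def choose_model(image: Optional[Any]) -> str:
    """Pick the vision model when an image was uploaded, else the text model."""
    return VISION_MODEL if image is not None else TEXT_MODEL


def build_contents(prompt: str, image: Optional[Any] = None) -> List[Any]:
    """Assemble the request parts: the prompt, plus the image if there is one."""
    contents: List[Any] = [prompt]
    if image is not None:
        contents.append(image)
    return contents


def generate_reply(prompt: str, image: Optional[Any] = None) -> str:
    """Send the prompt (and optional PIL image) to Gemini and return the text.

    Assumes genai.configure(api_key=...) has been called beforehand.
    """
    # Imported lazily so the helpers above work without the SDK installed.
    import google.generativeai as genai

    model = genai.GenerativeModel(choose_model(image))
    response = model.generate_content(build_contents(prompt, image))
    return response.text
```

Here `image` would typically be a `PIL.Image` opened from the uploaded file, for example `Image.open(uploaded_file)`.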

📌 Technologies Used

  • Streamlit – Interactive Web UI
  • Google Generative AI – gemini-1.5-flash, gemini-pro
  • PIL (Pillow) – Image Processing
  • dotenv – Environment Variable Management
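As a rough sketch of how dotenv fits in, the API key can be read from a `.env` file and handed to the Gemini SDK. The variable name `GOOGLE_API_KEY` and the helper `load_api_key` are assumptions for illustration; the project may use different names.

```python
import os


def load_api_key(var_name: str = "GOOGLE_API_KEY") -> str:
    """Return the API key from the environment, loading a .env file if possible.

    `var_name` is an assumed default, not confirmed by the project.
    """
    try:
        from dotenv import load_dotenv  # provided by the python-dotenv package

        load_dotenv()  # reads a .env file if one is found; existing variables win
    except ImportError:
        pass  # fall back to variables already set in the environment

    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set {var_name} in your environment or .env file")
    return key
```

The result would then be passed to the SDK once at startup, for example `genai.configure(api_key=load_api_key())`.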

📄 To-Do List

🔹 Enhance Text-to-Image (future expansion).
🔹 Improve Chat History Persistence.
🔹 Add More Model Options for users.


🤝 Contributing

Contributions are welcome! If you’d like to improve this project, feel free to:

  1. Fork the repo.
  2. Create a new branch (feature-branch).
  3. Commit your changes.
  4. Open a Pull Request.

📜 License

This project is open-source and available under the MIT License.


🌟 Acknowledgments

Thanks to Google AI for providing Gemini models and Streamlit for making UI development seamless.

