Analyze images with Google Gemini 1.5 vision and Vertex AI
In this project, we will build a Streamlit application that integrates Vertex AI’s Gemini Pro Vision model to answer questions about user-uploaded images. We will walk through the code step by step to show exactly how the model can be used to extract relevant information from images.

Setting Up the Environment
First, we have to install two libraries: the Vertex AI SDK and Streamlit. We can do this using the following command:
pip install streamlit google-cloud-aiplatform
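Note that the Vertex AI SDK also needs Google Cloud credentials before it can call the API. One common option, assuming the gcloud CLI is installed, is to use Application Default Credentials:
gcloud auth application-default login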
Initializing the Vertex AI SDK
Next, we need to initialize the Vertex AI SDK by providing our project ID and region:
import vertexai

PROJECT_ID = ""  # Your Google Cloud project ID
REGION = ""  # Region where your Vertex AI resources are located
vertexai.init(project=PROJECT_ID, location=REGION)  # Initialize the Vertex AI SDK
Defining the generate_response Function
The generate_response function takes a prompt and an image file as input and returns a generated response based on the Gemini Pro Vision model:
from vertexai.generative_models import GenerativeModel, Image

def generate_response(prompt, image_file):
    # Load the image from file
    image = Image.load_from_file(image_file)
    # Ask the Gemini Pro Vision model to respond to the prompt and image
    generative_multimodal_model = GenerativeModel("gemini-1.0-pro-vision")
    response = generative_multimodal_model.generate_content([prompt, image])
    # Return the text of the first candidate
    return response.candidates[0].content.text
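As a quick sanity check outside of Streamlit, we can call the function directly; the file name sample.jpg below is only an illustrative placeholder:
print(generate_response("What objects are in this image?", "sample.jpg"))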
Defining the main Function
The main function is the entry point of our Streamlit application. It sets up the UI components, allows users to upload an image, and generates a response based on the user’s question:
import os
import tempfile
import streamlit as st

def main():
    st.title("Vertex AI with Gemini Pro Vision")
    st.image("logo.jpg", width=100)
    img = st.file_uploader("Upload an image")
    if img:
        # Create a temporary directory
        temp_dir = tempfile.mkdtemp()
        # Define the path to save the uploaded image
        path = os.path.join(temp_dir, img.name)
        # Write the uploaded image to the specified path
        with open(path, "wb") as f:
            f.write(img.getvalue())
        # Input area for the user's question
        st.header(":violet[Question]")
        question = st.text_area(label="Enter your question")
        submit = st.button("Submit")
        # If a question is entered and submitted
        if question and submit:
            # Generate a response based on the question and uploaded image
            response = generate_response(question, path)
            # Display the generated response
            st.header("Answer")
            st.write(response)

if __name__ == "__main__":
    main()
The code above generates the following frontend:

Running the Application
Finally, we can run the application using the following command:
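Assuming the code is saved in a file named app.py (use whatever name you gave your script):
streamlit run app.py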
This will open the Streamlit application in our browser, allowing us to upload an image and ask a question about it to get a response from the Gemini Pro Vision model.
Testing the Application
We can test the application with different use cases, such as:
1- Uploading an image and asking a question about the objects in the image


2- Uploading an image and asking a question about the scenario depicted in the image


3- Uploading an image and asking for an analysis of the image
We used the same image as above and asked the following question:

The application simply responds to the user with information derived from the Gemini Pro Vision model, which can extract useful details from the image.