
Analyze images with Google Gemini 1.5 vision and Vertex AI

In this project, we build a Streamlit application that uses Vertex AI’s Gemini Pro Vision model to answer questions about user-uploaded images. We walk through the code to show how the model can be used to extract relevant information from an image.

Application FrontEnd

Setting Up the Environment

First, we need to install two libraries: the Vertex AI SDK and Streamlit. We can do this using the following commands:

pip install google-cloud-aiplatform
pip install streamlit

Initializing the Vertex AI SDK

Next, we need to initialize the Vertex AI SDK by providing our project ID and region:

import vertexai

PROJECT_ID = ""  # Your Vertex AI project ID
REGION = ""  # Region where your Vertex AI resources are located

# Initialize Vertex AI SDK
vertexai.init(project=PROJECT_ID, location=REGION)
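
Hard-coding the project ID and region works for a quick test, but these values are often read from the environment instead. The following is a minimal sketch of that idea. The variable names GOOGLE_CLOUD_PROJECT and GOOGLE_CLOUD_REGION, the us-central1 fallback, and the helper name load_vertex_config are assumptions chosen for illustration and are not part of the original code.

```python
import os


def load_vertex_config(env=None):
    """Return (project_id, region) read from environment-style variables."""
    env = os.environ if env is None else env
    # Assumed variable names; adjust them to match your own setup.
    project_id = env.get("GOOGLE_CLOUD_PROJECT", "").strip()
    region = env.get("GOOGLE_CLOUD_REGION", "us-central1").strip()
    if not project_id:
        raise ValueError("Set GOOGLE_CLOUD_PROJECT to your Vertex AI project ID")
    return project_id, region
```

The returned pair can then be passed on as vertexai.init(project=project_id, location=region).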

Defining the generate_response Function

The generate_response function takes a prompt and an image file as input and returns a generated response based on the Gemini Pro Vision model:

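from vertexai.generative_models import GenerativeModel, Image

def generate_response(prompt, image_file):
    # Load the image from file
    image = Image.load_from_file(image_file)
    # Create the multimodal model and send the prompt together with the image
    generative_multimodal_model = GenerativeModel("gemini-1.0-pro-vision")
    response = generative_multimodal_model.generate_content([prompt, image])
    # Return the text of the first candidate
    return response.candidates[0].content.text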
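
The function above indexes straight into the first candidate, which fails if the model returns no usable answer. As a hedged sketch, the lookup can be wrapped so the app shows a fallback message instead of crashing. The helper name extract_text, the fallback wording, and the assumption that an empty or blocked response surfaces as IndexError, AttributeError, or ValueError are illustrative. The SimpleNamespace objects only mimic the attribute path and are not real SDK responses.

```python
from types import SimpleNamespace


def extract_text(response, fallback="No answer was returned for this image."):
    """Return the first candidate's text, or a fallback if it is unavailable."""
    try:
        text = response.candidates[0].content.text
    except (IndexError, AttributeError, ValueError):
        # Assumption: an empty or blocked response surfaces as one of these errors.
        return fallback
    return text if text else fallback


# Stand-in objects that only mimic the attribute path used above.
ok = SimpleNamespace(
    candidates=[SimpleNamespace(content=SimpleNamespace(text="A red bicycle."))]
)
empty = SimpleNamespace(candidates=[])
```

In generate_response, the last line could then become return extract_text(response).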

Defining the main Function

The main function is the entry point of our Streamlit application. It sets up the UI components, allows users to upload an image, and generates a response based on the user’s question:

import os
import tempfile

import streamlit as st

def main():
    st.title("Vertex AI with Gemini Pro Vision")
    st.image("logo.jpg", width=100)
    img = st.file_uploader("Upload an image")
    path = None
    if img:
        # Create a temporary directory
        temp_dir = tempfile.mkdtemp()
        # Define the path to save the uploaded image
        path = os.path.join(temp_dir, img.name)
        # Write the uploaded image to the specified path
        with open(path, "wb") as f:
            f.write(img.getvalue())
    # Input area for user's question
    st.header(":violet[Question]")
    question = st.text_area(label="Enter your question")
    submit = st.button("Submit")
    # If an image is uploaded and a question is entered and submitted
    if path and question and submit:
        # Generate a response based on the question and uploaded image
        response = generate_response(question, path)
        # Display the generated response
        st.header("Answer")
        st.write(response)

if __name__ == "__main__":
    main()
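
The upload handling above creates a new temporary directory on every rerun. One possible alternative, shown here only as a sketch, is a small helper that writes the upload to a single named temporary file and returns its path. The names save_upload and FakeUpload are hypothetical. FakeUpload merely imitates the two members of the uploaded-file object that the original code relies on (name and getvalue()).

```python
import os
import tempfile
from io import BytesIO


def save_upload(uploaded_file):
    """Write an uploaded file to a temporary file and return its path."""
    # Keep the original extension so image loaders can infer the format.
    suffix = os.path.splitext(uploaded_file.name)[1]
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
        tmp.write(uploaded_file.getvalue())
        return tmp.name


class FakeUpload(BytesIO):
    """Hypothetical stand-in for Streamlit's uploaded-file object."""

    def __init__(self, data, name):
        super().__init__(data)
        self.name = name
```

Because the file is created with delete=False, the caller is responsible for removing it with os.remove once the response has been generated.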

The code above produces the following front end:

Application FrontEnd

Running the Application

Finally, we can run the application using the following command:

streamlit run app.py

This opens the Streamlit application in the browser, where we can upload an image and ask a question to get a response from the Gemini Pro Vision model.

Testing the Application

We can test the application with different use cases, such as:

1- Uploading an image and asking a question about the objects in the image

Testing- Input Image1
Testing- Q1

2- Uploading an image and asking a question about the scenario depicted in the image

Testing- Input Image2
Testing- Q2

3- Uploading an image and asking a question about the analysis of the image

We used the same image as above and asked the following question:

Testing- Q3

In each case, the application responds with information that the Gemini Pro Vision model derives from the uploaded image.

 
