Lab2: AI Food Analysis App with Gemini

Introduction

In this lab, you will build an AI-powered food analysis application using Google Gemini and the Mini Pupper’s camera. The app captures images of food and uses Gemini’s vision capabilities to analyze nutritional content, estimate calories, and provide health recommendations.

Prerequisites

  • Completed Lab1 (Gemini API Setup)
  • Camera connected to Mini Pupper
  • Jupyter Notebook installed

Part 1: Quick Start - Test Gemini Vision

Before building the full app, let’s test Gemini Vision with a simple notebook.

Step 1: Clone Mangdang Repository

git clone https://github.com/lbaitemple/mangdang
cd mangdang/gemini

Step 2: Open food.ipynb

  1. Open Jupyter Lab: http://<robot-ip>:8888
  2. Navigate to mangdang/gemini/
  3. Open food.ipynb

Step 3: Test Vision

  1. Place a food item in front of the camera
  2. Run the notebook cells
  3. Modify the prompt to test different analyses
Food analysis in Jupyter
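
Under the hood, the notebook's vision test boils down to a few lines. A condensed sketch, assuming the google-generativeai package and a GOOGLE_API_KEY environment variable (the actual notebook may route through the repo's google_api helpers instead, and the prompt below is only an example):

```python
# Hedged sketch of the food.ipynb flow: one image plus one prompt in,
# one text analysis out. The model name and prompt are assumptions.
import os
from PIL import Image

PROMPT = "List the food items in this image and estimate the calories for each."

def analyze_food(image_path: str, prompt: str = PROMPT) -> str:
    """Send an image and a prompt to Gemini Vision; return the text reply."""
    # Imported lazily so the sketch loads even without the SDK installed.
    import google.generativeai as genai
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel("gemini-2.0-flash")
    response = model.generate_content([prompt, Image.open(image_path)])
    return response.text
```

Calling analyze_food("snack.jpg") would return a plain-text calorie breakdown; editing PROMPT is what Step 3.3 above asks you to experiment with.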

Part 2: Setup for Custom App

Install Required Libraries

Create a requirements.txt file:

google-generativeai
python-dotenv
opencv-python
ipywidgets
Pillow
langchain-google-vertexai

Install the dependencies:

pip install -r requirements.txt

Configure Credentials

Create a .env file with your API credentials:

# Copy the sample env file
cp env.sample .env

# Edit the .env file
nano .env

Add your API key path:

API_KEY_PATH=/path/to/your/credentials.json

Understanding the Credential-Loading Code

from dotenv import load_dotenv
import os
import google_api
from PIL import Image

load_dotenv(dotenv_path='./.env')
api_path = os.environ.get('API_KEY_PATH', '')
if os.path.exists(api_path):
    google_api.init_credentials(api_path)

This code:

  • Loads environment variables from .env file
  • Gets the API key path
  • Initializes Google API credentials

Part 3: Complete Food Analysis App

Full Jupyter Notebook Code

# Import Required Libraries
import os, io
import json
import cv2
from google_api import ai_image_response
import ipywidgets as widgets
from IPython.display import display, clear_output
from threading import Thread
from dotenv import load_dotenv
from langchain_google_vertexai import ChatVertexAI
from PIL import Image

def get_gemini_response(input_prompt, image):
    """Send image and prompt to Gemini for analysis"""
    model = ChatVertexAI(
        model_name='gemini-2.0-flash',
        convert_system_message_to_human=True,
    )
    response = ai_image_response(model, image=image, text=input_prompt)
    return response

# Create Widgets
input_prompt_widget = widgets.Textarea(
    value="""You are an expert nutritionist where you need to see the food items from the image
    and calculate the total calories, also provide the details of every food item with calories intake
    in the below format:

    1. Item 1 - no of calories
    2. Item 2 - no of calories
    ----
    ----
    Finally, you can mention whether the food is healthy or not and also mention the percentage split
    of the ratio of carbohydrates, fats, fibers, sugar, and other things required in our diet.""",
    placeholder='Type your input prompt here',
    description='Prompt:',
    layout={'width': '600px', 'height': '200px'}
)

analysis_button = widgets.Button(description='Analyze')
stop_camera_button = widgets.Button(description='Stop Camera')
clear_button = widgets.Button(description='Clear')

output_label = widgets.Textarea(
    value='',
    placeholder='The analysis result will be displayed here...',
    description='Result:',
    layout={'width': '600px', 'height': '200px'},
    disabled=True  # Make the output label read-only
)

camera_view = widgets.Image()

# Initialize camera
cap = cv2.VideoCapture(0)  # Use 0 for the default camera

def update_camera_view():
    """Continuously update camera feed in widget"""
    while True:
        ret, frame = cap.read()
        if ret:
            _, buffer = cv2.imencode('.jpg', frame)
            camera_view.value = buffer.tobytes()

def analyze_image(b):
    """Capture image and send to Gemini for analysis"""
    input_prompt = input_prompt_widget.value
    ret, frame = cap.read()
    if not ret:
        output_label.value = "Failed to capture image"
        return
    
    # Convert the captured frame to RGB
    frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    pil_image = Image.fromarray(frame_rgb)
    
    print("Sending to Gemini for analysis...")
    output_label.value = 'Analyzing... Please wait.'
    
    response = get_gemini_response(input_prompt, pil_image)
    print(response)
    
    # Display response in output label
    output_label.value = f"Food Details:\n{response}"

def clear_analyze(b):
    """Clear the output"""
    output_label.value = ""

# Attach Button Handlers
analysis_button.on_click(analyze_image)
clear_button.on_click(clear_analyze)

# Display Layout
display(widgets.HBox([camera_view, input_prompt_widget]))
display(widgets.HBox([analysis_button, stop_camera_button, clear_button]))
display(output_label)

# Start the camera feed in a separate thread
camera_thread = Thread(target=update_camera_view)
camera_thread.daemon = True
camera_thread.start()

Part 4: How It Works

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    Jupyter Notebook                             │
│                                                                 │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────────┐  │
│  │   Camera     │───►│  OpenCV      │───►│  Camera Widget   │  │
│  │   Thread     │    │  Capture     │    │  (Live View)     │  │
│  └──────────────┘    └──────────────┘    └──────────────────┘  │
│                                                                 │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────────┐  │
│  │  Analyze     │───►│  Gemini API  │───►│  Output Widget   │  │
│  │  Button      │    │  (Vision)    │    │  (Results)       │  │
│  └──────────────┘    └──────────────┘    └──────────────────┘  │
└─────────────────────────────────────────────────────────────────┘

Key Components

Component            Purpose
-------------------  -------------------------------------
camera_view          Displays the live camera feed
input_prompt_widget  Customizable prompt for Gemini
analysis_button      Triggers image capture and analysis
output_label         Shows Gemini's response
camera_thread        Background thread for camera updates

Part 5: Running the App

Step 1: Start Jupyter Notebook

cd ~/mangdang/gemini
jupyter notebook --ip=0.0.0.0 --no-browser

Step 2: Open the Notebook

Navigate to http://<minipupper-ip>:8888 and open food.ipynb

Step 3: Run the Cells

  1. Run the first cell to install dependencies (only needed once)
  2. Run the second cell to load credentials
  3. Run the third cell to start the app

Step 4: Analyze Food

  1. Point the camera at food
  2. Click “Analyze” button
  3. Wait for Gemini’s response
  4. View nutritional analysis in the output
Food analysis app interface

Part 6: Customizing the Prompt

You can modify the prompt for different analysis types:

Calorie Counter

prompt = """Analyze this food image and provide:
1. List of food items
2. Estimated calories for each item
3. Total calories
4. Recommended portion size"""

Ingredient Identifier

prompt = """Identify all ingredients visible in this food image.
List them with estimated quantities."""

Diet Compatibility

prompt = """Analyze this food for dietary restrictions:
- Is it vegetarian/vegan?
- Is it gluten-free?
- Is it dairy-free?
- Allergen warnings"""

Recipe Suggestion

prompt = """Based on the ingredients visible in this image,
suggest a healthy recipe that could be made."""
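
To switch among these analyses without retyping, one option is to keep the variants in a dict and assign the chosen one to input_prompt_widget.value; the names below are illustrative:

```python
# One way to manage the prompt variants above: a dict of named prompts.
PROMPTS = {
    "calories": (
        "Analyze this food image and provide:\n"
        "1. List of food items\n"
        "2. Estimated calories for each item\n"
        "3. Total calories\n"
        "4. Recommended portion size"
    ),
    "ingredients": (
        "Identify all ingredients visible in this food image.\n"
        "List them with estimated quantities."
    ),
    "diet": (
        "Analyze this food for dietary restrictions:\n"
        "- Is it vegetarian/vegan?\n"
        "- Is it gluten-free?\n"
        "- Is it dairy-free?\n"
        "- Allergen warnings"
    ),
    "recipe": (
        "Based on the ingredients visible in this image,\n"
        "suggest a healthy recipe that could be made."
    ),
}

def get_prompt(kind: str) -> str:
    """Look up a prompt by name, falling back to the calorie counter."""
    return PROMPTS.get(kind, PROMPTS["calories"])

# In the notebook: input_prompt_widget.value = get_prompt("diet")
```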

Part 7: Alternative Implementation (Simple Version)

If you don’t have the google_api module, use this simpler version:

import google.generativeai as genai
import cv2
from PIL import Image
import ipywidgets as widgets
from IPython.display import display
from threading import Thread
import os

# Configure Gemini
genai.configure(api_key=os.environ.get("GOOGLE_API_KEY"))
model = genai.GenerativeModel('gemini-2.0-flash')

# Widgets
camera_view = widgets.Image(format='jpeg', width=320, height=240)
output = widgets.Textarea(layout={'width': '600px', 'height': '200px'}, disabled=True)
analyze_btn = widgets.Button(description='Analyze Food')

cap = cv2.VideoCapture(0)
running = True

def update_camera():
    global running
    while running:
        ret, frame = cap.read()
        if ret:
            _, buffer = cv2.imencode('.jpg', frame)
            camera_view.value = buffer.tobytes()

def analyze(b):
    ret, frame = cap.read()
    if ret:
        output.value = "Analyzing..."
        frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        pil_image = Image.fromarray(frame_rgb)
        
        prompt = """Analyze this food image and provide:
        1. Food items identified
        2. Estimated calories per item
        3. Total calories
        4. Health assessment"""
        
        response = model.generate_content([prompt, pil_image])
        output.value = response.text

analyze_btn.on_click(analyze)
display(camera_view, analyze_btn, output)

Thread(target=update_camera, daemon=True).start()

Exercises

Exercise 1: Add Stop Camera Button

Implement the stop camera functionality to properly release the camera resource.

Exercise 2: Save Analysis History

Store each analysis result with timestamp and image for later review.

Exercise 3: Voice Output

Use text-to-speech to read the analysis results aloud.

Exercise 4: Multi-Language Support

Modify the prompt to get responses in different languages.


Troubleshooting

Issue                  Solution
---------------------  ------------------------------------------------
Camera not found       Check ls /dev/video* and try a different index
API error              Verify the credentials in your .env file
Slow response          Downscale the image before sending it
Widget not displaying  Ensure ipywidgets is installed and enabled
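
For the slow-response case, downscaling the frame before upload usually helps; here is a sketch using Pillow, where the 512-pixel limit is an arbitrary choice:

```python
# Shrink a frame before sending it to Gemini. Food recognition does not
# need full camera resolution, and smaller uploads return faster.
from PIL import Image

def downscale(image: Image.Image, max_side: int = 512) -> Image.Image:
    """Return a copy whose longest side is at most max_side pixels."""
    scale = max_side / max(image.size)
    if scale >= 1.0:  # already small enough
        return image.copy()
    new_size = (round(image.width * scale), round(image.height * scale))
    return image.resize(new_size, Image.LANCZOS)

# In analyze_image(), apply it to the captured frame:
# pil_image = downscale(Image.fromarray(frame_rgb))
```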

Summary

In this lab, you learned:

  • How to build an AI-powered food analysis app
  • How to integrate camera capture with Gemini Vision
  • How to use ipywidgets for interactive Jupyter apps
  • How to customize prompts for different analysis types
  • How to use threading for live camera feeds
