Lab2: AI Food Analysis App with Gemini

Introduction

In this lab, you will build an AI-powered food analysis application using Google Gemini and the Mini Pupper’s camera. The app captures images of food and uses Gemini’s vision capabilities to analyze nutritional content, estimate calories, and provide health recommendations.

Prerequisites

  • Completed Lab1 (Gemini API Setup)
  • Camera connected to Mini Pupper
  • Jupyter Notebook installed

Part 1: Quick Start - Test Gemini Vision

Before building the full app, let’s test Gemini Vision with a simple notebook.

Step 1: Clone Mangdang Repository

git clone https://github.com/lbaitemple/mangdang
cd mangdang/gemini

Step 2: Open food.ipynb

  1. Open Jupyter Lab: http://<robot-ip>:8888
  2. Navigate to mangdang/gemini/
  3. Open food.ipynb

Step 3: Test Vision

  1. Place a food item in front of the camera
  2. Run the notebook cells
  3. Modify the prompt to test different analyses
Food analysis in Jupyter
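
Under the hood, the notebook's vision test boils down to a few lines. A condensed sketch, assuming the google-generativeai package and a GOOGLE_API_KEY environment variable (the actual notebook may route through the repo's google_api helpers instead, and the prompt below is only an example):

```python
# Hedged sketch of the food.ipynb flow: one image plus one prompt in,
# one text analysis out. The model name and prompt are assumptions.
import os
from PIL import Image

PROMPT = "List the food items in this image and estimate the calories for each."

def analyze_food(image_path: str, prompt: str = PROMPT) -> str:
    """Send an image and a prompt to Gemini Vision; return the text reply."""
    # Imported lazily so the sketch loads even without the SDK installed.
    import google.generativeai as genai
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel("gemini-2.0-flash")
    response = model.generate_content([prompt, Image.open(image_path)])
    return response.text
```

Calling analyze_food("snack.jpg") would return a plain-text calorie breakdown; editing PROMPT is what Step 3.3 above asks you to experiment with.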

Part 2: Setup for Custom App

Install Required Libraries

Create a requirements.txt file:

google-generativeai
python-dotenv
opencv-python
ipywidgets
Pillow
langchain-google-vertexai

Install the dependencies:

pip install -r requirements.txt

Configure Credentials

Create a .env file with your API credentials:

# Copy the sample env file
cp env.sample .env

# Edit the .env file
nano .env

Add your API key path:

API_KEY_PATH=/path/to/your/credentials.json

Understanding the Credential-Loading Code

from dotenv import load_dotenv
import os
import google_api
from PIL import Image

load_dotenv(dotenv_path='./.env')
api_path = os.environ.get('API_KEY_PATH', '')
if os.path.exists(api_path):
    google_api.init_credentials(api_path)

This code:

  • Loads environment variables from .env file
  • Gets the API key path
  • Initializes Google API credentials

Part 3: Complete Food Analysis App

Full Jupyter Notebook Code

# Import Required Libraries
import os, io
import json
import cv2
from google_api import ai_image_response
import ipywidgets as widgets
from IPython.display import display, clear_output
from threading import Thread
from dotenv import load_dotenv
from langchain_google_vertexai import ChatVertexAI
from PIL import Image

def get_gemini_response(input_prompt, image):
    """Send image and prompt to Gemini for analysis"""
    model = ChatVertexAI(
        model_name='gemini-2.0-flash',
        convert_system_message_to_human=True,
    )
    response = ai_image_response(model, image=image, text=input_prompt)
    return response

# Create Widgets
input_prompt_widget = widgets.Textarea(
    value="""You are an expert nutritionist where you need to see the food items from the image
    and calculate the total calories, also provide the details of every food item with calories intake
    in the below format:

    1. Item 1 - no of calories
    2. Item 2 - no of calories
    ----
    ----
    Finally, you can mention whether the food is healthy or not and also mention the percentage split
    of the ratio of carbohydrates, fats, fibers, sugar, and other things required in our diet.""",
    placeholder='Type your input prompt here',
    description='Prompt:',
    layout={'width': '600px', 'height': '200px'}
)

analysis_button = widgets.Button(description='Analyze')
stop_camera_button = widgets.Button(description='Stop Camera')
clear_button = widgets.Button(description='Clear')

output_label = widgets.Textarea(
    value='',
    placeholder='The analysis result will be displayed here...',
    description='Result:',
    layout={'width': '600px', 'height': '200px'},
    disabled=True  # Make the output label read-only
)

camera_view = widgets.Image()

# Initialize camera
cap = cv2.VideoCapture(0)  # Use 0 for the default camera

def update_camera_view():
    """Continuously update camera feed in widget"""
    while True:
        ret, frame = cap.read()
        if ret:
            _, buffer = cv2.imencode('.jpg', frame)
            camera_view.value = buffer.tobytes()

def analyze_image(b):
    """Capture image and send to Gemini for analysis"""
    input_prompt = input_prompt_widget.value
    ret, frame = cap.read()
    if not ret:
        output_label.value = "Failed to capture image"
        return
    
    # Convert the captured frame to RGB
    frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    pil_image = Image.fromarray(frame_rgb)
    
    print("Sending to Gemini for analysis...")
    output_label.value = 'Analyzing... Please wait.'
    
    response = get_gemini_response(input_prompt, pil_image)
    print(response)
    
    # Display response in output label
    output_label.value = f"Food Details:\n{response}"

def clear_analyze(b):
    """Clear the output"""
    output_label.value = ""

# Attach Button Handlers
analysis_button.on_click(analyze_image)
clear_button.on_click(clear_analyze)

# Display Layout
display(widgets.HBox([camera_view, input_prompt_widget]))
display(widgets.HBox([analysis_button, stop_camera_button, clear_button]))
display(output_label)

# Start the camera feed in a separate thread
camera_thread = Thread(target=update_camera_view)
camera_thread.daemon = True
camera_thread.start()

Part 4: How It Works

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    Jupyter Notebook                             │
│                                                                 │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────────┐  │
│  │   Camera     │───►│  OpenCV      │───►│  Camera Widget   │  │
│  │   Thread     │    │  Capture     │    │  (Live View)     │  │
│  └──────────────┘    └──────────────┘    └──────────────────┘  │
│                                                                 │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────────┐  │
│  │  Analyze     │───►│  Gemini API  │───►│  Output Widget   │  │
│  │  Button      │    │  (Vision)    │    │  (Results)       │  │
│  └──────────────┘    └──────────────┘    └──────────────────┘  │
└─────────────────────────────────────────────────────────────────┘

Key Components

Component            Purpose
-------------------  -------------------------------------
camera_view          Displays the live camera feed
input_prompt_widget  Customizable prompt for Gemini
analysis_button      Triggers image capture and analysis
output_label         Shows Gemini's response
camera_thread        Background thread for camera updates

Part 5: Running the App

Step 1: Start Jupyter Notebook

cd ~/mangdang/gemini
jupyter notebook --ip=0.0.0.0 --no-browser

Step 2: Open the Notebook

Navigate to http://<minipupper-ip>:8888 and open food.ipynb

Step 3: Run the Cells

  1. Run the first cell to install dependencies (only needed once)
  2. Run the second cell to load credentials
  3. Run the third cell to start the app

Step 4: Analyze Food

  1. Point the camera at food
  2. Click “Analyze” button
  3. Wait for Gemini’s response
  4. View nutritional analysis in the output
Food analysis app interface

Part 6: Customizing the Prompt

You can modify the prompt for different analysis types:

Calorie Counter

prompt = """Analyze this food image and provide:
1. List of food items
2. Estimated calories for each item
3. Total calories
4. Recommended portion size"""

Ingredient Identifier

prompt = """Identify all ingredients visible in this food image.
List them with estimated quantities."""

Diet Compatibility

prompt = """Analyze this food for dietary restrictions:
- Is it vegetarian/vegan?
- Is it gluten-free?
- Is it dairy-free?
- Allergen warnings"""

Recipe Suggestion

prompt = """Based on the ingredients visible in this image,
suggest a healthy recipe that could be made."""
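
To switch among these analyses without retyping, one option is to keep the variants in a dict and assign the chosen one to input_prompt_widget.value; the names below are illustrative:

```python
# One way to manage the prompt variants above: a dict of named prompts.
PROMPTS = {
    "calories": (
        "Analyze this food image and provide:\n"
        "1. List of food items\n"
        "2. Estimated calories for each item\n"
        "3. Total calories\n"
        "4. Recommended portion size"
    ),
    "ingredients": (
        "Identify all ingredients visible in this food image.\n"
        "List them with estimated quantities."
    ),
    "diet": (
        "Analyze this food for dietary restrictions:\n"
        "- Is it vegetarian/vegan?\n"
        "- Is it gluten-free?\n"
        "- Is it dairy-free?\n"
        "- Allergen warnings"
    ),
    "recipe": (
        "Based on the ingredients visible in this image,\n"
        "suggest a healthy recipe that could be made."
    ),
}

def get_prompt(kind: str) -> str:
    """Look up a prompt by name, falling back to the calorie counter."""
    return PROMPTS.get(kind, PROMPTS["calories"])

# In the notebook: input_prompt_widget.value = get_prompt("diet")
```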

Part 7: Alternative Implementation (Simple Version)

If you don’t have the google_api module, use this simpler version:

import google.generativeai as genai
import cv2
from PIL import Image
import ipywidgets as widgets
from IPython.display import display
from threading import Thread
import os

# Configure Gemini
genai.configure(api_key=os.environ.get("GOOGLE_API_KEY"))
model = genai.GenerativeModel('gemini-2.0-flash')

# Widgets
camera_view = widgets.Image(format='jpeg', width=320, height=240)
output = widgets.Textarea(layout={'width': '600px', 'height': '200px'}, disabled=True)
analyze_btn = widgets.Button(description='Analyze Food')

cap = cv2.VideoCapture(0)
running = True

def update_camera():
    global running
    while running:
        ret, frame = cap.read()
        if ret:
            _, buffer = cv2.imencode('.jpg', frame)
            camera_view.value = buffer.tobytes()

def analyze(b):
    ret, frame = cap.read()
    if ret:
        output.value = "Analyzing..."
        frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        pil_image = Image.fromarray(frame_rgb)
        
        prompt = """Analyze this food image and provide:
        1. Food items identified
        2. Estimated calories per item
        3. Total calories
        4. Health assessment"""
        
        response = model.generate_content([prompt, pil_image])
        output.value = response.text

analyze_btn.on_click(analyze)
display(camera_view, analyze_btn, output)

Thread(target=update_camera, daemon=True).start()

Exercises

Exercise 1: Add Stop Camera Button

Implement the stop camera functionality to properly release the camera resource.

Exercise 2: Save Analysis History

Store each analysis result with timestamp and image for later review.

Exercise 3: Voice Output

Use text-to-speech to read the analysis results aloud.

Exercise 4: Multi-Language Support

Modify the prompt to get responses in different languages.


Troubleshooting

Issue                  Solution
---------------------  ------------------------------------------------
Camera not found       Check ls /dev/video* and try a different index
API error              Verify the credentials in your .env file
Slow response          Downscale the image before sending it
Widget not displaying  Ensure ipywidgets is installed and enabled
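
For the slow-response case, downscaling the frame before upload usually helps; here is a sketch using Pillow, where the 512-pixel limit is an arbitrary choice:

```python
# Shrink a frame before sending it to Gemini. Food recognition does not
# need full camera resolution, and smaller uploads return faster.
from PIL import Image

def downscale(image: Image.Image, max_side: int = 512) -> Image.Image:
    """Return a copy whose longest side is at most max_side pixels."""
    scale = max_side / max(image.size)
    if scale >= 1.0:  # already small enough
        return image.copy()
    new_size = (round(image.width * scale), round(image.height * scale))
    return image.resize(new_size, Image.LANCZOS)

# In analyze_image(), apply it to the captured frame:
# pil_image = downscale(Image.fromarray(frame_rgb))
```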

Summary

In this lab, you learned:

  • How to build an AI-powered food analysis app
  • How to integrate camera capture with Gemini Vision
  • How to use ipywidgets for interactive Jupyter apps
  • How to customize prompts for different analysis types
  • How to use threading for live camera feeds
