Dietary monitoring is a complex yet highly impactful challenge within food computing, given its potential to transform the personalized management of metabolic and general health. Traditional 2D image-based assessment methods capture only static visual cues, offering limited information on eating behaviour. In contrast, video-based analysis provides richer temporal information, enabling the study of what is eaten, how it is eaten, and in what quantity. Prior research in this area introduced a baseline framework that used Vision-Language Models (VLMs) to analyze eating videos on a frame-by-frame basis. While this approach established the feasibility of using VLMs to interpret eating behaviour, it also revealed key limitations in model accuracy and contextual understanding. Building on this foundation, the present study evaluates the Gemini 2.5 Flash model against the previous framework across diverse eating scenarios and examines how its performance changes with more specific prompts. These findings offer high-level insights into the potential of modern multimodal VLMs for dietary monitoring and pave the way for more accurate and practical approaches to video-based nutrient assessment.