Skip to content

GAZE

GAZE

A modular Python framework for building multi-turn agentic vision-language model (VLM) systems. Built for medical image analysis but applicable to any visual reasoning task.

Key features

  • Multi-turn agentic loop -- JSON-structured tool-calling with configurable turn limits, schema validation, and automatic error recovery
  • 25 built-in tools (23 visual + 2 search) -- visual manipulation (zoom, crop, contrast, threshold, flip, rotate, etc.) and literature/image retrieval (PubMed, Open-i)
  • Task processors -- abstract base class with dependency injection
  • Model adapters -- OpenAI, LM Studio, HuggingFace Transformers
  • Verifiers integration -- reward functions for RL training

GAZE runs against cloud APIs (OpenAI, OpenRouter) or local models (LM Studio).

Installation

pip install gaze-vlm

Next steps