GAZE¶
A modular Python framework for building multi-turn agentic vision-language model (VLM) systems. Built for medical image analysis but applicable to any visual reasoning task.
Key features¶
- Multi-turn agentic loop -- JSON-structured tool-calling with configurable turn limits, schema validation, and automatic error recovery
- 25 built-in tools (23 visual + 2 search) -- visual manipulation (zoom, crop, contrast, threshold, flip, rotate, etc.) and literature/image retrieval (PubMed, Open-i)
- Task processors -- abstract base class with dependency injection
- Model adapters -- OpenAI, LM Studio, HuggingFace Transformers
- Verifiers integration -- reward functions for RL training
GAZE runs against cloud APIs (OpenAI, OpenRouter) or local models (LM Studio).
Installation¶
Next steps¶
- Getting started -- build your first processor
- Architecture -- understand the design
- Examples -- five complete applications
- API reference -- full API documentation