Examples
GAZE includes five complete example applications demonstrating different use cases.
NOVA brain MRI
Location: examples/nova/ (README)
Brain MRI analysis with three sub-tasks: caption generation, diagnosis prediction, and lesion localization. Uses the NOVA dataset, which downloads automatically from Hugging Face.
pip install "gaze-vlm[nova]"
# Single-turn mode
uv run python -m examples.nova.src.cli \
    --model openai/gpt-4o \
    --mode single_turn \
    --max-samples 10
# Agentic mode with tools
uv run python -m examples.nova.src.cli \
    --model openai/gpt-4o \
    --mode agentic \
    --use-tools \
    --max-turns 5 \
    --max-samples 10
GEMeX visual grounding
Location: examples/gemex_thinkvg/ (README)
Visual grounding with chain-of-thought reasoning on chest X-rays. Requires credentialed PhysioNet access to MIMIC-CXR.
pip install "gaze-vlm[gemex]"
uv run python -m examples.gemex_thinkvg.eval \
    --dataset ./data/test.jsonl \
    --image-dir /path/to/mimic-cxr-jpg \
    --model openai/gpt-4o \
    --mode agentic \
    --use-tools \
    --output ./results
AgentClinic NEJM
Location: examples/agentclinic_nejm/ (README)
Multi-turn diagnostic reasoning where the model gathers clinical information (history, exam, tests, imaging) before making a diagnosis.
pip install "gaze-vlm[agentclinic]"
uv run python -m examples.agentclinic_nejm.eval \
    --dataset ./data/agentclinic_nejm_extended.jsonl \
    --model openai/gpt-4o \
    --num-samples 10 \
    --output ./results
PubMedQA
Location: examples/pubmedqa/ (README)
Text-only medical Q&A with yes/no/maybe answers. Uses the PubMedQA dataset (auto-downloads).
pip install "gaze-vlm[pubmedqa]"
uv run python -m examples.pubmedqa.src.cli \
    --model openai/gpt-4o \
    --mode single_turn \
    --max-samples 50
VQA-RAD
Location: examples/vqa_rad/ (README)
Radiology visual question answering with closed and open-ended questions. Uses the VQA-RAD dataset (auto-downloads).
pip install "gaze-vlm[vqa-rad]"
uv run python -m examples.vqa_rad.src.cli \
    --model openai/gpt-4o \
    --mode agentic \
    --use-tools \
    --max-samples 20
Local models
All examples support local model inference via LM Studio. Pass --base-url to point at your instance:
uv run python -m examples.pubmedqa.src.cli \
    --model qwen3.5-a3b \
    --base-url http://localhost:1234/v1 \
    --mode single_turn \
    --max-samples 5
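Before starting a run against a local server, it can help to confirm that the endpoint responds and to see which model ids it reports. The sketch below assumes the server behind --base-url is OpenAI-compatible and serves a model listing at GET <base-url>/models, as LM Studio's local server does. The helper names models_url, parse_model_ids, and list_model_ids are illustrative and are not part of GAZE.

```python
import json
import urllib.request


def models_url(base_url: str) -> str:
    """Build the model-listing URL for an OpenAI-compatible base URL."""
    return base_url.rstrip("/") + "/models"


def parse_model_ids(payload: str) -> list[str]:
    """Extract model ids from an OpenAI-style {"data": [{"id": ...}]} body."""
    body = json.loads(payload)
    return [entry["id"] for entry in body.get("data", [])]


def list_model_ids(base_url: str, timeout: float = 5.0) -> list[str]:
    """Query the server and return the ids of the models it reports.

    Raises urllib.error.URLError if the server is not reachable.
    """
    with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
        return parse_model_ids(resp.read().decode("utf-8"))
```

Calling list_model_ids("http://localhost:1234/v1") with the server running returns the ids it exposes. An OpenAI-compatible client typically expects one of those ids as the model name, so it is a reasonable first guess for --model.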
Writing your own example
See Getting started for the pattern. The examples vary
in size, but the NOVA example (examples/nova/src/) is representative. Note
that evaluation/ is a package, not a single module, and dataset loading
lives under data/:
examples/your_task/
    src/
        __init__.py
        cli.py           # CLI entry point (argparse + run_evaluation)
        config.py        # Frozen config dataclass for the task
        processor.py     # AgenticProcessorBase subclass
        schemas.py       # Response schema(s) and validate_response()
        data/            # Dataset loading package
            __init__.py
        evaluation/      # Metrics package, one module per sub-task
            __init__.py
            caption.py
            detection.py
            diagnosis.py
    run_local.sh         # LM Studio convenience script
    README.md
Smaller examples (for instance pubmedqa) collapse some of these into fewer
modules. The only hard requirement is a processor that subclasses
AgenticProcessorBase and implements the four abstract methods.
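As a rough illustration of how cli.py and config.py might fit together, the sketch below pairs a frozen dataclass with an argparse parser that accepts the flags used by the commands on this page. The names TaskConfig, build_parser, and parse_config are hypothetical, and the defaults are placeholders; the bundled examples may be organized differently. The call into run_evaluation is left out because its signature is not shown here.

```python
import argparse
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class TaskConfig:
    """Hypothetical immutable settings for one evaluation run."""

    model: str
    mode: str = "single_turn"
    use_tools: bool = False
    max_turns: int = 5  # placeholder default, not taken from GAZE
    max_samples: Optional[int] = None
    base_url: Optional[str] = None


def build_parser() -> argparse.ArgumentParser:
    """Declare the flags that appear in the commands on this page."""
    parser = argparse.ArgumentParser(description="Run the your_task evaluation")
    parser.add_argument("--model", required=True)
    parser.add_argument(
        "--mode", choices=["single_turn", "agentic"], default="single_turn"
    )
    parser.add_argument("--use-tools", action="store_true")
    parser.add_argument("--max-turns", type=int, default=5)
    parser.add_argument("--max-samples", type=int, default=None)
    parser.add_argument("--base-url", default=None)
    return parser


def parse_config(argv: Optional[Sequence[str]] = None) -> TaskConfig:
    """Parse argv (or sys.argv when None) into a frozen TaskConfig."""
    args = build_parser().parse_args(argv)
    # argparse maps "--max-samples" to "max_samples", so the parsed
    # namespace keys line up with the dataclass field names.
    return TaskConfig(**vars(args))
```

Freezing the dataclass means a run's settings cannot be changed after parsing, which keeps a processor from silently mutating them partway through an evaluation.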