Verifiers integration¶
Integration with the verifiers package for RL training with verifiable rewards.
See the API reference for signatures.
Overview¶
The gaze.verifiers module provides:
- BaseMultiTurnEnv -- base class for multi-turn RL environments (extends vf.MultiTurnEnv)
- VerifiableProcessorMixin -- mixin that adds as_verifiers_env() to processors
- GazeAdapter -- bridges processor and verifiers message formats
- Reward functions: ExactMatchReward, TokenF1Reward, IoUReward, CombinedReward
Installation¶
verifiers is not part of the core gaze-vlm runtime dependencies. It is
declared in the dev dependency group and in several optional extras
([verifiers], [medmarks], [gemex], [agentclinic]), so it is
only pulled in when you ask for it.
End users install the optional extra:
pip install gaze-vlm[verifiers]
Contributors working from a checkout get it through the dev group, which
uv sync installs by default:
uv sync
RL training additionally needs torch/transformers/datasets, provided by the
rl group:
uv sync --group rl
Quick start¶
1. Multi-turn environment¶
from gaze.verifiers import BaseMultiTurnEnv


class MyEnvironment(BaseMultiTurnEnv):
    def get_system_prompt(self) -> str:
        return "You are a helpful assistant."

    def _build_user_message(self, case):
        return f"Question: {case['question']}"


env = MyEnvironment(
    dataset_path="my_data.jsonl",
    max_turns=5,
)
2. Reward functions¶
from gaze.verifiers import ExactMatchReward, TokenF1Reward, CombinedReward

# Single reward (prompt and completion are placeholders for a rollout's
# prompt and model output)
reward = ExactMatchReward(normalize=True)
score = reward(prompt, completion, {"answer": "4"})

# Combined rewards
combined = CombinedReward(
    rewards=[ExactMatchReward(), TokenF1Reward()],
    weights=[0.6, 0.4],
    names=["exact", "f1"],
)
3. Processor-based environment¶
Use VerifiableProcessorMixin to turn a processor into a verifiers environment:
from gaze import AgenticProcessorBase
from gaze.verifiers import VerifiableProcessorMixin, ExactMatchReward


class MyProcessor(VerifiableProcessorMixin, AgenticProcessorBase):
    def get_system_prompt(self, images, metadata):
        return "You are a helpful assistant."

    def get_user_message(self, images, metadata):
        return metadata.get("question", "")

    def get_response_schema(self):
        return None

    def validate_response(self, response):
        return "continue" in response

    def get_reward_function(self):
        return ExactMatchReward()


EnvClass = MyProcessor.as_verifiers_env(
    max_turns=5,
    dataset_path="my_data.jsonl",
)
env = EnvClass()
Components¶
BaseMultiTurnEnv¶
Extends vf.MultiTurnEnv. Provides dataset loading, turn tracking, and logging.
Constructor:
BaseMultiTurnEnv(
    cases=None,         # Pre-loaded cases (list of dicts)
    dataset_path=None,  # Path to JSONL file
    max_turns=10,
    name="BaseGazeEnv",
    log_dir=None,
)
Methods to override:
- get_system_prompt() -> str
- _build_user_message(case) -> str | list
- build_initial_state(prompt, info) -> dict
- is_completed(messages, state, info) -> bool
- env_response(messages, state, info) -> tuple[Messages, State]
Reward functions¶
All inherit from BaseRewardFunction, which defines __call__(prompt, completion, info) -> float.
ExactMatchReward -- string equality after normalization:
ExactMatchReward(
    normalize=True,        # lowercase + strip whitespace
    case_sensitive=False,
    strip_braces=True,     # remove {}[]()
)
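To make the options above concrete, here is a minimal standalone sketch of this kind of normalization. It is not the ExactMatchReward implementation: the helper names normalize_answer and exact_match are hypothetical, and both the order of operations and the collapsing of internal whitespace are assumptions.

```python
def normalize_answer(text: str, case_sensitive: bool = False, strip_braces: bool = True) -> str:
    """Hypothetical helper mirroring the documented options; not gaze's code."""
    text = text.strip()
    if not case_sensitive:
        text = text.lower()
    if strip_braces:
        # Drop the bracket characters listed in the docs: {}[]()
        text = text.translate(str.maketrans("", "", "{}[]()"))
    # Collapse whitespace runs so "a  b" and "a b" compare equal (assumption).
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 when the normalized strings are equal, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


print(exact_match(" [Paris] ", "paris"))  # 1.0
print(exact_match("Paris", "London"))     # 0.0
```

Stripping brackets before comparison means a completion such as "[paris]" still matches the reference "paris", which is useful when model output wraps answers in list or JSON-like syntax.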
TokenF1Reward -- token-level F1 score:
TokenF1Reward(
    normalize=True,
    case_sensitive=False,
    tokenize="simple",  # "simple", "word", or "character"
)
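For readers unfamiliar with the metric, the sketch below shows how token-level F1 is conventionally computed. It is an illustration only: the function name token_f1 is hypothetical, and treating "simple" tokenization as lowercasing plus whitespace splitting is an assumption about the library, not a documented fact.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 using lowercase whitespace tokens (assumed tokenization)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Two empty strings count as a match; one empty side scores zero.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection: each shared token counts min(pred count, ref count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Two of three predicted tokens are correct and both reference tokens are
# recovered, so precision is 2/3, recall is 1, and F1 is about 0.8.
print(token_f1("the cat sat", "the cat"))
```

Unlike exact match, this gives partial credit for answers that overlap with the reference, which is why the two are often combined.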
IoUReward -- bounding box overlap. See the API reference for its constructor arguments.
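As background on the metric itself, intersection over union (IoU) for two axis-aligned boxes can be sketched as follows. The function name box_iou and the (x1, y1, x2, y2) box format are assumptions for illustration; the format IoUReward expects is not stated here.

```python
def box_iou(box_a, box_b) -> float:
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2) (assumed format)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    # Degenerate (zero-area) inputs score zero instead of dividing by zero.
    return intersection / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1) = 1/7.
print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
```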
CombinedReward -- weighted combination:
CombinedReward(
    rewards=[ExactMatchReward(), TokenF1Reward()],
    weights=[0.6, 0.4],
    names=["exact", "f1"],
)
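The idea of a weighted combination can be sketched without the library. The helper combine_rewards and the toy reward functions below are hypothetical, and dividing by the total weight is an assumption; CombinedReward may or may not normalize its weights.

```python
from typing import Callable, Sequence

RewardFn = Callable[[str, str, dict], float]


def combine_rewards(rewards: Sequence[RewardFn], weights: Sequence[float]) -> RewardFn:
    """Hypothetical helper: weighted average of reward callables."""
    if len(rewards) != len(weights):
        raise ValueError("rewards and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    def combined(prompt: str, completion: str, info: dict) -> float:
        weighted = sum(w * fn(prompt, completion, info) for fn, w in zip(rewards, weights))
        return weighted / total

    return combined


# Toy rewards standing in for ExactMatchReward and TokenF1Reward.
def exact(prompt, completion, info):
    return float(completion == info["answer"])


def constant_half(prompt, completion, info):
    return 0.5


reward = combine_rewards([exact, constant_half], weights=[0.6, 0.4])
# 0.6 * 1.0 + 0.4 * 0.5, so roughly 0.8
print(reward("2 + 2?", "4", {"answer": "4"}))
```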
GazeAdapter¶
Bridges a GAZE processor with verifiers message formats:
from gaze.verifiers import GazeAdapter

# my_processor is an existing processor instance
adapter = GazeAdapter(processor=my_processor)

# Must be awaited from inside an async function
result = await adapter.process_verifiers_messages(messages, info)

EnvClass = adapter.create_environment_class(max_turns=5)
Custom reward functions¶
from gaze.verifiers import BaseRewardFunction


class MyReward(BaseRewardFunction):
    def __call__(self, prompt, completion, info) -> float:
        pred = self._extract_prediction(completion)
        ref = info.get("answer", "")
        return float(pred == ref)
Data format¶
Use JSONL with consistent fields:
{"question": "...", "answer": "...", "image": "..."}
{"question": "...", "answer": "...", "context": "..."}
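A file in this format can be checked before training with a few lines of standard-library Python. The load_cases helper below is a hypothetical sketch, not part of gaze, and treating question and answer as the required fields is an assumption drawn from the examples above.

```python
import json
from pathlib import Path


def load_cases(path, required=("question", "answer")):
    """Hypothetical loader: parse one JSON object per line and check required fields."""
    cases = []
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        case = json.loads(line)
        missing = [key for key in required if key not in case]
        if missing:
            raise ValueError(f"line {line_no}: missing fields {missing}")
        cases.append(case)
    return cases
```

The resulting list of dicts has the shape that the cases argument of BaseMultiTurnEnv is documented to accept, so it could be passed there instead of dataset_path.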
Troubleshooting¶
- Import errors: install the verifiers extra (pip install gaze-vlm[verifiers]), or run uv sync from a checkout
- Memory issues: reduce batch size
- Debugging: pass log_dir="./logs" to the environment constructor