LM Studio adapter¶
LMStudioAdapter targets a local LM Studio server. It subclasses the OpenAI
adapter but permits http:// URLs, drops response_format, and detects context
overflow for clearer errors. For how to supply it via the adapter_factory
pattern, see Getting started.
lmstudio_adapter ¶
LM Studio adapter for local inference via OpenAI-compatible API.
LMStudioAdapter ¶
Bases: OpenAIAdapter
Adapter for LM Studio's OpenAI-compatible local inference server.
Subclasses :class:OpenAIAdapter with these differences:
- HTTP base URLs are allowed (no HTTPS requirement).
- No real API key is required (LM Studio doesn't authenticate by default).
- Longer default timeout (300s) for local inference on consumer hardware.
- Tool messages use text-only content (no multipart image payloads).
-
response_formatis stripped — many local models (especially those with built-in thinking/reasoning) mishandle thejson_schemaresponse format, putting output intoreasoning_contentinstead ofcontent. The prompts already instruct JSON output. -
No retries on completion — local timeouts usually indicate OOM or model overload, not transient network issues.
Source code in src/gaze/models/lmstudio_adapter.py
27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 | |
client
property
¶
Create AsyncOpenAI client configured for LM Studio.
No HTTPS validation, no cloud API key requirement.
generate_chat
async
¶
generate_chat(
messages: list[dict[str, Any]],
max_tokens: int,
temperature: float,
tools: list[dict[str, Any]] | None = None,
response_format: dict[str, Any] | None = None,
stream: bool = False,
seed: int | None = None,
) -> (
tuple[str, list[dict[str, Any]] | None, GenerationLog]
| AsyncIterator[str]
)
Generate chat completion, stripping response_format.
Local models with built-in thinking (Qwen3.5, etc.) misroute
structured output into reasoning_content when
response_format is set, leaving content empty. Dropping
the parameter lets the prompt handle JSON formatting instead.
Source code in src/gaze/models/lmstudio_adapter.py
list_models
async
¶
List models currently loaded in LM Studio.
Convenience method for verifying the connection and seeing which models are available before starting inference.
Returns:
| Type | Description |
|---|---|
list[dict[str, Any]]
|
List of model info dicts with at least an |
Source code in src/gaze/models/lmstudio_adapter.py
list_lmstudio_model_ids
async
¶
list_lmstudio_model_ids(
base_url: str | None = None,
*,
api_key: str | None = None,
timeout: float = 10.0,
) -> list[str]
Return model IDs from an OpenAI-compatible LM Studio endpoint.
Source code in src/gaze/models/lmstudio_adapter.py
require_lmstudio_model
async
¶
require_lmstudio_model(
model_name: str,
base_url: str | None = None,
*,
timeout: float = 10.0,
health_check: bool = True,
) -> list[str]
Fail fast when the requested model is not available in LM Studio.
When health_check is True (default), a 1-token completion is attempted
after verifying the model ID is listed. LM Studio lists all available
models but only loads them on demand — this catches OOM failures that
/v1/models alone cannot detect.