General-purpose instruct model. The default starting point for most LM Studio users — strong reasoning, fast on 16 GB hardware.
Skip the hosted bill — run LM Studio as your local OpenAI server
LM Studio is the desktop application that turns a laptop into a local OpenAI-compatible API endpoint. Built for Mac and Windows developers who would rather not send every prompt of every prototype to a hosted service, the lm studio app exposes a /v1/chat/completions surface that drop-in replaces the official SDK's base URL. Developers can prototype, debug and stress-test against open-weight models like Llama, Qwen, DeepSeek and gpt-oss — without an API key, without a credit-card minimum, without rate limits.
Try LM Studio with a local LLM, right here
Pick a model on the left, type a prompt or click a chip, and watch a simulated streaming response with live tokens-per-second readout. This is a deterministic mock — the real thing runs on your hardware after install.
Pick a prompt below or type your own. The mock streams tokens at the picked model's typical local throughput.
▸ no network call · the playground responds deterministically from a small set of curated answers
Will LM Studio run on my Mac or Windows PC?
Drag the RAM slider, pick your GPU class, and see which open-weight models will fit. Numbers assume Q4_K_M quantization — the default LM Studio recommends for most consumer hardware.
Apple Silicon (M1/M2/M3/M4) counts as "dedicated" — unified memory feeds the GPU directly.
LM Studio model library — open weights by family
LM Studio's model browser pulls open weights directly from Hugging Face. Below is a snapshot of the families most commonly downloaded in 2026 — click a chip to filter.
Heavy-class generalist. Comfortable on a 64 GB Apple Silicon Mac or a 48 GB+ Nvidia rig; reasoning approaches GPT-4 class.
Strong multilingual model with native tool-calling support. The current sweet spot for 32 GB machines.
Reasoning-tuned distill that shows its work. Excellent for code review and step-by-step math, on a 16 GB machine.
Apache-licensed mid-size workhorse with a 128K context window. Popular for retrieval-augmented workflows.
OpenAI's 2025 open-weight release. Slower per token than smaller models but the highest local quality short of 70B-class.
Google's open-weight family, tuned for instruction following with light hardware footprint.
Reasoning-focused 14B from Microsoft. Punches well above its weight on code and math benchmarks for the size.
Stable Diffusion XL — the open-weight image generator most commonly run inside LM Studio's diffusion mode.
Apache-licensed diffusion model. State-of-the-art quality at home on 24 GB+ unified or dedicated VRAM.
LM Studio tokens-per-second across Mac and Windows
Illustrative throughput numbers measured at Q4_K_M quantization. Pick a hardware preset and the bars race to the matching tokens-per-second rate.
LM Studio Mac vs LM Studio Windows — same UI, different back-ends
Same UI, same model catalogue, but the back-end stack changes per OS. The lm studio mac edition has the MLX runtime; the lmstudio windows edition has CUDA, ROCm and XPU.
lm studio mac
macOS 13.4 or later · Apple Silicon nativeNotarized .dmg with the MLX runtime for measurably faster on-device inference on M-series chips. Integrates with Spotlight, Shortcuts and the menu bar.
- Native Apple Silicon (M1 onwards)
- MLX runtime for diffusion + LLMs
- Unified memory used as VRAM
- Shortcuts & CLI hooks
min unified RAM≥ M1
Apple Silicon.dmg
notarized installer
lmstudio windows
Windows 10 / 11 · x64 + ARM64Signed MSI installer with full GPU acceleration across NVIDIA CUDA, AMD ROCm and Intel Arc XPU back-ends. SmartScreen-clean, AppLocker-friendly.
- NVIDIA CUDA back-end
- AMD ROCm back-end
- Intel Arc / XPU acceleration
- Vulkan fallback for any GPU
min system RAM8 GB
recommended VRAM.msi
signed installer
Extend the app with lm studio plugins
The plugin runtime turns LM Studio into a programmable surface. Browse community extensions, install with one click — every plugin runs in a sandboxed worker on your machine.
Live web search before generation. Hooks into SearXNG or a self-hosted Brave Search endpoint.
Index a local folder of PDFs and Markdown, then chat against it with citations.
Sandboxed Python and JavaScript execution for tool-calling workflows. Results stream back as messages.
Drops the current Git diff into the system prompt — great for code review with a local model.
Connect any MCP server (Model Context Protocol) to your local model as a tool-calling target.
Allow the model to run whitelisted shell commands. Useful for local agentic workflows.
Monochrome editor theme with monospace everywhere and reduced motion.
⌘K-style global command palette across models, prompts and conversations.
LM Studio quantization explained — Q4, Q5, Q8 and what they cost you
Quantization shrinks a model's weights by storing them in fewer bits. The slider walks you through the common levels — watch the file size shrink and the quality meter drop in real time.
▸ Q-suffixes from llama.cpp · K_M variants pack the most quality per byte at most levels
From a developer's perspective the answer to what is lm studio is straightforward: it is a desktop application that ships an OpenAI-compatible inference server alongside a chat UI, so any script written against the OpenAI Python or TypeScript SDK can be pointed at it with a single base-URL change. LM Studio is the desktop application that turns a laptop into a local OpenAI-compatible API endpoint. Built for Mac and Windows developers who would rather not send every prompt of every prototype to a hosted service, the lm studio app exposes a /v1/chat/completions surface that drop-in replaces the official SDK's base URL. Developers can prototype, debug and stress-test against open-weight models like Llama, Qwen, DeepSeek and gpt-oss — without an API key, without a credit-card minimum, without rate limits.
LM Studio for the developer who wants a local API
From a developer's perspective the answer to what is lm studio is straightforward: it ships an OpenAI-compatible inference server alongside a chat UI, so any script written against the OpenAI SDK can be pointed at it with a single base-URL change. The longer answer to what is lm studio app, for the same audience, is a polished front-end on top of llama.cpp and MLX with a curated open-weight catalogue, a sandboxed plugin runtime and a CLI for headless deployments. The lm studio ai workflow for a developer collapses to: install, pull a model, start the server, set OPENAI_BASE_URL=http://localhost:1234/v1 — done.
Is LM Studio safe to leave running on a build server
Anyone shipping a developer tool to colleagues will eventually ask is lm studio safe to leave running on a build server, an internal staging box or a personal laptop overnight. Yes. Code-signed binaries on Mac and Windows, clean independent scans on VirusTotal, no inbound network connections by default. The lm studio safe story is what makes it deployable inside corporate environments: model weights pulled over HTTPS with verifiable hashes, license activation as the single outbound call, and the local API server bound to 127.0.0.1 unless explicitly opened. For a team running internal tooling against the application, the documented network boundary is the cleanest in the local AI category.
The minute I set OPENAI_BASE_URL to localhost and the rest of the codebase kept working, the LM Studio app stopped being a chat client and started being part of my development environment.Diego Ramírez — Developer Tools Editor
LM Studio Mac: MLX is the developer-friendly back-end
On macOS the lm studio mac edition is the developer's default. Distributed as a notarized .dmg, runs natively on Apple Silicon (M1 onwards), requires macOS 13.4 or later. The MLX runtime is Apple's own on-device ML framework, and on M-series chips it delivers higher tokens-per-second than the cross-platform llama.cpp back-end. For a developer iterating against the local API in a tight loop, an M2 Pro with 16 GB unified memory keeps Llama-3.1-8B responsive; the 13B and 14B reasoning models really want 32 GB so the OS does not start paging when the IDE, browser and chat UI compete for memory. The lmstudio mac install integrates cleanly with the menu bar and exposes the CLI as a homebrew-installable companion binary, so a developer can drive the server from a Makefile without ever opening the GUI.
LM Studio Windows: CUDA, ROCm, XPU for prototyping
On the Windows side the lm studio download windows installer is a standard MSI for Windows 10 and Windows 11, x64 and ARM64. The lmstudio windows edition supports NVIDIA CUDA, AMD ROCm and Intel Arc XPU back-ends, which lets a developer prototype against the same OpenAI-compatible API regardless of which GPU is in the box under the desk. For a developer building a tight agentic loop that hits the local model dozens of times per second, a Windows workstation with a current-generation NVIDIA card produces enough throughput on a 13B model to keep the loop responsive in a way a unified-memory laptop typically cannot, simply because dedicated VRAM does not contend with the rest of the operating system. The Windows installer is digitally signed.
Using the lm studio local llm workflow as a drop-in API
For a developer the lm studio local llm workflow is the same three steps as for an end user — pick model, download, load — but the next step is different: enable the local server in Settings → Server, point an OpenAI SDK at it, and start writing code against open weights. Behind the scenes the lm studio local llm app handles memory mapping, GPU layer offload and KV cache management on the developer's behalf. As a lm studio local llm desktop app it is the only consumer-grade tool in the category that exposes a stable, documented HTTP API surface — Ollama is a close second, with a slightly different request format.
Common open-weight families used in prototypes
- Llama family — 3.1, 3.2 and 4 in 8B, 70B and 405B sizes.
- Qwen, DeepSeek, Mistral, Gemma, gpt-oss and Phi — broad coverage of open-weight releases from major labs.
Local inference does not replace the cloud — but for any prototype that hits a model more than ten times in a debug loop, LM Studio is the version where the bill stays zero and the rate limit never fires.Diego Ramírez — Developer Tools Editor
LM Studio plugins as a developer extension surface
The lm studio plugins runtime turns the application into a programmable surface — and for a developer audience that is the whole point. Plugins are written in TypeScript or Python, run in a sandboxed worker, and can intercept inference requests, attach tool-calling back-ends, expose new prompt processors and add UI surfaces. The plugin SDK is published under a permissive licence and the marketplace is curated by the publisher, which means a developer can ship an internal plugin to colleagues without going through any external review. Common community plugins cover web search, RAG over a local folder, code execution and MCP (Model Context Protocol) bridges to other tool servers.
LM Studio image generation through the same OpenAI-shaped API
The lm studio image generation feature reuses the model browser and the local server, which means a developer can hit a local /v1/images/generations endpoint with a request shaped exactly like the OpenAI one — and get back a Stable Diffusion, SDXL or FLUX-class image generated on the developer's own GPU. VRAM is the binding constraint on the diffusion side, and the practical thresholds match what the open-weight diffusion community uses for hosted-API benchmarking. On Apple Silicon the MLX runtime extends to diffusion models, and an M3 Max approaches mid-range NVIDIA throughput at SDXL. The same prompt-and-seed pair produces deterministic output across machines, which makes the feature genuinely useful in automated test pipelines.
The lm studio download for a CI/CD pipeline
The recommended lm studio download path for a development workstation or a CI runner is the publisher's official website. The Mac edition arrives as a notarized .dmg, the Windows edition as a signed MSI that supports unattended deployment via Intune, Munki, Jamf or any package manager that handles standard installer formats. A Linux AppImage is offered for headless build hosts. Every feature is available from first launch, including the API server, so a CI job can validate the full prompt-and-response loop on a fresh box.
Three-step developer setup
- Install LM Studio and pull a chat-completion-tuned model in the GUI or via
lms get llama-3.1-8b-instruct. - Enable the local API server in Settings → Server (it binds to 127.0.0.1 by default).
- Set
OPENAI_BASE_URL=http://localhost:1234/v1in the project's.envand point the OpenAI SDK at it.
Final word: lmstudio is the cleanest local API for developers
For a developer building AI features in 2026, lmstudio is the cleanest path from the prototype to the local-only deployment. The combination of lm studio mac and lm studio for windows builds, the OpenAI-compatible local API surface, the documented CLI, the sandboxed plugin runtime and the image generation endpoint all reachable from the same /v1 base URL covers virtually every prototype a developer might assemble against a hosted chat product. The trade-off remains hardware: 16 GB of memory is the practical floor, 32 GB or a dedicated GPU is where a tight agentic loop becomes pleasant to write against.
LM Studio FAQ
What is the minimum hardware for a developer workstation?
Does the local API server work in a Docker container?
lms CLI in headless mode on a host with GPU pass-through, or run the LM Studio server on the host and forward the port into the container.Can I run LM Studio on a remote workstation and connect from a laptop?
Does the local API server log requests?
Are conversation transcripts shared between projects?
What outbound calls does the app make during normal use?
Does the local API support function calling and tool use?
Can I host multiple models simultaneously behind the local server?
model field, the same way the hosted OpenAI API does. Memory budget is the practical limit.Are embeddings supported?
/v1/embeddings endpoint works against any GGUF embeddings model loaded in the app — useful for building a local RAG pipeline without leaving the machine.Will the official OpenAI SDK work with LM Studio?
openai) and TypeScript (openai) packages work with no code changes other than the base URL and a placeholder API key string.Can a plugin call out to an external service?
How do I write a custom plugin?
lms plugin create — the SDK is published under a permissive licence. The runtime exposes hooks for prompt processing, tool calling and new UI surfaces, all inside a sandboxed worker.Download LM Studio for Mac and Windows
Install LM Studio on your Mac or Windows PC and try a local model end to end — chat, image generation and the OpenAI-compatible server, all on-device.
Download
Pull the installer from the publisher's official site or the platform store.
Pick a model
The first-run wizard recommends a Q4_K_M model sized to your hardware.
Chat
Start a conversation in the built-in interface — no account, no API key.