live · on your machine · no API key v3.5.x

Skip the hosted bill — run LM Studio as your local OpenAI server

LM Studio is the desktop application that turns a laptop into a local OpenAI-compatible API endpoint. Built for Mac and Windows developers who would rather not send every prompt of every prototype to a hosted service, the lm studio app exposes a /v1/chat/completions surface that drop-in replaces the official SDK's base URL. Developers can prototype, debug and stress-test against open-weight models like Llama, Qwen, DeepSeek and gpt-oss — without an API key, without a credit-card minimum, without rate limits.

// playground

Try LM Studio with a local LLM, right here

Pick a model on the left, type a prompt or click a chip, and watch a simulated streaming response with live tokens-per-second readout. This is a deterministic mock — the real thing runs on your hardware after install.

Models · GGUF

Pick a prompt below or type your own. The mock streams tokens at the picked model's typical local throughput.

▸ no network call · the playground responds deterministically from a small set of curated answers

your hardware

Will LM Studio run on my Mac or Windows PC?

Drag the RAM slider, pick your GPU class, and see which open-weight models will fit. Numbers assume Q4_K_M quantization — the default LM Studio recommends for most consumer hardware.

8162432486496128 GB

Apple Silicon (M1/M2/M3/M4) counts as "dedicated" — unified memory feeds the GPU directly.

model catalogue

LM Studio model library — open weights by family

LM Studio's model browser pulls open weights directly from Hugging Face. Below is a snapshot of the families most commonly downloaded in 2026 — click a chip to filter.

Llama-3.1-8B8 B
Meta · Llama 3.1

General-purpose instruct model. The default starting point for most LM Studio users — strong reasoning, fast on 16 GB hardware.

4.6 GB Q4_K_M8K ctx
Llama-3.1-70B70 B
Meta · Llama 3.1

Heavy-class generalist. Comfortable on a 64 GB Apple Silicon Mac or a 48 GB+ Nvidia rig; reasoning approaches GPT-4 class.

42 GB Q4_K_M128K ctx
Qwen3-14B14 B
Alibaba · Qwen3

Strong multilingual model with native tool-calling support. The current sweet spot for 32 GB machines.

9.8 GB Q5_K_M32K ctx
DeepSeek-R1-7B7 B
DeepSeek · R1 distill

Reasoning-tuned distill that shows its work. Excellent for code review and step-by-step math, on a 16 GB machine.

4.1 GB Q4_016K ctx
Mistral-Nemo-12B12 B
Mistral · Nemo

Apache-licensed mid-size workhorse with a 128K context window. Popular for retrieval-augmented workflows.

7.5 GB Q4_K_M128K ctx
gpt-oss-20B20 B
OpenAI · open release

OpenAI's 2025 open-weight release. Slower per token than smaller models but the highest local quality short of 70B-class.

12.4 GB Q4_K_M32K ctx
Gemma-3-9B9 B
Google · Gemma 3

Google's open-weight family, tuned for instruction following with light hardware footprint.

5.4 GB Q4_K_M8K ctx
Phi-4-14B14 B
Microsoft · Phi-4

Reasoning-focused 14B from Microsoft. Punches well above its weight on code and math benchmarks for the size.

8.7 GB Q4_K_M16K ctx
SDXL 1.03.5 B
Stability · diffusion

Stable Diffusion XL — the open-weight image generator most commonly run inside LM Studio's diffusion mode.

6.9 GB safetensors1024² px
FLUX.1-schnell12 B
Black Forest Labs · FLUX

Apache-licensed diffusion model. State-of-the-art quality at home on 24 GB+ unified or dedicated VRAM.

23 GB safetensors2048² px
tokens / second

LM Studio tokens-per-second across Mac and Windows

Illustrative throughput numbers measured at Q4_K_M quantization. Pick a hardware preset and the bars race to the matching tokens-per-second rate.

llama 3.1-8B-Q4
— tok/s
qwen 3-14B-Q5
— tok/s
deepseek r1-7B-Q4
— tok/s
gpt-oss 20B-Q4
— tok/s
phi 4-14B-Q4
— tok/s
platforms

LM Studio Mac vs LM Studio Windows — same UI, different back-ends

Same UI, same model catalogue, but the back-end stack changes per OS. The lm studio mac edition has the MLX runtime; the lmstudio windows edition has CUDA, ROCm and XPU.

lm studio mac

macOS 13.4 or later · Apple Silicon native

Notarized .dmg with the MLX runtime for measurably faster on-device inference on M-series chips. Integrates with Spotlight, Shortcuts and the menu bar.

  • Native Apple Silicon (M1 onwards)
  • MLX runtime for diffusion + LLMs
  • Unified memory used as VRAM
  • Shortcuts & CLI hooks
16 GB
min unified RAM
≥ M1
Apple Silicon
.dmg
notarized installer

lmstudio windows

Windows 10 / 11 · x64 + ARM64

Signed MSI installer with full GPU acceleration across NVIDIA CUDA, AMD ROCm and Intel Arc XPU back-ends. SmartScreen-clean, AppLocker-friendly.

  • NVIDIA CUDA back-end
  • AMD ROCm back-end
  • Intel Arc / XPU acceleration
  • Vulkan fallback for any GPU
16 GB
min system RAM
8 GB
recommended VRAM
.msi
signed installer
// marketplace

Extend the app with lm studio plugins

The plugin runtime turns LM Studio into a programmable surface. Browse community extensions, install with one click — every plugin runs in a sandboxed worker on your machine.

web-search
Search & RAG

Live web search before generation. Hooks into SearXNG or a self-hosted Brave Search endpoint.

↓ 42k · ★ 4.8
rag-folder
Search & RAG

Index a local folder of PDFs and Markdown, then chat against it with citations.

↓ 31k · ★ 4.7
code-runner
Code

Sandboxed Python and JavaScript execution for tool-calling workflows. Results stream back as messages.

↓ 28k · ★ 4.9
git-context
Code

Drops the current Git diff into the system prompt — great for code review with a local model.

↓ 14k · ★ 4.6
mcp-bridge
Tools

Connect any MCP server (Model Context Protocol) to your local model as a tool-calling target.

↓ 18k · ★ 4.8
shell-call
Tools

Allow the model to run whitelisted shell commands. Useful for local agentic workflows.

↓ 9k · ★ 4.4
theme-mono
UI

Monochrome editor theme with monospace everywhere and reduced motion.

↓ 6k · ★ 4.5
cmd-palette
UI

⌘K-style global command palette across models, prompts and conversations.

↓ 12k · ★ 4.7
quantization

LM Studio quantization explained — Q4, Q5, Q8 and what they cost you

Quantization shrinks a model's weights by storing them in fewer bits. The slider walks you through the common levels — watch the file size shrink and the quality meter drop in real time.

Q4_K_MQuantization level 5 of 7
FP16Q8Q6Q5Q4Q3Q2
File size (8 B model)— GB
Output quality (vs FP16)— %

▸ Q-suffixes from llama.cpp · K_M variants pack the most quality per byte at most levels

From a developer's perspective the answer to what is lm studio is straightforward: it is a desktop application that ships an OpenAI-compatible inference server alongside a chat UI, so any script written against the OpenAI Python or TypeScript SDK can be pointed at it with a single base-URL change. LM Studio is the desktop application that turns a laptop into a local OpenAI-compatible API endpoint. Built for Mac and Windows developers who would rather not send every prompt of every prototype to a hosted service, the lm studio app exposes a /v1/chat/completions surface that drop-in replaces the official SDK's base URL. Developers can prototype, debug and stress-test against open-weight models like Llama, Qwen, DeepSeek and gpt-oss — without an API key, without a credit-card minimum, without rate limits.

LM Studio for the developer who wants a local API

From a developer's perspective the answer to what is lm studio is straightforward: it ships an OpenAI-compatible inference server alongside a chat UI, so any script written against the OpenAI SDK can be pointed at it with a single base-URL change. The longer answer to what is lm studio app, for the same audience, is a polished front-end on top of llama.cpp and MLX with a curated open-weight catalogue, a sandboxed plugin runtime and a CLI for headless deployments. The lm studio ai workflow for a developer collapses to: install, pull a model, start the server, set OPENAI_BASE_URL=http://localhost:1234/v1 — done.

Is LM Studio safe to leave running on a build server

Anyone shipping a developer tool to colleagues will eventually ask is lm studio safe to leave running on a build server, an internal staging box or a personal laptop overnight. Yes. Code-signed binaries on Mac and Windows, clean independent scans on VirusTotal, no inbound network connections by default. The lm studio safe story is what makes it deployable inside corporate environments: model weights pulled over HTTPS with verifiable hashes, license activation as the single outbound call, and the local API server bound to 127.0.0.1 unless explicitly opened. For a team running internal tooling against the application, the documented network boundary is the cleanest in the local AI category.

The minute I set OPENAI_BASE_URL to localhost and the rest of the codebase kept working, the LM Studio app stopped being a chat client and started being part of my development environment. Diego Ramírez — Developer Tools Editor

LM Studio Mac: MLX is the developer-friendly back-end

On macOS the lm studio mac edition is the developer's default. Distributed as a notarized .dmg, runs natively on Apple Silicon (M1 onwards), requires macOS 13.4 or later. The MLX runtime is Apple's own on-device ML framework, and on M-series chips it delivers higher tokens-per-second than the cross-platform llama.cpp back-end. For a developer iterating against the local API in a tight loop, an M2 Pro with 16 GB unified memory keeps Llama-3.1-8B responsive; the 13B and 14B reasoning models really want 32 GB so the OS does not start paging when the IDE, browser and chat UI compete for memory. The lmstudio mac install integrates cleanly with the menu bar and exposes the CLI as a homebrew-installable companion binary, so a developer can drive the server from a Makefile without ever opening the GUI.

LM Studio Windows: CUDA, ROCm, XPU for prototyping

On the Windows side the lm studio download windows installer is a standard MSI for Windows 10 and Windows 11, x64 and ARM64. The lmstudio windows edition supports NVIDIA CUDA, AMD ROCm and Intel Arc XPU back-ends, which lets a developer prototype against the same OpenAI-compatible API regardless of which GPU is in the box under the desk. For a developer building a tight agentic loop that hits the local model dozens of times per second, a Windows workstation with a current-generation NVIDIA card produces enough throughput on a 13B model to keep the loop responsive in a way a unified-memory laptop typically cannot, simply because dedicated VRAM does not contend with the rest of the operating system. The Windows installer is digitally signed.

Using the lm studio local llm workflow as a drop-in API

For a developer the lm studio local llm workflow is the same three steps as for an end user — pick model, download, load — but the next step is different: enable the local server in Settings → Server, point an OpenAI SDK at it, and start writing code against open weights. Behind the scenes the lm studio local llm app handles memory mapping, GPU layer offload and KV cache management on the developer's behalf. As a lm studio local llm desktop app it is the only consumer-grade tool in the category that exposes a stable, documented HTTP API surface — Ollama is a close second, with a slightly different request format.

Common open-weight families used in prototypes

  • Llama family — 3.1, 3.2 and 4 in 8B, 70B and 405B sizes.
  • Qwen, DeepSeek, Mistral, Gemma, gpt-oss and Phi — broad coverage of open-weight releases from major labs.
Local inference does not replace the cloud — but for any prototype that hits a model more than ten times in a debug loop, LM Studio is the version where the bill stays zero and the rate limit never fires. Diego Ramírez — Developer Tools Editor

LM Studio plugins as a developer extension surface

The lm studio plugins runtime turns the application into a programmable surface — and for a developer audience that is the whole point. Plugins are written in TypeScript or Python, run in a sandboxed worker, and can intercept inference requests, attach tool-calling back-ends, expose new prompt processors and add UI surfaces. The plugin SDK is published under a permissive licence and the marketplace is curated by the publisher, which means a developer can ship an internal plugin to colleagues without going through any external review. Common community plugins cover web search, RAG over a local folder, code execution and MCP (Model Context Protocol) bridges to other tool servers.

LM Studio image generation through the same OpenAI-shaped API

The lm studio image generation feature reuses the model browser and the local server, which means a developer can hit a local /v1/images/generations endpoint with a request shaped exactly like the OpenAI one — and get back a Stable Diffusion, SDXL or FLUX-class image generated on the developer's own GPU. VRAM is the binding constraint on the diffusion side, and the practical thresholds match what the open-weight diffusion community uses for hosted-API benchmarking. On Apple Silicon the MLX runtime extends to diffusion models, and an M3 Max approaches mid-range NVIDIA throughput at SDXL. The same prompt-and-seed pair produces deterministic output across machines, which makes the feature genuinely useful in automated test pipelines.

The lm studio download for a CI/CD pipeline

The recommended lm studio download path for a development workstation or a CI runner is the publisher's official website. The Mac edition arrives as a notarized .dmg, the Windows edition as a signed MSI that supports unattended deployment via Intune, Munki, Jamf or any package manager that handles standard installer formats. A Linux AppImage is offered for headless build hosts. Every feature is available from first launch, including the API server, so a CI job can validate the full prompt-and-response loop on a fresh box.

Three-step developer setup

  1. Install LM Studio and pull a chat-completion-tuned model in the GUI or via lms get llama-3.1-8b-instruct.
  2. Enable the local API server in Settings → Server (it binds to 127.0.0.1 by default).
  3. Set OPENAI_BASE_URL=http://localhost:1234/v1 in the project's .env and point the OpenAI SDK at it.

Final word: lmstudio is the cleanest local API for developers

For a developer building AI features in 2026, lmstudio is the cleanest path from the prototype to the local-only deployment. The combination of lm studio mac and lm studio for windows builds, the OpenAI-compatible local API surface, the documented CLI, the sandboxed plugin runtime and the image generation endpoint all reachable from the same /v1 base URL covers virtually every prototype a developer might assemble against a hosted chat product. The trade-off remains hardware: 16 GB of memory is the practical floor, 32 GB or a dedicated GPU is where a tight agentic loop becomes pleasant to write against.

help

LM Studio FAQ

What is the minimum hardware for a developer workstation?
For most prototyping work, 16 GB of memory plus a recent GPU is plenty — Llama-3.1-8B at Q4_K_M runs comfortably at 25+ tok/s. For a tighter agentic loop hitting the model every second, 32 GB and a dedicated 12 GB+ VRAM GPU is the sensible floor.
Does the local API server work in a Docker container?
Not directly — LM Studio itself runs as a desktop application. For a containerized deployment, use the lms CLI in headless mode on a host with GPU pass-through, or run the LM Studio server on the host and forward the port into the container.
Can I run LM Studio on a remote workstation and connect from a laptop?
Yes, by binding the local API server to an interface reachable from the laptop and using SSH or a VPN to reach it. The application's default 127.0.0.1 binding is the safe default; loosen it intentionally.
Does the local API server log requests?
Yes — requests and responses can be logged to the application's local log file, useful for debugging an agentic loop. Logging is disabled by default and the log file never leaves the machine.
Are conversation transcripts shared between projects?
No. Each conversation lives in the local SQLite database scoped to the application's user-data directory. Projects are isolated unless you explicitly export and import.
What outbound calls does the app make during normal use?
Two: license activation (cacheable for offline use) and model downloads from Hugging Face. Inference itself is entirely local — no prompt or response ever leaves the machine.
Does the local API support function calling and tool use?
Yes, on models that natively support it (Qwen, Llama 3.1+, gpt-oss). The request and response shape matches OpenAI's tool-calling format exactly.
Can I host multiple models simultaneously behind the local server?
Yes — the server can hold several models in memory at once and route requests by the model field, the same way the hosted OpenAI API does. Memory budget is the practical limit.
Are embeddings supported?
Yes. The /v1/embeddings endpoint works against any GGUF embeddings model loaded in the app — useful for building a local RAG pipeline without leaving the machine.
Will the official OpenAI SDK work with LM Studio?
Yes — both the Python (openai) and TypeScript (openai) packages work with no code changes other than the base URL and a placeholder API key string.
Can a plugin call out to an external service?
Only if the user explicitly grants network permission in Settings. By default plugins run in a sandboxed worker with no outbound network access.
How do I write a custom plugin?
Scaffold a new project with lms plugin create — the SDK is published under a permissive licence. The runtime exposes hooks for prompt processing, tool calling and new UI surfaces, all inside a sandboxed worker.

Download LM Studio for Mac and Windows

Install LM Studio on your Mac or Windows PC and try a local model end to end — chat, image generation and the OpenAI-compatible server, all on-device.

→ STEP 01

Download

Pull the installer from the publisher's official site or the platform store.

→ STEP 02

Pick a model

The first-run wizard recommends a Q4_K_M model sized to your hardware.

→ STEP 03

Chat

Start a conversation in the built-in interface — no account, no API key.