One command to shrink every model in your Ollama library. Originals stay intact — compressed versions get a -cx suffix.
$ npm install -g compressx
Universal install via npm. Works on any OS with Node.js.
Requires Node.js 18+
Real-time per-tensor progress with percent and ETA while quantization runs. No more wondering if it hung.
First run auto-downloads llama.cpp binaries. No manual setup, no brew install prereqs.
Every compressed model gets a sanity check. Catches broken quants before you ever load them.
Works with unknown Ollama tags too — if it has a :Xb suffix, CompressX can compress it.
Compress once, deploy anywhere. Choose your target with the --target flag.
$ compressx compress qwen3:4b
Auto-registers as model:tag-cx. No extra steps.
$ compressx compress qwen3:4b --target lmstudio
Drops the GGUF into ~/.lmstudio/models/ so it appears in My Models.
$ compressx compress qwen3:4b --target gguf
Leaves the raw GGUF file in the output directory. Use with any GGUF-compatible tool.
Compatible with: Ollama · LM Studio · llama.cpp · Jan · GPT4All · Msty · text-generation-webui · koboldcpp
Run compressx. It connects to your local Ollama and auto-detects your GPU/RAM to find models that could be smaller.
CompressX re-quantizes the GGUF file already in your Ollama library — ~30 seconds, zero download. No model yet? It falls back to fetching the original weights automatically. Use --from-source for pristine quality.
Auto-registers in Ollama (default), LM Studio, or leaves a raw GGUF file for llama.cpp, Jan, GPT4All, and friends. Pick with --target. Originals are never touched.
We never modify your existing models. Compressed versions live alongside them with a clear -cx suffix.
Uses your own GPU. No upload, no cloud processing, no data leaving your machine. Privacy by design.
Auto-detects your VRAM and picks the right quantization level. No guessing, no OOM errors.
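The VRAM-based selection can be pictured as walking down a quality-ordered list of quant levels until one fits. A minimal sketch, where the bits-per-weight figures are rough community approximations and the 20% overhead margin for KV cache is an illustrative guess, not CompressX's actual heuristic:

```python
# Quant levels ordered best-quality first, with approximate bits-per-weight.
# These numbers are illustrative, not CompressX's internal table.
QUANTS = [("q8_0", 8.5), ("q6_k", 6.6), ("q4_k_m", 4.8), ("q3_k_m", 3.9)]

def pick_quant(params_billions: float, vram_gb: float, overhead: float = 1.2):
    """Return the highest-quality quant whose estimated footprint fits in VRAM."""
    for name, bpw in QUANTS:
        size_gb = params_billions * bpw / 8  # params × bits-per-weight, in GB
        if size_gb * overhead <= vram_gb:
            return name
    return None  # too large for this GPU even at the smallest quant
```

For example, a 4B model on an 8 GB GPU fits comfortably at q8_0, while a 14B model on the same card fails every level and would need CPU offload instead.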
The CLI is open source and free. No account required. No credits. No rate limits on local compression.
$ compressx
Scan Ollama library and interactively compress models
$ compressx --all
Show every installed model, even ones that already fit your hardware
$ compressx --preview
Library-wide preview: what compression would save for every installed model (read-only)
$ compressx preview qwen3:14b
See every quant level side-by-side for a specific model
$ compressx compress qwen3:4b
Compress a specific model to the auto-recommended quant level
$ compressx compress qwen3:4b -q q4_k_m
Compress with a specific quantization type
$ compressx compress qwen3:4b --from-source
Download original weights from HuggingFace for pristine quality (slower)
$ compressx compress qwen3:4b --target lmstudio
Deploy to LM Studio instead of Ollama
$ compressx compress qwen3:4b --target gguf
Just produce a GGUF file (for llama.cpp, Jan, GPT4All, Msty, etc.)
$ compressx hardware
Show detected GPU, VRAM, RAM, and recommended model sizes
$ compressx models
List all supported models
$ compressx update
Update CompressX to the latest version
$ compressx uninstall
Remove CompressX data directory (CLI removal is one more step)
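The savings a preview reports boil down to parameter count times bits-per-weight. A minimal sketch of that arithmetic (the bpw figures are rough community approximations, not CompressX's internal table):

```python
# Rough GGUF size estimate: parameters × bits-per-weight ÷ 8 bits per byte.
# bpw values here are approximate, not CompressX's actual sizing logic.
BPW = {"f16": 16.0, "q8_0": 8.5, "q6_k": 6.6, "q4_k_m": 4.8, "q3_k_m": 3.9}

def estimated_size_gb(params_billions: float, quant: str) -> float:
    """Estimated on-disk size in GB for a model at the given quant level."""
    bits = params_billions * BPW[quant]  # total gigabits of weights
    return round(bits / 8, 2)
```

By this estimate a 4B model drops from about 8 GB at f16 to roughly 2.4 GB at q4_k_m, which is the kind of delta the preview surfaces per model.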
CompressX checks for updates automatically once per day. You can also manage it manually.
Get the latest version with new models, bug fixes, and features.
$ compressx update
$ npm install -g compressx@latest
Either command works — the first is a shortcut for the second.
Fully remove CompressX, the CLI binary, and its data directory (~/.compressx/).
$ curl -fsSL https://compressx.asmith.media/uninstall.sh | sh
$ powershell -c "irm https://compressx.asmith.media/uninstall.ps1 | iex"
Top line: macOS/Linux. Bottom line: Windows.