$ npm install -g compressx

Compress LLMs.
Keep the originals.

One command to shrink every model in your Ollama library. Originals stay intact — compressed versions get a -cx suffix.

$ npm install -g compressx

Universal install via npm. Works on any OS with Node.js.

Requires Node.js 18+

NEW · v0.6.1 · Hardening release
Live progress bar

Real-time per-tensor progress with percent and ETA while quantization runs. No more wondering if it hung.

Self-installing

First run auto-downloads llama.cpp binaries. No manual setup, no brew install prereqs.

Post-compression smoke test

Every compressed model gets a sanity check. Catches broken quants before you ever load them.

46 curated models + fallbacks

Works with unknown Ollama tags too — if it has a :Xb suffix, CompressX can compress it.

~/my-project
$ compressx
CompressX - LLM Compression for Ollama
✓ Ollama running with 20 models
✓ NVIDIA RTX 5060 | 32 GB RAM | 8 GB VRAM
Found 3 models that could be smaller:
Model          Current → CompressX        Savings
──────────────────────────────────────────────────
qwen3:14b      8.4 GB    6.2 GB   Q3_K_M    -26%
gemma4:12b     9.6 GB    5.8 GB   Q3_K_M    -40%
llama3.1:8b    4.9 GB    3.1 GB   Q4_K_M    -37%
? Select models to compress: (space to toggle)
❯ ◉ qwen3:14b
  ◉ gemma4:12b
  ◯ llama3.1:8b
[1/2] Re-quantizing local blob to Q3_K_M...
████████████████░░░░░░░░░░░░ 58.2% 169/291 tensors 0:14 elapsed eta 0:10
Using local Ollama blobs. ~30 sec each, zero download.
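
The Savings column in the scan output is plain arithmetic, which you can check against the figures above (a quick sketch for illustration, not CompressX code):

```python
def savings_pct(current_gb: float, compressed_gb: float) -> int:
    """Percent reduction in size, rounded to the nearest whole percent."""
    return round(100 * (current_gb - compressed_gb) / current_gb)

# Figures from the scan output above.
for model, cur, new in [("qwen3:14b", 8.4, 6.2),
                        ("gemma4:12b", 9.6, 5.8),
                        ("llama3.1:8b", 4.9, 3.1)]:
    print(f"{model}: -{savings_pct(cur, new)}%")  # -26%, -40%, -37%
```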

Works with your runtime

Compress once, deploy anywhere. Choose your target with the --target flag.

Ollama

DEFAULT

Auto-registers as model:tag-cx. No extra steps.

compressx compress qwen3:4b

LM Studio

Drops the GGUF into ~/.lmstudio/models/ so it appears in My Models.

compressx compress qwen3:4b --target lmstudio

Everything else

Leaves the raw GGUF file in the output directory. Use with any GGUF-compatible tool.

compressx compress qwen3:4b --target gguf

Compatible with: Ollama · LM Studio · llama.cpp · Jan · GPT4All · Msty · text-generation-webui · koboldcpp

1. Scan

Run compressx. It connects to your local Ollama and auto-detects your GPU/RAM to find models that could be smaller.

2. Compress

CompressX re-quantizes the GGUF file already in your Ollama library — ~30 seconds, zero download. No model yet? It falls back to fetching the original weights automatically. Use --from-source for pristine quality.

3. Deploy

Auto-registers in Ollama (default), LM Studio, or leaves a raw GGUF file for llama.cpp, Jan, GPT4All, and friends. Pick with --target. Originals are never touched.
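
The fallback in the compress step boils down to a simple decision: re-quantize the blob already on disk when it exists, fetch original weights when it doesn't or when you ask for pristine quality. An illustrative sketch (the function name and structure are hypothetical, not CompressX internals):

```python
from pathlib import Path

def choose_input(blob: Path, from_source: bool) -> str:
    """Pick what to re-quantize: the GGUF blob already in the Ollama
    library (fast, no download) or freshly fetched original weights."""
    if from_source:
        return "download-original"   # --from-source: pristine quality
    if blob.exists():
        return "local-blob"          # default path: ~30 s re-quantization
    return "download-original"       # no local copy: fall back to fetching

print(choose_input(Path("/tmp/does-not-exist.gguf"), from_source=False))
```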

Why CompressX?

Originals stay intact

We never modify your existing models. Compressed versions live alongside them with a clear -cx suffix.

Fully local

Uses your own GPU. No upload, no cloud processing, no data leaving your machine. Privacy by design.

Hardware-aware

Auto-detects your VRAM and picks the right quantization level. No guessing, no OOM errors.
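
CompressX doesn't publish its selection rule, but a VRAM-aware picker generally has this shape (the bytes-per-weight ratios and the 20% headroom below are illustrative guesses, not the shipped heuristic):

```python
def pick_quant(params_billions: float, vram_gb: float) -> str:
    """Choose the largest quant level whose estimated size fits in VRAM,
    leaving ~20% headroom for the KV cache and context."""
    # Rough GB per billion parameters for common GGUF quant levels.
    levels = [("Q4_K_M", 0.57), ("Q3_K_M", 0.44), ("Q2_K", 0.35)]
    budget = vram_gb * 0.8
    for name, gb_per_b in levels:
        if params_billions * gb_per_b <= budget:
            return name
    return levels[-1][0]  # smallest quant as a last resort

print(pick_quant(14, 8))  # a 14B model on an 8 GB card lands on Q3_K_M
```

With these made-up ratios, an 8 GB card gets Q3_K_M for a 14B model and Q4_K_M for an 8B one, matching the scan output shown earlier.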

Free forever

The CLI is open source and free. No account required. No credits. No rate limits on local compression.

Commands

$ compressx

Scan Ollama library and interactively compress models

$ compressx --all

Show every installed model, even ones that already fit your hardware

$ compressx --preview

Library-wide preview: what compression would save for every installed model (read-only)

$ compressx preview qwen3:14b

See every quant level side-by-side for a specific model

$ compressx compress qwen3:4b

Compress a specific model to the auto-recommended quant level

$ compressx compress qwen3:4b -q q4_k_m

Compress with a specific quantization type

$ compressx compress qwen3:4b --from-source

Download original weights from HuggingFace for pristine quality (slower)

$ compressx compress qwen3:4b --target lmstudio

Deploy to LM Studio instead of Ollama

$ compressx compress qwen3:4b --target gguf

Just produce a GGUF file (for llama.cpp, Jan, GPT4All, Msty, etc.)

$ compressx hardware

Show detected GPU, VRAM, RAM, and recommended model sizes

$ compressx models

List all supported models

$ compressx update

Update CompressX to the latest version

$ compressx uninstall

Remove the CompressX data directory (removing the CLI itself is a separate step)

Updates & Uninstall

CompressX checks for updates automatically once per day. You can also manage it manually.

Update

Get the latest version with new models, bug fixes, and features.

$ compressx update
$ npm install -g compressx@latest

Either command works — the first is a shortcut for the second.

Uninstall

Fully remove CompressX, the CLI binary, and its data directory (~/.compressx/).

$ curl -fsSL https://compressx.asmith.media/uninstall.sh | sh
$ powershell -c "irm https://compressx.asmith.media/uninstall.ps1 | iex"

Top line: macOS/Linux. Bottom line: Windows.

Shrink your models in 30 seconds.

$ npm install -g compressx

Universal install via npm. Works on any OS with Node.js.

Requires Node.js 18+