
What Are Open-Weights AI Models?

Quick Answer

Open-weights AI models are models where the creator publicly releases the trained parameters (the weights), so anyone can download and run the model on their own hardware. This is different from closed models like GPT-4 or Claude, where you access the model only through an API and the weights never leave the vendor's servers. Popular open-weights models include Meta's Llama 3.1, Mistral 7B, and Falcon 40B.

Why the distinction between open-weights and closed models matters

When you call OpenAI's API, your data travels to OpenAI's servers, gets processed, and comes back. You have no control over where it sits during that process or what the vendor's retention policy actually does in practice. For many SMBs, that's a manageable risk. For healthcare, finance, or any business handling sensitive customer data, it's often a compliance problem.

Open-weights models exist on a spectrum. Some are released with permissive commercial licenses (Llama 3.1, Mistral). Others come with restrictions on commercial use or redistribution. The weights being public doesn't automatically mean you can do anything you want with them, so license review is a real step, not a formality.

How open-weights models actually work

When a lab trains a model, the result is a large file of floating-point numbers called weights. These numbers encode everything the model learned. An open-weights release means that file is publicly downloadable. You load it into an inference runtime (like vLLM, llama.cpp, or Ollama), run it on a GPU or CPU, and the model responds entirely within your own environment. No API call leaves your network.
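To make "no API call leaves your network" concrete, here is a minimal sketch of talking to a self-hosted model. It assumes an OpenAI-compatible endpoint on localhost, the shape that inference runtimes like vLLM and llama.cpp's server expose; the endpoint URL, port, and model name below are illustrative placeholders, not a prescribed setup.

```python
import json

def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-compatible chat payload for a self-hosted endpoint.

    The payload shape is the standard /v1/chat/completions format that
    local runtimes such as vLLM accept.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

# The endpoint is inside your own network; nothing here touches a vendor API.
LOCAL_ENDPOINT = "http://localhost:8000/v1/chat/completions"  # example local port

payload = build_chat_request(
    "meta-llama/Llama-3.1-8B-Instruct",  # example model identifier
    "Summarize our returns policy in one paragraph.",
)
body = json.dumps(payload)

# To actually send it (requires the `requests` package and a running server):
# requests.post(LOCAL_ENDPOINT, data=body,
#               headers={"Content-Type": "application/json"})
```

The only difference from calling a closed API is the hostname: swap `localhost` for a vendor URL and your data leaves your environment, which is exactly the choice this article is about.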

The practical implications are significant. First, your data never touches the original vendor's infrastructure after download. Second, you can fine-tune the model on your own proprietary data without sharing that data with anyone. Third, you control the version. If Meta releases Llama 3.2 and it behaves differently, you stay on 3.1 until you choose to migrate. Closed API providers can update models underneath you without notice.
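The version-control point above can be made concrete with a deployment config. The sketch below pins both the runtime image and the model so nothing changes underneath you until you decide to migrate; it assumes a vLLM-style container, and the image tag, model name, and revision hash are illustrative placeholders.

```yaml
# Sketch: pin the inference runtime and the model weights explicitly.
# All versions and the revision hash below are examples, not recommendations.
services:
  llm:
    image: vllm/vllm-openai:v0.5.4           # pinned runtime version
    command: >
      --model meta-llama/Llama-3.1-70B-Instruct
      --revision a1b2c3d                      # pin a specific weights snapshot
    ports:
      - "8000:8000"                           # serve only inside your network
```

With a closed API, the equivalent of this file lives on the vendor's side and can change without notice; with open weights, it lives in your repo and changes only when you commit to it.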

The tradeoff is real: you own the infrastructure burden. Running a 70-billion-parameter model well requires meaningful GPU resources, ongoing maintenance, and someone who knows how to keep the deployment secure and performant. For SMBs without an ML engineering team, that's exactly where a deployment partner comes in.

When open-weights isn't the right choice

If your use case doesn't involve sensitive data and your volume is low, a closed API is often simpler and cheaper to start with. The infrastructure overhead of a self-hosted open-weights deployment isn't worth it for a basic customer FAQ bot that touches no PII.

Also worth knowing: 'open-weights' is not the same as 'open source.' A fully open-source release makes the training code, training data, and model architecture public as well. Most models described as open-weights release only the final weights file. Llama 3.1, for example, is open-weights under its own community license that permits commercial use, not fully open source. If your legal or compliance team cares about supply-chain transparency, that distinction matters.

How we use open-weights models at Usmart

The majority of the private LLM deployments we build for SMBs run on open-weights models, typically Llama 3.1 or Mistral variants, hosted inside the client's own cloud environment or on-premise infrastructure. For HIPAA-regulated clients, this setup is what makes a Business Associate Agreement meaningful. We sign the BAA, but the architecture keeps PHI from ever leaving the client's environment in the first place.

We do use closed APIs when the task genuinely calls for it and the client's data classification allows it. But we don't default to a vendor API because it's easier to build on. The deployment decision starts with where your data needs to live, not with which model has the best benchmark score this week.

Ready to see it working for your business?

Book a free 30-minute strategy call. We will scope your use case and give you honest numbers on timeline, cost, and ROI.