# gpt-oss-20b-GGUF
Read our guide on using gpt-oss to learn how to adjust its responses.
## Highlights
- Permissive Apache 2.0 license: Build freely without copyleft restrictions or patent risk—ideal for experimentation, customization, and commercial deployment.
- Configurable reasoning effort: Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs.
- Full chain-of-thought: Gain complete access to the model’s reasoning process, facilitating easier debugging and increased trust in outputs. Note that the chain of thought is not intended to be shown to end users.
- Fine-tunable: Fully customize models to your specific use case through parameter fine-tuning.
- Agentic capabilities: Use the models’ native capabilities for function calling, web browsing, Python code execution, and Structured Outputs.
- Native MXFP4 quantization: The models are trained with native MXFP4 precision for the MoE layer, making gpt-oss-120b run on a single 80GB GPU (like NVIDIA H100 or AMD MI300X) and the gpt-oss-20b model run within 16GB of memory.
Refer to the original model card for more details on the model.
## Quants
| Link | URI | Size |
|---|---|---|
| GGUF | `hf:giladgd/gpt-oss-20b-GGUF/gpt-oss-20b.MXFP4.gguf` | 12.1GB |
| GGUF | `hf:giladgd/gpt-oss-20b-GGUF/gpt-oss-20b.F16.gguf` | 13.8GB |
Download a quant using node-llama-cpp (more info):
```shell
npx -y node-llama-cpp pull <URI>
```
## Usage
### Use with node-llama-cpp (recommended)
#### CLI
Chat with the model:
```shell
npx -y node-llama-cpp chat hf:giladgd/gpt-oss-20b-GGUF/gpt-oss-20b.MXFP4.gguf
```
Ensure that you have node.js installed first:
```shell
brew install nodejs
```
#### Code
Use it in your node.js project:
```shell
npm install node-llama-cpp
```
```typescript
import {getLlama, resolveModelFile, LlamaChatSession} from "node-llama-cpp";

const modelUri = "hf:giladgd/gpt-oss-20b-GGUF/gpt-oss-20b.MXFP4.gguf";

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: await resolveModelFile(modelUri)
});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence()
});

const q1 = "Hi there, how are you?";
console.log("User: " + q1);

const a1 = await session.prompt(q1);
console.log("AI: " + a1);
```
Read the getting started guide to quickly scaffold a new node-llama-cpp project.
#### Customize inference options
Set Harmony options using `HarmonyChatWrapper`:
```typescript
import {
    getLlama, resolveModelFile, LlamaChatSession, HarmonyChatWrapper,
    defineChatSessionFunction
} from "node-llama-cpp";

const modelUri = "hf:giladgd/gpt-oss-20b-GGUF/gpt-oss-20b.MXFP4.gguf";

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: await resolveModelFile(modelUri)
});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence(),
    chatWrapper: new HarmonyChatWrapper({
        modelIdentity: "You are ChatGPT, a large language model trained by OpenAI.",
        reasoningEffort: "high"
    })
});

const functions = {
    getCurrentWeather: defineChatSessionFunction({
        description: "Gets the current weather in the provided location.",
        params: {
            type: "object",
            properties: {
                location: {
                    type: "string",
                    description: "The city and state, e.g. San Francisco, CA"
                },
                format: {
                    enum: ["celsius", "fahrenheit"]
                }
            }
        },
        handler({location, format}) {
            console.log(`Getting current weather for "${location}" in ${format}`);

            return {
                // simulate a weather API response
                temperature: format === "celsius" ? 20 : 68,
                format
            };
        }
    })
};

const q1 = "What is the weather like in SF?";
console.log("User: " + q1);

const a1 = await session.prompt(q1, {functions});
console.log("AI: " + a1);
```
### Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux):
```shell
brew install llama.cpp
```
#### CLI
```shell
llama-cli --hf-repo giladgd/gpt-oss-20b-GGUF --hf-file gpt-oss-20b.MXFP4.gguf -p "The meaning to life and the universe is"
```
#### Server
```shell
llama-server --hf-repo giladgd/gpt-oss-20b-GGUF --hf-file gpt-oss-20b.MXFP4.gguf -c 2048
```
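Once the server is running, you can query it over its OpenAI-compatible HTTP API. A minimal node.js sketch, assuming llama-server's default port of 8080; the prompt text here is illustrative:
```typescript
// query the local llama-server instance via its
// OpenAI-compatible chat completions endpoint
const response = await fetch("http://localhost:8080/v1/chat/completions", {
    method: "POST",
    headers: {"Content-Type": "application/json"},
    body: JSON.stringify({
        messages: [
            {role: "user", content: "Hi there, how are you?"}
        ]
    })
});

const data = await response.json();
console.log(data.choices[0].message.content);
```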
## Model tree for giladgd/gpt-oss-20b-GGUF
Base model: [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b)