Instructions for using saadxsalman/Q-SS-0.5B-Reasoning-Math with libraries, inference providers, notebooks, and local apps. The sections below cover each option.
- Libraries
- Transformers
How to use saadxsalman/Q-SS-0.5B-Reasoning-Math with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="saadxsalman/Q-SS-0.5B-Reasoning-Math")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("saadxsalman/Q-SS-0.5B-Reasoning-Math")
model = AutoModelForCausalLM.from_pretrained("saadxsalman/Q-SS-0.5B-Reasoning-Math")

messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use saadxsalman/Q-SS-0.5B-Reasoning-Math with vLLM:
Install from pip and serve the model
```
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "saadxsalman/Q-SS-0.5B-Reasoning-Math"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "saadxsalman/Q-SS-0.5B-Reasoning-Math",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
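Once the vLLM server is running, any OpenAI-compatible client can talk to it. Below is a minimal sketch using the official `openai` Python client, assuming the server is on localhost:8000 as started above (the `api_key` value is a placeholder; vLLM ignores it unless authentication is configured):

```python
# pip install openai
from openai import OpenAI

# Point the client at the local vLLM server (OpenAI-compatible API).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="saadxsalman/Q-SS-0.5B-Reasoning-Math",
    messages=[
        {"role": "user", "content": "What is 17 * 24?"},
    ],
    temperature=0.1,
    max_tokens=384,
)
print(response.choices[0].message.content)
```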
- SGLang
How to use saadxsalman/Q-SS-0.5B-Reasoning-Math with SGLang:
Install from pip and serve the model
```
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "saadxsalman/Q-SS-0.5B-Reasoning-Math" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "saadxsalman/Q-SS-0.5B-Reasoning-Math",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker images
```
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "saadxsalman/Q-SS-0.5B-Reasoning-Math" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "saadxsalman/Q-SS-0.5B-Reasoning-Math",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
- Unsloth Studio
How to use saadxsalman/Q-SS-0.5B-Reasoning-Math with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for saadxsalman/Q-SS-0.5B-Reasoning-Math to start chatting
```
Install Unsloth Studio (Windows)
```
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for saadxsalman/Q-SS-0.5B-Reasoning-Math to start chatting
```
Using HuggingFace Spaces for Unsloth
```
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for saadxsalman/Q-SS-0.5B-Reasoning-Math to start chatting
```
Load model with FastModel
```
pip install unsloth
```

```python
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="saadxsalman/Q-SS-0.5B-Reasoning-Math",
    max_seq_length=2048,
)
```
- Docker Model Runner
How to use saadxsalman/Q-SS-0.5B-Reasoning-Math with Docker Model Runner:
```
docker model run hf.co/saadxsalman/Q-SS-0.5B-Reasoning-Math
```
Q-SS-0.5B-Reasoning-Math
A compact, fast, and structured mathematical reasoning model — built to think before it answers.
Q-SS-0.5B-Reasoning-Math is a fine-tuned version of Qwen/Qwen2.5-0.5B-Instruct, trained using Group Relative Policy Optimization (GRPO) reinforcement learning — the same technique behind DeepSeek-R1. The model is designed to reason explicitly and transparently through mathematical problems before producing a clean, parseable final answer.
💾 Looking for the lightweight CPU version? See Q-SS-0.5B-Reasoning-Math-GGUF for the Q4_K_M quantized model (~300MB).
✨ Highlights
- 🧠 Thinks out loud — explicit step-by-step reasoning inside `<thought>` tags before every answer
- 🎯 Clean structured output — final answer always isolated in `<answer>` tags, trivial to parse
- 🔁 RL-trained — learned through reward signals, not just imitation
- 🔧 Fine-tunable — full FP16 weights, ready for further training or fine-tuning
- 🔓 Apache 2.0 — free for personal and commercial use
📋 Model Details
| Property | Details |
|---|---|
| Model Name | Q-SS-0.5B-Reasoning-Math |
| Base Model | Qwen/Qwen2.5-0.5B-Instruct |
| Parameters | 500M |
| Training Method | SFT Warm-up + GRPO Reinforcement Learning |
| Trained On | GSM8K + OpenR1-Math-220k |
| Precision | FP16 (merged, no adapter needed) |
| License | Apache 2.0 |
| Developer | Saad Salman |
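The training recipe pairs a short SFT warm-up with GRPO reinforcement learning. The exact reward functions used for this model are not published in this card, so the snippet below is only an illustrative sketch of how tag-format and answer-correctness rewards are typically wired into TRL's `GRPOTrainer` (the function names, reward weights, and toy dataset are assumptions, not the actual training code):

```python
import re
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

TAG_PATTERN = re.compile(r"<thought>.*?</thought>\s*<answer>.*?</answer>", re.DOTALL)

def format_reward(completions, **kwargs):
    # Reward completions that follow the <thought>...</thought><answer>...</answer> format.
    return [1.0 if TAG_PATTERN.search(c) else 0.0 for c in completions]

def correctness_reward(completions, answer, **kwargs):
    # Reward completions whose <answer> block matches the reference answer (illustrative check).
    scores = []
    for completion, gold in zip(completions, answer):
        match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
        predicted = match.group(1).strip() if match else ""
        scores.append(2.0 if predicted == str(gold).strip() else 0.0)
    return scores

# Toy dataset for illustration only; the model card lists GSM8K + OpenR1-Math-220k.
train_dataset = Dataset.from_dict({
    "prompt": ["What is 6 * 7?", "What is 100 - 58?"],
    "answer": ["42", "42"],
})

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    reward_funcs=[format_reward, correctness_reward],
    args=GRPOConfig(output_dir="grpo-math"),
    train_dataset=train_dataset,
)
trainer.train()
```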
💬 Output Format
Every response follows this strict structure:
```
<thought>
[Step-by-step reasoning and calculations]
</thought>
<answer>
[Final numerical answer only]
</answer>
```
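Because the format is fixed, both parts can be pulled out with a couple of regular expressions. The helper below is not part of the model release, just a sketch of one way to parse a response:

```python
import re

def parse_response(text):
    """Split a model response into its reasoning and final answer."""
    thought = re.search(r"<thought>(.*?)</thought>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return {
        "thought": thought.group(1).strip() if thought else None,
        "answer": answer.group(1).strip() if answer else None,
    }

# Example:
# parse_response("<thought>3 x 2 = 6, 6 x 7 = 42</thought><answer>42</answer>")
# -> {"thought": "3 x 2 = 6, 6 x 7 = 42", "answer": "42"}
```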
🚀 Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
model_name = "saadxsalman/Q-SS-0.5B-Reasoning-Math"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
model_name,
torch_dtype = torch.float16,
device_map = "auto",
)
SYSTEM_PROMPT = """You are a mathematical reasoning engine.
Solve the problem step-by-step inside <thought> tags, then give ONLY the
final numerical or LaTeX result inside <answer> tags.
<thought>
[Your internal reasoning and calculations here]
</thought>
<answer>
[Final answer only]
</answer>"""
def solve(problem):
messages = [
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": problem},
]
inputs = tokenizer.apply_chat_template(
messages,
tokenize = True,
add_generation_prompt = True,
return_tensors = "pt",
).to(model.device)
with torch.no_grad():
outputs = model.generate(
input_ids = inputs,
max_new_tokens = 384,
temperature = 0.1,
do_sample = True,
pad_token_id = tokenizer.eos_token_id,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
if "<answer>" in response:
return response.split("<answer>")[-1].split("</answer>")[0].strip()
return response
print(solve("Janet has 3 cats. Each cat eats 2 cans of food per day. How many cans does she need for 7 days?"))
# Output: 42
```
📝 Example Outputs
Problem: Janet has 3 cats. Each cat eats 2 cans of food per day. How many cans does she need for 7 days?
```
<thought>
Each cat eats 2 cans per day.
Janet has 3 cats, so they eat 3 × 2 = 6 cans per day together.
For 7 days: 6 × 7 = 42 cans total.
</thought>
<answer>
42
</answer>
```
Problem: Tom has $50. He buys a book for $12 and a pen for $3. How much money does he have left?
```
<thought>
Tom starts with $50.
He spends $12 on a book and $3 on a pen.
Total spent: 12 + 3 = $15.
Money remaining: 50 - 15 = $35.
</thought>
<answer>
35
</answer>
```
✅ What It's Good At
| Problem Type | Support |
|---|---|
| Basic arithmetic | ✅ Reliable |
| Multi-step word problems | ✅ Reliable |
| Problems with units and currency | ✅ Reliable |
| Basic algebra | ⚠️ Partial |
| Competition math (AMC/AIME) | ❌ Beyond capacity |
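These ratings come from informal testing; a quick way to spot-check them yourself is to run the model over a slice of the GSM8K test split, where the gold answer follows `####` in the `answer` field. The loop below reuses the `solve` helper from the Quick Start and is only a rough accuracy probe, not a formal benchmark:

```python
from datasets import load_dataset

gsm8k = load_dataset("openai/gsm8k", "main", split="test")

correct = 0
n = 50  # small sample; increase for a more stable estimate
for example in gsm8k.select(range(n)):
    gold = example["answer"].split("####")[-1].strip().replace(",", "")
    prediction = solve(example["question"]).replace(",", "")
    correct += int(prediction == gold)

print(f"Exact-match accuracy on {n} GSM8K problems: {correct / n:.2%}")
```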
📦 Related Models
| Repo | Format | Size | Best For |
|---|---|---|---|
| Q-SS-0.5B-Reasoning-Math | FP16 | ~988MB | GPU inference & further fine-tuning |
| Q-SS-0.5B-Reasoning-Math-GGUF | Q4_K_M | ~300MB | Local CPU inference |
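For the GGUF variant, one option is llama-cpp-python, which can pull the quantized file straight from the Hub and run it on CPU. A minimal sketch, assuming the GGUF repo contains a single file matching the Q4_K_M glob below (check the repo for the exact filename):

```python
# pip install llama-cpp-python
from llama_cpp import Llama

# Downloads the quantized file from the Hub and loads it on CPU.
llm = Llama.from_pretrained(
    repo_id="saadxsalman/Q-SS-0.5B-Reasoning-Math-GGUF",
    filename="*Q4_K_M.gguf",  # assumed filename pattern; verify in the repo
    n_ctx=2048,
)

result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is 15% of 240?"}],
    temperature=0.1,
    max_tokens=384,
)
print(result["choices"][0]["message"]["content"])
```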
⚠️ Limitations
- Optimized for English language math problems only
- Complex abstract reasoning, geometry, and calculus are beyond reliable capacity at 0.5B scale
- Always verify critical calculations — the model may occasionally produce confident but incorrect answers
🙏 Acknowledgements
- Unsloth — efficient fine-tuning framework
- Qwen Team — Qwen2.5-0.5B-Instruct base model
- HuggingFace TRL — GRPO implementation
- OpenR1 — OpenR1-Math-220k dataset
- OpenAI — GSM8K dataset
📄 Citation
```bibtex
@misc{qss-reasoning-math-2025,
  author       = {Saad Salman},
  title        = {Q-SS-0.5B-Reasoning-Math},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/saadxsalman/Q-SS-0.5B-Reasoning-Math}},
}
```