Instructions for using dphn/Dolphin3.0-R1-Mistral-24B with libraries, inference providers, notebooks, and local apps.

Libraries

How to use dphn/Dolphin3.0-R1-Mistral-24B with Transformers:

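```python
# Use a pipeline as a high-level helper
from transformers import pipeline

# torch_dtype="auto" keeps the checkpoint's dtype instead of upcasting a 24B model to float32;
# device_map="auto" (requires the accelerate package) places the weights on available GPUs.
pipe = pipeline(
    "text-generation",
    model="dphn/Dolphin3.0-R1-Mistral-24B",
    torch_dtype="auto",
    device_map="auto",
)
messages = [
    {"role": "user", "content": "Who are you?"},
]
print(pipe(messages))
```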

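```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("dphn/Dolphin3.0-R1-Mistral-24B")
# torch_dtype="auto" keeps the checkpoint's dtype instead of upcasting to float32;
# device_map="auto" (requires the accelerate package) places the weights on available GPUs.
model = AutoModelForCausalLM.from_pretrained(
    "dphn/Dolphin3.0-R1-Mistral-24B",
    torch_dtype="auto",
    device_map="auto",
)
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Render the conversation with the model's chat template and tokenize it.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```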

Local Apps

vLLM

How to use dphn/Dolphin3.0-R1-Mistral-24B with vLLM:

Install from pip and serve model

```shell
# Install vLLM from pip:
pip install vllm
# Start the vLLM server (it keeps running in the foreground):
vllm serve "dphn/Dolphin3.0-R1-Mistral-24B"
# From another terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "dphn/Dolphin3.0-R1-Mistral-24B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
```
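For Python clients, the same OpenAI-compatible request can be sent without extra dependencies. The sketch below is a minimal illustration using only the standard library. It assumes the vLLM server started above is listening on http://localhost:8000 (pass http://localhost:30000 for the SGLang server instead). build_chat_request and chat are hypothetical helper names, not part of vLLM.

```python
import json
import urllib.request

MODEL = "dphn/Dolphin3.0-R1-Mistral-24B"


def build_chat_request(base_url: str, prompt: str) -> urllib.request.Request:
    """Build a POST to the OpenAI-compatible /v1/chat/completions route (hypothetical helper)."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, prompt: str) -> str:
    """Send the request and return the text of the first choice."""
    req = build_chat_request(base_url, prompt)
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Usage, with the server running: print(chat("http://localhost:8000", "What is the capital of France?")). Keeping request construction separate from the network call makes the payload easy to inspect or reuse.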

Use Docker

```shell
docker model run hf.co/dphn/Dolphin3.0-R1-Mistral-24B
```

SGLang

How to use dphn/Dolphin3.0-R1-Mistral-24B with SGLang:

Install from pip and serve model

```shell
# Install SGLang from pip:
pip install sglang
# Start the SGLang server (it keeps running in the foreground):
python3 -m sglang.launch_server \
    --model-path "dphn/Dolphin3.0-R1-Mistral-24B" \
    --host 0.0.0.0 \
    --port 30000
# From another terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "dphn/Dolphin3.0-R1-Mistral-24B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
```

Use Docker images

```shell
# Run the SGLang server in a container (replace <secret> with your Hugging Face token):
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "dphn/Dolphin3.0-R1-Mistral-24B" \
        --host 0.0.0.0 \
        --port 30000
# From another terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "dphn/Dolphin3.0-R1-Mistral-24B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
```

Docker Model Runner
How to use dphn/Dolphin3.0-R1-Mistral-24B with Docker Model Runner:
```
docker model run hf.co/dphn/Dolphin3.0-R1-Mistral-24B
```

Modelfile doesn't enable tool calling?

#13

by jac-cbi - opened May 9, 2025

Discussion

jac-cbi

May 9, 2025

Eric,

I'm attempting to use Dolphin3.0-R1-Mistral-24B:Q8_0 via ollama v0.6.8 from Zed v0.185.13. Why? Agentic Editing (watch the video 😎)

However, Zed reports that the model doesn't support tool calling ("Tools Unsupported"). It determines this by sending a POST request to Ollama's /api/show endpoint and then checking whether the capabilities list in the returned JSON contains 'tools'. I confirmed via curl that 'tools' is not listed under capabilities.
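Zed's check can be reproduced by hand. Below is a minimal standard-library sketch, assuming Ollama's default address http://localhost:11434 and an Ollama version whose /api/show response includes a capabilities list; show_model and tool_support are hypothetical helper names. It also reports whether the model's Go template mentions .Tools, on the assumption (not confirmed in this thread) that Ollama only advertises the tools capability when the template can render a tool list.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: Ollama's default listen address is unchanged


def show_model(model: str, base_url: str = OLLAMA_URL) -> dict:
    """POST /api/show for one model and return the decoded JSON response."""
    req = urllib.request.Request(
        f"{base_url}/api/show",
        data=json.dumps({"model": model}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def tool_support(show_response: dict) -> dict:
    """Summarize tool-calling support from an /api/show response."""
    capabilities = show_response.get("capabilities", [])
    template = show_response.get("template", "")
    return {
        # What Zed looks at: is "tools" listed among the capabilities?
        "advertises_tools": "tools" in capabilities,
        # Heuristic (assumption): the Go template must reference .Tools
        # for a tool list to be rendered into the prompt at all.
        "template_mentions_tools": ".Tools" in template,
    }
```

Usage: print(tool_support(show_model("<your-dolphin-tag>"))), replacing the placeholder with the tag shown by ollama list.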

Ollama's code is much harder to decipher. :-/ It seems I need to add tool calling to the template in the Modelfile. I found where the Hermes dataset you used defines the tool-calling prompt, but it declares the tools in the system prompt. I pulled https://ollama.com/thewindmom/hermes-3-llama-3.1-8b, but its Modelfile doesn't include tool calling either, and the example in its README lists the tools in the system prompt...

So am I misunderstanding this? I thought the client was supposed to pass a list of available tools to the model, and the model would return which tools to call with which arguments. But the configurations I'm finding don't line up with that idea.
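Both ideas are compatible. The client does send the tool list through the API (for example, a tools field in the chat request), but the model only ever sees text, so the chat template has to serialize that list into the prompt, usually into the system message. The model then answers in a textual format that the server typically parses back into structured calls. The sketch below shows both halves, assuming a Hermes-style convention of <tools> and <tool_call> tags; the prompt wording is illustrative rather than the exact Hermes or Dolphin text, and render_system_prompt and parse_tool_calls are hypothetical names.

```python
import json
import re

# Matches the JSON body between <tool_call> ... </tool_call> tags (assumed Hermes-style format).
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def render_system_prompt(tools: list[dict], base: str = "") -> str:
    """Serialize the client's tool list into the system prompt (what a chat template does)."""
    if not tools:
        return base
    tool_lines = "\n".join(json.dumps(t) for t in tools)
    return (
        f"{base}\n\n"
        "You may call functions. Available function signatures are inside <tools></tools>:\n"
        f"<tools>\n{tool_lines}\n</tools>\n"
        'To call one, reply with <tool_call>{"name": ..., "arguments": {...}}</tool_call>.'
    ).strip()


def parse_tool_calls(completion: str) -> list[dict]:
    """Extract the name/arguments objects the model emitted between <tool_call> tags."""
    calls = []
    for raw in TOOL_CALL_RE.findall(completion):
        call = json.loads(raw)
        calls.append({"name": call["name"], "arguments": call.get("arguments", {})})
    return calls
```

In Ollama, the first half is the job of the Modelfile TEMPLATE through its .Tools variable, which is presumably why a template that never references .Tools cannot offer tool calling.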

I'd much prefer to submit a PR, but I don't think I understand the intent well enough to propose a solution that's even in the ballpark.

ehartford

Dolphin org May 9, 2025

You lost me at ollama

Here's the thing:
Ollama has its own scheme for chat templates, which is different from the Hugging Face chat template.

So any time you use Ollama, you are trusting whoever published that model to Ollama to have correctly implemented the Ollama chat template for the model.

You are also trusting that nothing in the model's chat template has changed since it was published to Ollama.

To help you out: if you want to use Ollama effectively, you simply need to become comfortable with manipulating Modelfiles and Ollama's special chat template.

I could republish all the Dolphin models with the correct chat template, and I will, when I have 3 hours. In the meantime, you will need to update the chat template yourself. (Hint: use qwen2.5-coder's template.)
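Following the hint, one way to do this is to start from qwen2.5-coder's Modelfile, which Ollama prints with ollama show --modelfile, swap its FROM line for the Dolphin weights, and build a new local model with ollama create. This is a rough sketch under several assumptions: both models are already pulled, Dolphin's prompt format is compatible with that template (the hint's premise, not verified here), and DOLPHIN_REF and dolphin3-r1-tools are hypothetical names. The donor Modelfile may also carry qwen-specific lines (for example a default system prompt or license text), so the sketch only writes the file for review and leaves the build step to you.

```python
import subprocess

BASE_MODEL = "qwen2.5-coder"           # template donor named in the hint
DOLPHIN_REF = "<your-dolphin-model>"   # hypothetical: your local Dolphin tag or GGUF path


def swap_from(modelfile: str, new_from: str) -> str:
    """Replace the first uncommented FROM line; keep TEMPLATE, PARAMETER, etc. unchanged."""
    out, replaced = [], False
    for line in modelfile.splitlines():
        if not replaced and line.startswith("FROM "):
            out.append(f"FROM {new_from}")
            replaced = True
        else:
            out.append(line)
    if not replaced:
        raise ValueError("no FROM line found in Modelfile")
    return "\n".join(out) + "\n"


def write_modelfile(path: str = "Modelfile") -> None:
    """Dump the donor Modelfile, point it at the Dolphin weights, and save it for review."""
    donor = subprocess.run(
        ["ollama", "show", "--modelfile", BASE_MODEL],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    with open(path, "w", encoding="utf-8") as f:
        f.write(swap_from(donor, DOLPHIN_REF))
```

Usage: call write_modelfile(), review the result, run ollama create dolphin3-r1-tools -f Modelfile, and then query /api/show again to see whether 'tools' now appears under capabilities.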
