Instructions to use dphn/Dolphin3.0-R1-Mistral-24B with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use dphn/Dolphin3.0-R1-Mistral-24B with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="dphn/Dolphin3.0-R1-Mistral-24B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("dphn/Dolphin3.0-R1-Mistral-24B")
model = AutoModelForCausalLM.from_pretrained("dphn/Dolphin3.0-R1-Mistral-24B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Local Apps
- vLLM
How to use dphn/Dolphin3.0-R1-Mistral-24B with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "dphn/Dolphin3.0-R1-Mistral-24B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "dphn/Dolphin3.0-R1-Mistral-24B",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker

```shell
docker model run hf.co/dphn/Dolphin3.0-R1-Mistral-24B
```
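The curl call above can also be made from Python. This is a minimal standard-library sketch. It assumes the server from the previous step is listening on `http://localhost:8000` and returns the usual OpenAI-style `choices[0].message.content` field. The helper names are hypothetical, and the request payload mirrors the curl example.

```python
import json
import urllib.request

MODEL = "dphn/Dolphin3.0-R1-Mistral-24B"


def build_chat_request(base_url: str, prompt: str, model: str = MODEL) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_json: dict) -> str:
    """Return the assistant text from an OpenAI-style chat completion response."""
    return response_json["choices"][0]["message"]["content"]


def chat(base_url: str, prompt: str) -> str:
    """Send the request and return the reply (requires a running server)."""
    req = build_chat_request(base_url, prompt)
    with urllib.request.urlopen(req) as resp:
        return extract_reply(json.load(resp))
```

Usage: `chat("http://localhost:8000", "What is the capital of France?")`. The SGLang examples in the next item serve the same OpenAI-compatible API on port 30000, so the same helper should work with `http://localhost:30000`.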
- SGLang
How to use dphn/Dolphin3.0-R1-Mistral-24B with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "dphn/Dolphin3.0-R1-Mistral-24B" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "dphn/Dolphin3.0-R1-Mistral-24B",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "dphn/Dolphin3.0-R1-Mistral-24B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "dphn/Dolphin3.0-R1-Mistral-24B",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
- Docker Model Runner
How to use dphn/Dolphin3.0-R1-Mistral-24B with Docker Model Runner:
```shell
docker model run hf.co/dphn/Dolphin3.0-R1-Mistral-24B
```
Modelfile doesn't enable tool calling?
Eric,
I'm attempting to use Dolphin3.0-R1-Mistral-24B:Q8_0 via Ollama v0.6.8 from Zed v0.185.13. Why? Agentic editing (watch the video).
However, Zed reports that the model doesn't support tool calling ("Tools Unsupported"). It determines this by sending a POST to Ollama's /api/show endpoint (part of Ollama's native API, not its OpenAI-compatible /v1 endpoints) and then checking the capabilities list in the returned JSON for 'tools'. I confirmed via curl that 'tools' is not listed under capabilities.
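You can reproduce the capability check described above with a short script. This is a sketch, not Zed's code. It assumes:

- Ollama is listening on its default port 11434.
- `/api/show` accepts `{"model": ...}` in the request body.
- The response includes a `capabilities` list. Recent Ollama versions provide this list and spell the tool-calling entry `tools`.

The helper names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address


def show_model(model: str, base_url: str = OLLAMA_URL) -> dict:
    """POST to Ollama's /api/show and return the decoded JSON (needs a running Ollama)."""
    req = urllib.request.Request(
        base_url + "/api/show",
        data=json.dumps({"model": model}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def supports_tools(show_response: dict) -> bool:
    """True if the /api/show response lists the 'tools' capability."""
    return "tools" in show_response.get("capabilities", [])
```

For example, `supports_tools(show_model("<name>"))`, with `<name>` written exactly as `ollama list` shows it.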
Ollama's code is much harder to decipher. :-/ It seems I need to add tool calling to the template in the Modelfile. I found where the Hermes dataset you used defines the tool-calling prompt, but it declares the tools in the system prompt. I pulled https://ollama.com/thewindmom/hermes-3-llama-3.1-8b, but its Modelfile doesn't include tool calling, and the example in its README lists the tools in the system prompt...
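Whether Ollama reports the `tools` capability depends on the template, not on the model weights. As far as I can tell from Ollama's source, the capability is added when the Modelfile's TEMPLATE references the `.Tools` variable. That would explain why a template that only handles `.System` and `.Messages` shows up as "Tools Unsupported".

The sketch below approximates that check with a regular expression; Ollama itself parses the Go template. Both template strings are hypothetical, trimmed examples, not drop-in replacements.

```python
import re

# Matches a Go-template action that references .Tools,
# e.g. {{ if .Tools }}, {{- range .Tools }}, {{ if or .System .Tools }}.
TOOLS_ACTION = re.compile(r"\{\{-?[^}]*\.Tools\b[^}]*\}\}")


def template_uses_tools(template: str) -> bool:
    """Rough approximation of Ollama's check: does the template reference .Tools?"""
    return TOOLS_ACTION.search(template) is not None


# Hypothetical ChatML-style template without any tool handling.
PLAIN_TEMPLATE = """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ range .Messages }}<|im_start|>{{ .Role }}
{{ .Content }}<|im_end|>
{{ end }}<|im_start|>assistant
"""

# Hypothetical variant that renders the client's tools into the system turn.
TOOLS_TEMPLATE = """{{ if or .System .Tools }}<|im_start|>system
{{ .System }}
{{- if .Tools }}
<tools>
{{- range .Tools }}
{"type": "function", "function": {{ .Function }}}
{{- end }}
</tools>
{{- end }}<|im_end|>
{{ end }}{{ range .Messages }}<|im_start|>{{ .Role }}
{{ .Content }}<|im_end|>
{{ end }}<|im_start|>assistant
"""
```

In practice it is safer to start from a known-good template, such as the qwen2.5-coder one suggested later in the thread, than to hand-write one like these.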
So, am I misunderstanding this? I thought the client was supposed to pass a list of available tools to the model, and the model would return which tools to call with which arguments. But the configurations I'm finding don't line up with that idea.
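Both pictures are correct, at different layers:

- **API layer:** the client sends the tool definitions as structured data, in the `tools` field of Ollama's `/api/chat` or of an OpenAI-style chat completion request.
- **Prompt layer:** the server's chat template serializes those definitions into prompt text, usually inside the system turn, because the model only ever sees tokens.

The Hermes examples show that rendered prompt, not the API request.

Below is a minimal Python sketch of both layers. The `get_weather` tool and both functions are hypothetical, and the rendered wording only loosely imitates Hermes/Qwen-style `<tools>` prompts.

```python
import json

# Hypothetical tool definition in the JSON-schema shape used by
# OpenAI-style and Ollama chat APIs.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_request(model: str, user_msg: str, tools: list) -> dict:
    """Layer 1: what the client sends; tools travel as structured data."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_msg}],
        "tools": tools,
        "stream": False,
    }


def render_system_prompt(system: str, tools: list) -> str:
    """Layer 2 (simplified): what a chat template does with those tools.

    The definitions become plain text inside the system turn.
    """
    lines = [system, "", "You may call functions. Available tools:", "<tools>"]
    lines += [json.dumps(tool["function"]) for tool in tools]
    lines.append("</tools>")
    return "\n".join(lines)
```

`render_system_prompt("You are helpful.", [WEATHER_TOOL])` produces a system prompt with the `get_weather` schema between `<tools>` tags, which is the shape the Hermes examples display. The client never writes that text itself.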
I'd much prefer to submit a PR, but I don't think I understand the intent well enough to propose a solution that's even in the ballpark.
You lost me at Ollama.
Here's the thing:
Ollama has its own scheme for chat templates (Go templates in the Modelfile), which is different from the Hugging Face chat template (Jinja).
So any time you use Ollama, you are trusting whoever published that model to Ollama to have correctly implemented the Ollama chat template for it.
You are also trusting that nothing has changed in the model's chat template since it was published to Ollama.
To help you out: if you want to use Ollama effectively, you need to get comfortable editing Modelfiles and Ollama's own chat template syntax.
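In practice, editing a Modelfile takes three steps:

1. Export it with `ollama show --modelfile <model>`.
2. Change the `TEMPLATE` block.
3. Register the result with `ollama create <new-name> -f Modelfile`.

The sketch below only covers step 2: it swaps the triple-quoted `TEMPLATE """..."""` block in a Modelfile's text. It assumes the template is written in that triple-quoted form, which is how `ollama show --modelfile` typically prints it. The function name is hypothetical.

```python
import re

# TEMPLATE """...""" possibly spanning many lines; non-greedy so the first
# closing triple quote ends the match.
TEMPLATE_BLOCK = re.compile(r'^TEMPLATE """.*?"""', re.DOTALL | re.MULTILINE)


def replace_template(modelfile: str, new_template: str) -> str:
    """Return the Modelfile text with its TEMPLATE block replaced (or appended)."""
    block = f'TEMPLATE """{new_template}"""'
    if TEMPLATE_BLOCK.search(modelfile) is None:
        # No existing template: append one at the end.
        return modelfile.rstrip("\n") + "\n" + block + "\n"
    # Use a function so backslashes in the template are not treated as escapes.
    return TEMPLATE_BLOCK.sub(lambda _m: block, modelfile, count=1)
```

Other directives such as `FROM` and `PARAMETER` are left untouched, so the edited file can go straight back into `ollama create`.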
I could republish all the Dolphin models with the correct chat template, and I will, when I have 3 hours. In the meantime, you will need to update the chat template yourself. (Hint: use qwen2.5-coder's template.)
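Some context on that hint: Qwen2.5-style templates, like the Hermes format mentioned earlier, ask the model to answer a tool request with JSON wrapped in `<tool_call>...</tool_call>` tags. A runtime such as Ollama then turns that text back into structured tool calls for the client.

The sketch below shows that last step in Python. It assumes the `{"name": ..., "arguments": {...}}` JSON shape inside the tags and is not Ollama's actual parser.

```python
import json
import re

# Non-greedy match of each <tool_call>...</tool_call> block in the model output.
TOOL_CALL = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def parse_tool_calls(model_output: str) -> list:
    """Extract {"name", "arguments"} dicts from Hermes/Qwen-style output.

    Blocks whose contents are not valid JSON objects with a "name" are skipped.
    """
    calls = []
    for raw in TOOL_CALL.findall(model_output):
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict) and "name" in obj:
            calls.append({"name": obj["name"], "arguments": obj.get("arguments", {})})
    return calls
```

If a template tells the model to use this format but nothing on the serving side parses it, the tool call stays plain text in the reply. Tool support therefore needs both halves.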