fix(prompt): restore missing system prompts to enhance model capabilities

#1
by yuzhe - opened
.gitattributes CHANGED
@@ -36,4 +36,3 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 data/training_data.json filter=lfs diff=lfs merge=lfs -text
 model/tokenizer.json filter=lfs diff=lfs merge=lfs -text
 figures/model_comparison_chart.png filter=lfs diff=lfs merge=lfs -text
-tokenizer.json filter=lfs diff=lfs merge=lfs -text
.gitignore DELETED
@@ -1,43 +0,0 @@
-# macOS
-.DS_Store
-**/.DS_Store
-
-# Python
-__pycache__/
-*.py[cod]
-*$py.class
-*.so
-.Python
-build/
-develop-eggs/
-dist/
-downloads/
-eggs/
-.eggs/
-lib/
-lib64/
-parts/
-sdist/
-var/
-wheels/
-*.egg-info/
-.installed.cfg
-*.egg
-
-# Virtual Environment
-venv/
-ENV/
-env/
-
-# IDE
-.vscode/
-.idea/
-*.swp
-*.swo
-*~
-
-# Data artifacts (generated files)
-data/prepared_dataset.json
-
-# Upload scripts (optional, kept for convenience)
-# upload_*.py
README.md CHANGED
@@ -231,161 +231,6 @@ User: "Search for SOL on Solana" Model:
 <start_function_call>call:SEARCH_TOKEN{symbol:"SOL", chain:"solana"}<end_function_call>
 ```
 
-## Developer Prompt (System Message)
-
-For optimal performance, use the following developer/system prompt when initializing the model:
-
-### Usage Principles (Important)
-
-**Follow these rules for best results:**
-
-1. **Place Once at the Beginning**: Put the developer prompt only once, at the very start of your conversation session
-2. **Do NOT place in user messages**: Never include the developer prompt content in user/assistant messages or tool schemas
-3. **Session-wide persistence**: For multi-turn conversations, keep the same developer prompt at the session start - do not repeat it
-
-**Correct usage pattern:**
-```json
-{
-  "messages": [
-    {"role": "developer", "content": "<developer prompt goes here>"},
-    {"role": "user", "content": "first user query"},
-    {"role": "assistant", "content": "assistant response"},
-    {"role": "user", "content": "second user query"}
-    // No need to repeat developer prompt in subsequent turns
-  ]
-}
-```
-
-### Developer Prompt Content
-
-```json
-{
-  "messages": [
-    {"role": "developer", "content": "You are a model that can do function calling with the following functions.\nYou are an on-chain trading assistant.\nYou may use only two tools: SEARCH_TOKEN and EXECUTE_SWAP.\n\nCore policy:\n- Use a tool only when needed.\n- If required fields are missing or ambiguous, ask one concise clarification question first.\n- If the user is just chatting, reply naturally without calling tools.\n- Never fabricate addresses, amounts, balances, prices, or execution results.\n- Never resolve token symbols to contract addresses from memory or static snapshots.\n- Treat ticker symbols as potentially ambiguous and contract addresses as dynamic (can migrate/upgrade).\n- Supported chains are: solana, ethereum, bsc, base.\n  If the user asks for an unsupported chain (for example polygon), explain the limitation and ask for a supported chain.\n\nTool-call format (must match exactly):\n<start_function_call>call:TOOL_NAME{\"key\":\"value\",\"amount\":1.23}</end_function_call>\nDo not output XML-style tags such as <function_calls>, <invoke>, or <parameter>.\n\nStrict schema:\n\nSEARCH_TOKEN params\n{\n  \"symbol\": \"string, optional\",\n  \"address\": \"string, optional\",\n  \"keyword\": \"string, optional\",\n  \"chain\": \"solana | ethereum | bsc | base, optional\"\n}\nRules:\n- At least one of symbol/address/keyword is required.\n- If the user gives only an address, do address-only lookup (do not guess chain).\n- If user explicitly gives chain, include chain.\n- For symbol/keyword based requests, call SEARCH_TOKEN first before producing a swap call.\n- If lookup may return multiple candidates (same ticker/name), ask the user to confirm the exact token (address or more context).\n\nEXECUTE_SWAP params\n{\n  \"inputTokenSymbol\": \"string, required\",\n  \"inputTokenCA\": \"string, optional\",\n  \"outputTokenCA\": \"string, optional\",\n  \"inputTokenAmount\": \"number, optional\",\n  \"inputTokenPercentage\": \"number in [0,1], optional\",\n  \"outputTokenAmount\": \"number, optional\"\n}\nRules:\n- inputTokenAmount and inputTokenPercentage are mutually exclusive.\n- Convert 30% to inputTokenPercentage=0.3.\n- If both amount and percentage are provided, ask the user to choose one.\n- If outputTokenCA is unknown, call SEARCH_TOKEN first and use the returned result.\n- If user already provides output token address explicitly, you may call EXECUTE_SWAP directly.\n- If lookup returns multiple candidates or low-confidence candidates, ask a clarification question; do not guess.\n\nLanguage:\n- Support both Chinese and English.\n- Reply in the same language as the user unless they ask otherwise."},
-    {"role": "user", "content": "<user query goes here>"}
-  ]
-}
-```
-
-**Usage Example (Python/Transformers):**
-
-```python
-from transformers import AutoModelForCausalLM, AutoProcessor
-
-model_path = "DMindAI/DMind-3-nano"
-
-# Load model and processor (processor combines tokenizer and tool handling)
-model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
-processor = AutoProcessor.from_pretrained(model_path, device_map="auto")
-
-# Define tool schemas (must match training format)
-tools = [
-    {
-        "name": "SEARCH_TOKEN",
-        "description": "Search for a cryptocurrency token on-chain to retrieve its metadata or address.",
-        "parameters": {
-            "type": "object",
-            "properties": {
-                "symbol": {"type": "string", "description": "The ticker symbol of the token (e.g., 'SOL', 'USDC')."},
-                "address": {"type": "string", "description": "The specific contract address (CA) of the token, if known."},
-                "chain": {"type": "string", "enum": ["solana", "ethereum", "bsc", "base"], "description": "The target blockchain network."},
-                "keyword": {"type": "string", "description": "General search keywords (e.g., project name) if symbol/address are unclear."}
-            },
-            "required": []
-        }
-    },
-    {
-        "name": "EXECUTE_SWAP",
-        "description": "Propose a token swap transaction.",
-        "parameters": {
-            "type": "object",
-            "properties": {
-                "inputTokenSymbol": {"type": "string", "description": "Symbol of the token being sold (e.g., 'SOL')."},
-                "inputTokenCA": {"type": "string", "description": "Contract address of the token being sold."},
-                "outputTokenCA": {"type": "string", "description": "Contract address of the token being bought."},
-                "inputTokenAmount": {"type": "number", "description": "Absolute amount of input token to swap."},
-                "inputTokenPercentage": {"type": "number", "description": "Percentage of balance to swap (0.0 to 1.0)."},
-                "outputTokenAmount": {"type": "number", "description": "Minimum amount of output token expected."}
-            },
-            "required": ["inputTokenSymbol"]
-        }
-    }
-]
-
-# Prepare messages with developer prompt (CRITICAL: must be first message)
-developer_prompt = """You are a model that can do function calling with the following functions.
-You are an on-chain trading assistant.
-You may use only two tools: SEARCH_TOKEN and EXECUTE_SWAP.
-
-Core policy:
-- Use a tool only when needed.
-- If required fields are missing or ambiguous, ask one concise clarification question first.
-- If the user is just chatting, reply naturally without calling tools.
-- Never fabricate addresses, amounts, balances, prices, or execution results.
-- Never resolve token symbols to contract addresses from memory or static snapshots.
-- Treat ticker symbols as potentially ambiguous and contract addresses as dynamic (can migrate/upgrade).
-- Supported chains are: solana, ethereum, bsc, base.
-  If the user asks for an unsupported chain (for example polygon), explain the limitation and ask for a supported chain.
-
-Tool-call format (must match exactly):
-<start_function_call>call:TOOL_NAME{\"key\":\"value\",\"amount\":1.23}</end_function_call>
-Do not output XML-style tags such as <function_calls>, <invoke>, or <parameter>.
-
-Strict schema:
-
-SEARCH_TOKEN params
-{
-  \"symbol\": \"string, optional\",
-  \"address\": \"string, optional\",
-  \"keyword\": \"string, optional\",
-  \"chain\": \"solana | ethereum | bsc | base, optional\"
-}
-Rules:
-- At least one of symbol/address/keyword is required.
-- If the user gives only an address, do address-only lookup (do not guess chain).
-- If user explicitly gives chain, include chain.
-- For symbol/keyword based requests, call SEARCH_TOKEN first before producing a swap call.
-- If lookup may return multiple candidates (same ticker/name), ask the user to confirm the exact token (address or more context).
-
-EXECUTE_SWAP params
-{
-  \"inputTokenSymbol\": \"string, required\",
-  \"inputTokenCA\": \"string, optional\",
-  \"outputTokenCA\": \"string, optional\",
-  \"inputTokenAmount\": \"number, optional\",
-  \"inputTokenPercentage\": \"number in [0,1], optional\",
-  \"outputTokenAmount\": \"number, optional\"
-}
-Rules:
-- inputTokenAmount and inputTokenPercentage are mutually exclusive.
-- Convert 30% to inputTokenPercentage=0.3.
-- If both amount and percentage are provided, ask the user to choose one.
-- If outputTokenCA is unknown, call SEARCH_TOKEN first and use the returned result.
-- If user already provides output token address explicitly, you may call EXECUTE_SWAP directly.
-- If lookup returns multiple candidates or low-confidence candidates, ask a clarification question; do not guess.
-
-Language:
-- Support both Chinese and English.
-- Reply in the same language as the user unless they ask otherwise."""
-
-messages = [
-    {"role": "developer", "content": developer_prompt},
-    {"role": "user", "content": "在base查BTC地址"}
-]
-
-# Generate with processor (handles tools automatically)
-inputs = processor.apply_chat_template(
-    messages,
-    tools=tools,
-    add_generation_prompt=True,
-    return_dict=True,
-    return_tensors="pt"
-).to(model.device)
-
-outputs = model.generate(**inputs, max_new_tokens=256)
-response = processor.decode(outputs[0], skip_special_tokens=True)
-print(response)
-```
 
 ## License & Governance
 
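The tool-call grammar and the EXECUTE_SWAP rules restored in the README above are precise enough to check mechanically. Below is a minimal stdlib sketch of a client-side parser and validator; the helper names (`parse_tool_call`, `validate_swap_args`) are hypothetical and not part of this repo, and it assumes the arguments are emitted as valid JSON with quoted keys (the README's quick example uses unquoted keys, which this sketch would not accept):

```python
import json
import re

# Matches the documented format:
#   <start_function_call>call:TOOL_NAME{...json...}<end_function_call>
# The closing tag is accepted with or without a leading slash, since the
# README shows both spellings.
TOOL_CALL_RE = re.compile(
    r"<start_function_call>call:(?P<name>[A-Z_]+)(?P<args>\{.*?\})"
    r"<(?:/)?end_function_call>",
    re.DOTALL,
)

def parse_tool_call(text):
    """Return (tool_name, args_dict) for the first tool call, or None."""
    m = TOOL_CALL_RE.search(text)
    if m is None:
        return None
    return m.group("name"), json.loads(m.group("args"))

def validate_swap_args(args):
    """Apply the EXECUTE_SWAP rules stated in the developer prompt."""
    errors = []
    if "inputTokenSymbol" not in args:
        errors.append("inputTokenSymbol is required")
    if "inputTokenAmount" in args and "inputTokenPercentage" in args:
        errors.append("inputTokenAmount and inputTokenPercentage are mutually exclusive")
    pct = args.get("inputTokenPercentage")
    if pct is not None and not (0 <= pct <= 1):
        errors.append("inputTokenPercentage must be in [0, 1]")
    return errors
```

A harness like this can reject malformed model output (for example, both an amount and a percentage) before anything reaches a signing wallet.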
model/added_tokens.json ADDED
@@ -0,0 +1,4 @@
+{
+  "<end_of_image>": 262145,
+  "<image_soft_token>": 262144
+}
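Note that both added token ids sit at or above the `vocab_size` of 262144 declared in `model/config.json` below. Whether they extend the embedding table or are expected to overlap its tail is an assumption worth verifying before fine-tuning; a quick stdlib check:

```python
# Values mirrored from model/added_tokens.json and model/config.json in this PR.
added_tokens = {"<end_of_image>": 262145, "<image_soft_token>": 262144}
vocab_size = 262144

# Ids >= vocab_size fall outside the embedding table declared in config.json,
# so a loader honoring added_tokens.json needs room for them.
outside_base_vocab = sorted(
    tok for tok, idx in added_tokens.items() if idx >= vocab_size
)
min_required_vocab = max(added_tokens.values()) + 1
```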
model/config.json CHANGED
@@ -8,10 +8,7 @@
   "attn_logit_softcapping": null,
   "bos_token_id": 2,
   "dtype": "bfloat16",
-  "eos_token_id": [
-    1,
-    50
-  ],
+  "eos_token_id": 1,
   "final_logit_softcapping": null,
   "head_dim": 256,
   "hidden_activation": "gelu_pytorch_tanh",
@@ -46,19 +43,11 @@
   "pad_token_id": 0,
   "query_pre_attn_scalar": 256,
   "rms_norm_eps": 1e-06,
-  "rope_parameters": {
-    "full_attention": {
-      "rope_theta": 1000000.0,
-      "rope_type": "default"
-    },
-    "sliding_attention": {
-      "rope_theta": 10000.0,
-      "rope_type": "default"
-    }
-  },
+  "rope_local_base_freq": 10000.0,
+  "rope_scaling": null,
+  "rope_theta": 1000000.0,
   "sliding_window": 512,
-  "tie_word_embeddings": true,
-  "transformers_version": "5.2.0",
+  "transformers_version": "4.57.3",
   "use_bidirectional_attention": false,
   "use_cache": true,
   "vocab_size": 262144
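This hunk swaps the nested `rope_parameters` block (written by transformers 5.x) for the flat `rope_theta` / `rope_local_base_freq` keys used by 4.57.x. A minimal sketch of that down-conversion, assuming (from the values in the diff) that the full-attention theta maps to `rope_theta` and the sliding-attention theta to `rope_local_base_freq`; `flatten_rope` is a hypothetical helper, not a transformers API:

```python
def flatten_rope(config):
    """Convert a nested rope_parameters config dict to flat rope keys."""
    out = dict(config)
    rope = out.pop("rope_parameters", None)
    if rope is not None:
        out["rope_theta"] = rope["full_attention"]["rope_theta"]
        out["rope_local_base_freq"] = rope["sliding_attention"]["rope_theta"]
        out["rope_scaling"] = None  # the new config sets this explicitly
    return out

# The old-style fragment, as removed by this PR.
old = {
    "rope_parameters": {
        "full_attention": {"rope_theta": 1000000.0, "rope_type": "default"},
        "sliding_attention": {"rope_theta": 10000.0, "rope_type": "default"},
    },
    "sliding_window": 512,
}
new = flatten_rope(old)
```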
model/generation_config.json CHANGED
@@ -1,12 +1,15 @@
 {
+  "bos_token_id": 2,
   "cache_implementation": "hybrid",
   "do_sample": true,
   "eos_token_id": [
+    1,
     1,
     50,
     106
   ],
+  "pad_token_id": 0,
   "top_k": 64,
   "top_p": 0.95,
-  "transformers_version": "5.2.0"
+  "transformers_version": "4.57.3"
 }
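The updated file ends up listing token id 1 twice in `eos_token_id` (the PR adds a 1 ahead of the existing 1, 50, 106). The duplicate is redundant but harmless as long as stop checks treat the list as a set; a small sketch of such a check (`normalize_eos` and `is_stop_token` are hypothetical helpers, not transformers APIs):

```python
# Values mirror the updated model/generation_config.json.
generation_config = {
    "bos_token_id": 2,
    "eos_token_id": [1, 1, 50, 106],  # note the duplicated 1
    "pad_token_id": 0,
}

def normalize_eos(eos):
    """Accept an int or a list of ints; return ordered, de-duplicated ids."""
    if isinstance(eos, int):
        eos = [eos]
    return list(dict.fromkeys(eos))

def is_stop_token(token_id, config):
    """True if token_id should terminate generation under this config."""
    return token_id in set(normalize_eos(config["eos_token_id"]))
```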
model/model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e6c267f7824893e6721def034cac21c34a822089a91b3c9dba69c07f2db0cdae
+oid sha256:906c2a781360ac00096c1556249908d030db0b097dedce080ec64b51ab16a941
 size 536223056
model/special_tokens_map.json ADDED
@@ -0,0 +1,34 @@
+{
+  "boi_token": "<start_of_image>",
+  "bos_token": {
+    "content": "<bos>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eoi_token": "<end_of_image>",
+  "eos_token": {
+    "content": "<eos>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "image_token": "<image_soft_token>",
+  "pad_token": {
+    "content": "<pad>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "sfr_token": "<start_function_response>",
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
model/tokenizer.json CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e3655797f9d732b7dc08b4225200697af8e37d94b74711d9b1d8166feb953578
-size 33384774
+oid sha256:b6b09a0b4a803ad453063ca4bb49a784540e8120004e2450e025df2b27d41fb2
+size 33384899
model/tokenizer.model ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:aa009fcbc3589a9904d30d04834094fea4653c2ac6d2de2cd1262d4f7a50ceb3
+size 4689144
model/tokenizer_config.json CHANGED
The diff for this file is too large to render. See raw diff
 
model/training_args.bin ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:448c9ce9426a53c26348d67bf18ee13527c25dfd435756784e9bfef763495cb8
+size 6417