fix(prompt): restore missing system prompts to enhance model capabilities

#1
by yuzhe - opened
.gitattributes CHANGED
@@ -36,4 +36,3 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 data/training_data.json filter=lfs diff=lfs merge=lfs -text
 model/tokenizer.json filter=lfs diff=lfs merge=lfs -text
 figures/model_comparison_chart.png filter=lfs diff=lfs merge=lfs -text
-tokenizer.json filter=lfs diff=lfs merge=lfs -text
.gitignore DELETED
@@ -1,43 +0,0 @@
-# macOS
-.DS_Store
-**/.DS_Store
-
-# Python
-__pycache__/
-*.py[cod]
-*$py.class
-*.so
-.Python
-build/
-develop-eggs/
-dist/
-downloads/
-eggs/
-.eggs/
-lib/
-lib64/
-parts/
-sdist/
-var/
-wheels/
-*.egg-info/
-.installed.cfg
-*.egg
-
-# Virtual Environment
-venv/
-ENV/
-env/
-
-# IDE
-.vscode/
-.idea/
-*.swp
-*.swo
-*~
-
-# Data artifacts (generated files)
-data/prepared_dataset.json
-
-# Upload scripts (optional, kept for convenience)
-# upload_*.py
README.md CHANGED
@@ -231,161 +231,6 @@ User: "Search for SOL on Solana" Model:
 <start_function_call>call:SEARCH_TOKEN{symbol:"SOL", chain:"solana"}<end_function_call>
 ```
 
-## Developer Prompt (System Message)
-
-For optimal performance, use the following developer/system prompt when initializing the model:
-
-### Usage Principles (Important)
-
-**Follow these rules for best results:**
-
-1. **Place Once at the Beginning**: Put the developer prompt only once, at the very start of your conversation session
-2. **Do NOT place in user messages**: Never include the developer prompt content in user/assistant messages or tool schemas
-3. **Session-wide persistence**: For multi-turn conversations, keep the same developer prompt at the session start - do not repeat it
-
-**Correct usage pattern:**
-```json
-{
-  "messages": [
-    {"role": "developer", "content": "<developer prompt goes here>"},
-    {"role": "user", "content": "first user query"},
-    {"role": "assistant", "content": "assistant response"},
-    {"role": "user", "content": "second user query"}
-    // No need to repeat developer prompt in subsequent turns
-  ]
-}
-```
-
-### Developer Prompt Content
-
-```json
-{
-  "messages": [
-    {"role": "developer", "content": "You are a model that can do function calling with the following functions.\nYou are an on-chain trading assistant.\nYou may use only two tools: SEARCH_TOKEN and EXECUTE_SWAP.\n\nCore policy:\n- Use a tool only when needed.\n- If required fields are missing or ambiguous, ask one concise clarification question first.\n- If the user is just chatting, reply naturally without calling tools.\n- Never fabricate addresses, amounts, balances, prices, or execution results.\n- Never resolve token symbols to contract addresses from memory or static snapshots.\n- Treat ticker symbols as potentially ambiguous and contract addresses as dynamic (can migrate/upgrade).\n- Supported chains are: solana, ethereum, bsc, base.\n  If the user asks for an unsupported chain (for example polygon), explain the limitation and ask for a supported chain.\n\nTool-call format (must match exactly):\n<start_function_call>call:TOOL_NAME{\"key\":\"value\",\"amount\":1.23}</end_function_call>\nDo not output XML-style tags such as <function_calls>, <invoke>, or <parameter>.\n\nStrict schema:\n\nSEARCH_TOKEN params\n{\n  \"symbol\": \"string, optional\",\n  \"address\": \"string, optional\",\n  \"keyword\": \"string, optional\",\n  \"chain\": \"solana | ethereum | bsc | base, optional\"\n}\nRules:\n- At least one of symbol/address/keyword is required.\n- If the user gives only an address, do address-only lookup (do not guess chain).\n- If user explicitly gives chain, include chain.\n- For symbol/keyword based requests, call SEARCH_TOKEN first before producing a swap call.\n- If lookup may return multiple candidates (same ticker/name), ask the user to confirm the exact token (address or more context).\n\nEXECUTE_SWAP params\n{\n  \"inputTokenSymbol\": \"string, required\",\n  \"inputTokenCA\": \"string, optional\",\n  \"outputTokenCA\": \"string, optional\",\n  \"inputTokenAmount\": \"number, optional\",\n  \"inputTokenPercentage\": \"number in [0,1], optional\",\n  \"outputTokenAmount\": \"number, optional\"\n}\nRules:\n- inputTokenAmount and inputTokenPercentage are mutually exclusive.\n- Convert 30% to inputTokenPercentage=0.3.\n- If both amount and percentage are provided, ask the user to choose one.\n- If outputTokenCA is unknown, call SEARCH_TOKEN first and use the returned result.\n- If user already provides output token address explicitly, you may call EXECUTE_SWAP directly.\n- If lookup returns multiple candidates or low-confidence candidates, ask a clarification question; do not guess.\n\nLanguage:\n- Support both Chinese and English.\n- Reply in the same language as the user unless they ask otherwise."},
-    {"role": "user", "content": "<user query goes here>"}
-  ]
-}
-```
-
-**Usage Example (Python/Transformers):**
-
-```python
-from transformers import AutoModelForCausalLM, AutoProcessor
-
-model_path = "DMindAI/DMind-3-nano"
-
-# Load model and processor (processor combines tokenizer and tool handling)
-model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
-processor = AutoProcessor.from_pretrained(model_path, device_map="auto")
-
-# Define tool schemas (must match training format)
-tools = [
-    {
-        "name": "SEARCH_TOKEN",
-        "description": "Search for a cryptocurrency token on-chain to retrieve its metadata or address.",
-        "parameters": {
-            "type": "object",
-            "properties": {
-                "symbol": {"type": "string", "description": "The ticker symbol of the token (e.g., 'SOL', 'USDC')."},
-                "address": {"type": "string", "description": "The specific contract address (CA) of the token, if known."},
-                "chain": {"type": "string", "enum": ["solana", "ethereum", "bsc", "base"], "description": "The target blockchain network."},
-                "keyword": {"type": "string", "description": "General search keywords (e.g., project name) if symbol/address are unclear."}
-            },
-            "required": []
-        }
-    },
-    {
-        "name": "EXECUTE_SWAP",
-        "description": "Propose a token swap transaction.",
-        "parameters": {
-            "type": "object",
-            "properties": {
-                "inputTokenSymbol": {"type": "string", "description": "Symbol of the token being sold (e.g., 'SOL')."},
-                "inputTokenCA": {"type": "string", "description": "Contract address of the token being sold."},
-                "outputTokenCA": {"type": "string", "description": "Contract address of the token being bought."},
-                "inputTokenAmount": {"type": "number", "description": "Absolute amount of input token to swap."},
-                "inputTokenPercentage": {"type": "number", "description": "Percentage of balance to swap (0.0 to 1.0)."},
-                "outputTokenAmount": {"type": "number", "description": "Minimum amount of output token expected."}
-            },
-            "required": ["inputTokenSymbol"]
-        }
-    }
-]
-
-# Prepare messages with developer prompt (CRITICAL: must be first message)
-developer_prompt = """You are a model that can do function calling with the following functions.
-You are an on-chain trading assistant.
-You may use only two tools: SEARCH_TOKEN and EXECUTE_SWAP.
-
-Core policy:
-- Use a tool only when needed.
-- If required fields are missing or ambiguous, ask one concise clarification question first.
-- If the user is just chatting, reply naturally without calling tools.
-- Never fabricate addresses, amounts, balances, prices, or execution results.
-- Never resolve token symbols to contract addresses from memory or static snapshots.
-- Treat ticker symbols as potentially ambiguous and contract addresses as dynamic (can migrate/upgrade).
-- Supported chains are: solana, ethereum, bsc, base.
-  If the user asks for an unsupported chain (for example polygon), explain the limitation and ask for a supported chain.
-
-Tool-call format (must match exactly):
-<start_function_call>call:TOOL_NAME{\"key\":\"value\",\"amount\":1.23}</end_function_call>
-Do not output XML-style tags such as <function_calls>, <invoke>, or <parameter>.
-
-Strict schema:
-
-SEARCH_TOKEN params
-{
-  \"symbol\": \"string, optional\",
-  \"address\": \"string, optional\",
-  \"keyword\": \"string, optional\",
-  \"chain\": \"solana | ethereum | bsc | base, optional\"
-}
-Rules:
-- At least one of symbol/address/keyword is required.
-- If the user gives only an address, do address-only lookup (do not guess chain).
-- If user explicitly gives chain, include chain.
-- For symbol/keyword based requests, call SEARCH_TOKEN first before producing a swap call.
-- If lookup may return multiple candidates (same ticker/name), ask the user to confirm the exact token (address or more context).
-
-EXECUTE_SWAP params
-{
-  \"inputTokenSymbol\": \"string, required\",
-  \"inputTokenCA\": \"string, optional\",
-  \"outputTokenCA\": \"string, optional\",
-  \"inputTokenAmount\": \"number, optional\",
-  \"inputTokenPercentage\": \"number in [0,1], optional\",
-  \"outputTokenAmount\": \"number, optional\"
-}
-Rules:
-- inputTokenAmount and inputTokenPercentage are mutually exclusive.
-- Convert 30% to inputTokenPercentage=0.3.
-- If both amount and percentage are provided, ask the user to choose one.
-- If outputTokenCA is unknown, call SEARCH_TOKEN first and use the returned result.
-- If user already provides output token address explicitly, you may call EXECUTE_SWAP directly.
-- If lookup returns multiple candidates or low-confidence candidates, ask a clarification question; do not guess.
-
-Language:
-- Support both Chinese and English.
-- Reply in the same language as the user unless they ask otherwise."""
-
-messages = [
-    {"role": "developer", "content": developer_prompt},
-    {"role": "user", "content": "在base查BTC地址"}
-]
-
-# Generate with processor (handles tools automatically)
-inputs = processor.apply_chat_template(
-    messages,
-    tools=tools,
-    add_generation_prompt=True,
-    return_dict=True,
-    return_tensors="pt"
-).to(model.device)
-
-outputs = model.generate(**inputs, max_new_tokens=256)
-response = processor.decode(outputs[0], skip_special_tokens=True)
-print(response)
-```
 
 ## License & Governance
 
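The tool-call grammar and the EXECUTE_SWAP rules restored in the README above are precise enough to check mechanically. Below is a minimal stdlib sketch of a client-side parser and validator; the helper names (`parse_tool_call`, `validate_swap_args`) are hypothetical and not part of this repo, and it assumes the arguments are emitted as valid JSON with quoted keys (the README's quick example uses unquoted keys, which this sketch would not accept):

```python
import json
import re

# Matches the documented format:
#   <start_function_call>call:TOOL_NAME{...json...}<end_function_call>
# The closing tag is accepted with or without a leading slash, since the
# README shows both spellings.
TOOL_CALL_RE = re.compile(
    r"<start_function_call>call:(?P<name>[A-Z_]+)(?P<args>\{.*?\})"
    r"<(?:/)?end_function_call>",
    re.DOTALL,
)

def parse_tool_call(text):
    """Return (tool_name, args_dict) for the first tool call, or None."""
    m = TOOL_CALL_RE.search(text)
    if m is None:
        return None
    return m.group("name"), json.loads(m.group("args"))

def validate_swap_args(args):
    """Apply the EXECUTE_SWAP rules stated in the developer prompt."""
    errors = []
    if "inputTokenSymbol" not in args:
        errors.append("inputTokenSymbol is required")
    if "inputTokenAmount" in args and "inputTokenPercentage" in args:
        errors.append("inputTokenAmount and inputTokenPercentage are mutually exclusive")
    pct = args.get("inputTokenPercentage")
    if pct is not None and not (0 <= pct <= 1):
        errors.append("inputTokenPercentage must be in [0, 1]")
    return errors
```

A harness like this can reject malformed model output (for example, both an amount and a percentage) before anything reaches a signing wallet.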
model/added_tokens.json ADDED
@@ -0,0 +1,4 @@
+{
+  "<end_of_image>": 262145,
+  "<image_soft_token>": 262144
+}
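Note that both added token ids sit at or above the `vocab_size` of 262144 declared in `model/config.json` below. Whether they extend the embedding table or are expected to overlap its tail is an assumption worth verifying before fine-tuning; a quick stdlib check:

```python
# Values mirrored from model/added_tokens.json and model/config.json in this PR.
added_tokens = {"<end_of_image>": 262145, "<image_soft_token>": 262144}
vocab_size = 262144

# Ids >= vocab_size fall outside the embedding table declared in config.json,
# so a loader honoring added_tokens.json needs room for them.
outside_base_vocab = sorted(
    tok for tok, idx in added_tokens.items() if idx >= vocab_size
)
min_required_vocab = max(added_tokens.values()) + 1
```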
model/config.json CHANGED
@@ -8,10 +8,7 @@
   "attn_logit_softcapping": null,
   "bos_token_id": 2,
   "dtype": "bfloat16",
-  "eos_token_id": [
-    1,
-    50
-  ],
+  "eos_token_id": 1,
   "final_logit_softcapping": null,
   "head_dim": 256,
   "hidden_activation": "gelu_pytorch_tanh",
@@ -46,19 +43,11 @@
   "pad_token_id": 0,
   "query_pre_attn_scalar": 256,
   "rms_norm_eps": 1e-06,
-  "rope_parameters": {
-    "full_attention": {
-      "rope_theta": 1000000.0,
-      "rope_type": "default"
-    },
-    "sliding_attention": {
-      "rope_theta": 10000.0,
-      "rope_type": "default"
-    }
-  },
+  "rope_local_base_freq": 10000.0,
+  "rope_scaling": null,
+  "rope_theta": 1000000.0,
   "sliding_window": 512,
-  "tie_word_embeddings": true,
-  "transformers_version": "5.2.0",
+  "transformers_version": "4.57.3",
   "use_bidirectional_attention": false,
   "use_cache": true,
   "vocab_size": 262144
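This hunk swaps the nested `rope_parameters` block (written by transformers 5.x) for the flat `rope_theta` / `rope_local_base_freq` keys used by 4.57.x. A minimal sketch of that down-conversion, assuming (from the values in the diff) that the full-attention theta maps to `rope_theta` and the sliding-attention theta to `rope_local_base_freq`; `flatten_rope` is a hypothetical helper, not a transformers API:

```python
def flatten_rope(config):
    """Convert a nested rope_parameters config dict to flat rope keys."""
    out = dict(config)
    rope = out.pop("rope_parameters", None)
    if rope is not None:
        out["rope_theta"] = rope["full_attention"]["rope_theta"]
        out["rope_local_base_freq"] = rope["sliding_attention"]["rope_theta"]
        out["rope_scaling"] = None  # the new config sets this explicitly
    return out

# The old-style fragment, as removed by this PR.
old = {
    "rope_parameters": {
        "full_attention": {"rope_theta": 1000000.0, "rope_type": "default"},
        "sliding_attention": {"rope_theta": 10000.0, "rope_type": "default"},
    },
    "sliding_window": 512,
}
new = flatten_rope(old)
```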
model/generation_config.json CHANGED
@@ -1,12 +1,15 @@
 {
+  "bos_token_id": 2,
   "cache_implementation": "hybrid",
   "do_sample": true,
   "eos_token_id": [
+    1,
     1,
     50,
     106
   ],
+  "pad_token_id": 0,
   "top_k": 64,
   "top_p": 0.95,
-  "transformers_version": "5.2.0"
+  "transformers_version": "4.57.3"
 }
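The updated file ends up listing token id 1 twice in `eos_token_id` (the PR adds a 1 ahead of the existing 1, 50, 106). The duplicate is redundant but harmless as long as stop checks treat the list as a set; a small sketch of such a check (`normalize_eos` and `is_stop_token` are hypothetical helpers, not transformers APIs):

```python
# Values mirror the updated model/generation_config.json.
generation_config = {
    "bos_token_id": 2,
    "eos_token_id": [1, 1, 50, 106],  # note the duplicated 1
    "pad_token_id": 0,
}

def normalize_eos(eos):
    """Accept an int or a list of ints; return ordered, de-duplicated ids."""
    if isinstance(eos, int):
        eos = [eos]
    return list(dict.fromkeys(eos))

def is_stop_token(token_id, config):
    """True if token_id should terminate generation under this config."""
    return token_id in set(normalize_eos(config["eos_token_id"]))
```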
model/model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e6c267f7824893e6721def034cac21c34a822089a91b3c9dba69c07f2db0cdae
+oid sha256:906c2a781360ac00096c1556249908d030db0b097dedce080ec64b51ab16a941
 size 536223056
model/special_tokens_map.json ADDED
@@ -0,0 +1,34 @@
+{
+  "boi_token": "<start_of_image>",
+  "bos_token": {
+    "content": "<bos>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eoi_token": "<end_of_image>",
+  "eos_token": {
+    "content": "<eos>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "image_token": "<image_soft_token>",
+  "pad_token": {
+    "content": "<pad>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "sfr_token": "<start_function_response>",
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
model/tokenizer.json CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e3655797f9d732b7dc08b4225200697af8e37d94b74711d9b1d8166feb953578
-size 33384774
+oid sha256:b6b09a0b4a803ad453063ca4bb49a784540e8120004e2450e025df2b27d41fb2
+size 33384899
model/tokenizer.model ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:aa009fcbc3589a9904d30d04834094fea4653c2ac6d2de2cd1262d4f7a50ceb3
+size 4689144
model/tokenizer_config.json CHANGED
The diff for this file is too large to render. See raw diff
 
model/training_args.bin ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:448c9ce9426a53c26348d67bf18ee13527c25dfd435756784e9bfef763495cb8
+size 6417