Multilingual EAGLE-3 Draft Head for Qwen3-8B (Hindi / Gujarati / English) — Research Preview

An EAGLE-3 speculative-decoding draft head for Qwen/Qwen3-8B, trained to recover the acceptance length (τ) that public English-only EAGLE-3 heads lose on Indic languages. Pair it with Qwen3-8B in SGLang for lossless faster generation on Hindi and Gujarati.

⚠️ Research preview / proof-of-concept — not a production-tuned head. It was trained on a small (~2,100-example) FLORES-derived dataset. It recovers Indic acceptance but regresses on English and carries a training-domain bias. Please read the Limitations before use. To our knowledge this is the first publicly released Indic EAGLE-3 head; it accompanies the study described below.

Results — acceptance length τ

Config steps=3, topk=1, draft_tokens=4, temperature 0, 50 parallel prompts/language. τ = mean accepted tokens per verification step (higher = faster). EAGLE-3 is lossless — outputs are identical to standard decoding; only speed changes.

vs. the public English EAGLE-3 head, on FLORES-200 prompts:

language English head this head
English 2.37 1.47 (1.40 ± 0.09 across 3 seeds)
Hindi 1.36 1.86 ± 0.20
Gujarati 1.07 2.16 ± 0.29

Held-out, out-of-domain (Aya instruction prompts) — the recovery generalizes: Gujarati 1.08 → 2.31, Hindi 1.40 → 1.92 (English head is domain-robust, confirming the comparison is fair).

Why the English head fails on Indic (mechanism)

EAGLE-3 heads emit over a reduced 32k "draft vocabulary" chosen by token frequency. An English-trained head's 32k excludes ~half of all Hindi/Gujarati tokens (it covers only 50% / ~46%), so it literally cannot propose them → acceptance collapses toward 1. This head rebuilds the draft vocab from multilingual data (100% Indic coverage). Across 8 languages, τ correlates with draft-vocab coverage (Pearson r = +0.95) and inversely with tokenization inflation (r = −0.87).

Usage (SGLang)

python -m sglang.launch_server \
  --model Qwen/Qwen3-8B \
  --speculative-algorithm EAGLE3 \
  --speculative-draft-model-path SwitchXDDD/multilingual-eagle3-qwen3-8b \
  --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4 \
  --dtype bfloat16

Training

  • Target: Qwen/Qwen3-8B (frozen). Framework: SpecForge, EAGLE-3 online mode. Draft config: qwen3-8b-eagle3.json (1-layer LlamaForCausalLMEagle3, draft_vocab_size=32000).
  • Data: ~2,100 conversations = Qwen3-8B's own responses to FLORES-200 prompts — 700 Hindi + 700 Gujarati + 700 English. Target-regenerated so the draft matches the target's distribution.
  • Recipe: 5 epochs, lr 1e-4, max-length 4096, bf16, 1× H100.

Limitations (please read)

  • English regression (2.37 → ~1.40). A same-recipe English-only control also reaches ~1.45, so this is limited/narrow English training data, not multilingual interference — but the head is still worse at English than the off-the-shelf head. Mitigation: mix in diverse English (e.g. ShareGPT) when training your own.
  • Training-domain bias: trained on FLORES (wiki-news). The held-out Aya results above show the recovery largely holds, but expect some domain sensitivity.
  • Single seed released: seed-to-seed τ varies (Gujarati ± 0.29 over 3 seeds). This is one representative run.
  • Small dataset, not quality/safety-tuned — a proof-of-concept, not a maximally-optimized head.
  • Lossless: it does not change model outputs, only decoding speed.

License & provenance

Weights released under Apache-2.0 (consistent with Qwen3 and SpecForge). Training prompts are derived from FLORES-200 (CC-BY-SA-4.0); responses generated by Qwen3-8B (Apache-2.0). Please retain attribution.

Citation

If you use this head, please cite EAGLE-3, SpecForge, Qwen3, and FLORES-200:

  • Li et al., EAGLE-3 (NeurIPS 2025). SGLang team, SpecForge. Qwen team, Qwen3. NLLB team, FLORES-200 / No Language Left Behind.
  • This work: Cross-Lingual EAGLE-3 for Indic Languages (link TBD).

Companion 32B result: the same degradation→recovery pattern replicates at Qwen3-32B (Gujarati 1.03 → 2.47); that head is validated but not yet publicly released (pending a held-out + multi-seed pass).

Downloads last month
13
Safetensors
Model size
0.4B params
Tensor type
I64
·
BF16
·
BOOL
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for SwitchXDDD/multilingual-eagle3-qwen3-8b

Finetuned
Qwen/Qwen3-8B
Finetuned
(1649)
this model