Multilingual EAGLE-3 Draft Head for Qwen3-8B (Hindi / Gujarati / English) — Research Preview
An EAGLE-3 speculative-decoding draft head for Qwen/Qwen3-8B, trained to recover the acceptance length (τ) that public English-only EAGLE-3 heads lose on Indic languages. Pair it with Qwen3-8B in SGLang for lossless faster generation on Hindi and Gujarati.
⚠️ Research preview / proof-of-concept — not a production-tuned head. It was trained on a small (~2,100-example) FLORES-derived dataset. It recovers Indic acceptance but regresses on English and carries a training-domain bias. Please read the Limitations before use. To our knowledge this is the first publicly released Indic EAGLE-3 head; it accompanies the study described below.
Results — acceptance length τ
Config steps=3, topk=1, draft_tokens=4, temperature 0, 50 parallel prompts/language. τ = mean accepted tokens per verification step (higher = faster). EAGLE-3 is lossless — outputs are identical to standard decoding; only speed changes.
vs. the public English EAGLE-3 head, on FLORES-200 prompts:
| language | English head | this head |
|---|---|---|
| English | 2.37 | 1.47 (1.40 ± 0.09 across 3 seeds) |
| Hindi | 1.36 | 1.86 ± 0.20 |
| Gujarati | 1.07 | 2.16 ± 0.29 |
Held-out, out-of-domain (Aya instruction prompts) — the recovery generalizes: Gujarati 1.08 → 2.31, Hindi 1.40 → 1.92 (English head is domain-robust, confirming the comparison is fair).
Why the English head fails on Indic (mechanism)
EAGLE-3 heads emit over a reduced 32k "draft vocabulary" chosen by token frequency. An English-trained head's 32k excludes ~half of all Hindi/Gujarati tokens (it covers only 50% / ~46%), so it literally cannot propose them → acceptance collapses toward 1. This head rebuilds the draft vocab from multilingual data (100% Indic coverage). Across 8 languages, τ correlates with draft-vocab coverage (Pearson r = +0.95) and inversely with tokenization inflation (r = −0.87).
Usage (SGLang)
python -m sglang.launch_server \
--model Qwen/Qwen3-8B \
--speculative-algorithm EAGLE3 \
--speculative-draft-model-path SwitchXDDD/multilingual-eagle3-qwen3-8b \
--speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4 \
--dtype bfloat16
Training
- Target:
Qwen/Qwen3-8B(frozen). Framework: SpecForge, EAGLE-3 online mode. Draft config:qwen3-8b-eagle3.json(1-layerLlamaForCausalLMEagle3,draft_vocab_size=32000). - Data: ~2,100 conversations = Qwen3-8B's own responses to FLORES-200 prompts — 700 Hindi + 700 Gujarati + 700 English. Target-regenerated so the draft matches the target's distribution.
- Recipe: 5 epochs, lr 1e-4, max-length 4096, bf16, 1× H100.
Limitations (please read)
- English regression (2.37 → ~1.40). A same-recipe English-only control also reaches ~1.45, so this is limited/narrow English training data, not multilingual interference — but the head is still worse at English than the off-the-shelf head. Mitigation: mix in diverse English (e.g. ShareGPT) when training your own.
- Training-domain bias: trained on FLORES (wiki-news). The held-out Aya results above show the recovery largely holds, but expect some domain sensitivity.
- Single seed released: seed-to-seed τ varies (Gujarati ± 0.29 over 3 seeds). This is one representative run.
- Small dataset, not quality/safety-tuned — a proof-of-concept, not a maximally-optimized head.
- Lossless: it does not change model outputs, only decoding speed.
License & provenance
Weights released under Apache-2.0 (consistent with Qwen3 and SpecForge). Training prompts are derived from FLORES-200 (CC-BY-SA-4.0); responses generated by Qwen3-8B (Apache-2.0). Please retain attribution.
Citation
If you use this head, please cite EAGLE-3, SpecForge, Qwen3, and FLORES-200:
- Li et al., EAGLE-3 (NeurIPS 2025). SGLang team, SpecForge. Qwen team, Qwen3. NLLB team, FLORES-200 / No Language Left Behind.
- This work: Cross-Lingual EAGLE-3 for Indic Languages (link TBD).
Companion 32B result: the same degradation→recovery pattern replicates at Qwen3-32B (Gujarati 1.03 → 2.47); that head is validated but not yet publicly released (pending a held-out + multi-seed pass).
- Downloads last month
- 13