Multilingual EAGLE-3 Draft Head for Qwen3-8B (Hindi / Gujarati / English) — Research Preview

An EAGLE-3 speculative-decoding draft head for Qwen/Qwen3-8B, trained to recover the acceptance length (τ) that public English-only EAGLE-3 heads lose on Indic languages. Pair it with Qwen3-8B in SGLang for lossless faster generation on Hindi and Gujarati.

⚠️ Research preview / proof-of-concept — not a production-tuned head. It was trained on a small (~2,100-example) FLORES-derived dataset. It recovers Indic acceptance but regresses on English and carries a training-domain bias. Please read the Limitations before use. To our knowledge this is the first publicly released Indic EAGLE-3 head; it accompanies the study described below.

Results — acceptance length τ

Config steps=3, topk=1, draft_tokens=4, temperature 0, 50 parallel prompts/language. τ = mean accepted tokens per verification step (higher = faster). EAGLE-3 is lossless — outputs are identical to standard decoding; only speed changes.

vs. the public English EAGLE-3 head, on FLORES-200 prompts:

language	English head	this head
English	2.37	1.47 (1.40 ± 0.09 across 3 seeds)
Hindi	1.36	1.86 ± 0.20
Gujarati	1.07	2.16 ± 0.29

Held-out, out-of-domain (Aya instruction prompts) — the recovery generalizes: Gujarati 1.08 → 2.31, Hindi 1.40 → 1.92 (English head is domain-robust, confirming the comparison is fair).

Why the English head fails on Indic (mechanism)

EAGLE-3 heads emit over a reduced 32k "draft vocabulary" chosen by token frequency. An English-trained head's 32k excludes ~half of all Hindi/Gujarati tokens (it covers only ~~50% / ~46%), so it literally cannot propose them → acceptance collapses toward 1. This head rebuilds the draft vocab from multilingual data (~~100% Indic coverage). Across 8 languages, τ correlates with draft-vocab coverage (Pearson r = +0.95) and inversely with tokenization inflation (r = −0.87).

Usage (SGLang)

python -m sglang.launch_server \
  --model Qwen/Qwen3-8B \
  --speculative-algorithm EAGLE3 \
  --speculative-draft-model-path SwitchXDDD/multilingual-eagle3-qwen3-8b \
  --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4 \
  --dtype bfloat16

Training

Target: Qwen/Qwen3-8B (frozen). Framework: SpecForge, EAGLE-3 online mode. Draft config: qwen3-8b-eagle3.json (1-layer LlamaForCausalLMEagle3, draft_vocab_size=32000).
Data: ~2,100 conversations = Qwen3-8B's own responses to FLORES-200 prompts — 700 Hindi + 700 Gujarati + 700 English. Target-regenerated so the draft matches the target's distribution.
Recipe: 5 epochs, lr 1e-4, max-length 4096, bf16, 1× H100.

Limitations (please read)

English regression (2.37 → ~1.40). A same-recipe English-only control also reaches ~1.45, so this is limited/narrow English training data, not multilingual interference — but the head is still worse at English than the off-the-shelf head. Mitigation: mix in diverse English (e.g. ShareGPT) when training your own.
Training-domain bias: trained on FLORES (wiki-news). The held-out Aya results above show the recovery largely holds, but expect some domain sensitivity.
Single seed released: seed-to-seed τ varies (Gujarati ± 0.29 over 3 seeds). This is one representative run.
Small dataset, not quality/safety-tuned — a proof-of-concept, not a maximally-optimized head.
Lossless: it does not change model outputs, only decoding speed.

License & provenance

Weights released under Apache-2.0 (consistent with Qwen3 and SpecForge). Training prompts are derived from FLORES-200 (CC-BY-SA-4.0); responses generated by Qwen3-8B (Apache-2.0). Please retain attribution.

Citation

If you use this head, please cite EAGLE-3, SpecForge, Qwen3, and FLORES-200:

Li et al., EAGLE-3 (NeurIPS 2025). SGLang team, SpecForge. Qwen team, Qwen3. NLLB team, FLORES-200 / No Language Left Behind.
This work: Cross-Lingual EAGLE-3 for Indic Languages (link TBD).

Companion 32B result: the same degradation→recovery pattern replicates at Qwen3-32B (Gujarati 1.03 → 2.47); that head is validated but not yet publicly released (pending a held-out + multi-seed pass).

Downloads last month: 13

Safetensors

Model size

0.4B params

Tensor type

I64

BF16

BOOL

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for SwitchXDDD/multilingual-eagle3-qwen3-8b

Base model

Qwen/Qwen3-8B-Base

Finetuned

Qwen/Qwen3-8B

Finetuned

(1649)

this model