A newer version of this model is available: diffutron/DiffutronLM-0.3B-Instruct

DiffutronLM-0.3B-1st-Stage

DiffutronLM-0.3B-1st-Stage is an intermediate checkpoint in the Diffutron series of parameter-efficient Masked Diffusion Language Models (MDLMs) designed for Turkish.

This checkpoint marks the end of the first stage of instruction fine-tuning. It has learned the fundamentals of instruction-following in Turkish and serves as the starting point for the more complex, domain-specific specialization carried out for the final Instruct model.

📌 Model Details

  • Model Type: Masked Diffusion Language Model (MDLM)
  • Base Architecture: jhu-clsp/mmBERT-base (Multilingual Encoder)
  • Language: Turkish
  • Parameter Count: 307M (0.3B)
  • Context Length: 256 tokens
  • Training Libraries: dllm, PyTorch
  • Status: Intermediate Checkpoint (Stage 1 SFT)

🚀 Training Pipeline for This Checkpoint

Diffutron replaces traditional next-token autoregressive generation with a discrete diffusion process, generating text by iteratively refining sequences in parallel. To reach this checkpoint, the model underwent two main phases:
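To make the "discrete diffusion" idea concrete, here is a minimal sketch of the forward (noising) process used by masked diffusion models in general: each token is independently replaced by a mask token with probability t, the noise level. Training teaches the model to recover the original tokens at the masked positions, and generation runs the process in reverse, starting from a fully masked response. `MASK_ID` and the token ids below are hypothetical placeholders, not Diffutron's actual vocabulary.

```python
import random
from typing import List

MASK_ID = -1  # hypothetical mask-token id, used only for this illustration


def forward_mask(tokens: List[int], t: float, rng: random.Random) -> List[int]:
    """Masked-diffusion forward process: mask each token independently with probability t."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("noise level t must lie in [0, 1]")
    return [MASK_ID if rng.random() < t else tok for tok in tokens]


rng = random.Random(0)
clean = [101, 2023, 2003, 1037, 7099, 102]  # arbitrary example token ids
for t in (0.0, 0.5, 1.0):
    # t = 0 leaves the sequence intact; t = 1 masks every token.
    print(t, forward_mask(clean, t, rng))
```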

1. Continual Pre-training (CPT)

The multilingual backbone was adapted to Turkish using a high-rank LoRA strategy (r=256, α=256) on ~2 million sequences sourced from Havadis, Temiz-OSCAR, and Turkish Wikipedia. This adaptation let the model capture Turkish morphological nuances without catastrophic forgetting.
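The LoRA setting above can be illustrated with a small NumPy sketch. LoRA freezes a weight matrix W and learns a low-rank update B·A scaled by α/r, so with r = 256 and α = 256 the scale is exactly 1. The 768×768 layer size is a hypothetical example chosen only to show how many trainable parameters a rank-256 adapter adds; it is not read from the model's config, and the card does not say which layers were adapted.

```python
import numpy as np


def lora_forward(x, W, A, B, alpha, r):
    """y = x W^T + (alpha / r) * x A^T B^T, with W frozen and A, B trainable."""
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)


def lora_param_count(d_in, d_out, r):
    """Trainable parameters added by one adapter: A is (r, d_in), B is (d_out, r)."""
    return r * (d_in + d_out)


d_in = d_out = 768   # hypothetical layer size, for illustration only
r, alpha = 256, 256  # rank and alpha reported for the CPT stage

rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))
A = rng.normal(scale=0.01, size=(r, d_in))
B = np.zeros((d_out, r))  # B starts at zero, so the adapter is initially a no-op
x = rng.normal(size=(4, d_in))

assert np.allclose(lora_forward(x, W, A, B, alpha, r), x @ W.T)
print("scale alpha/r:", alpha / r)
print("LoRA params:", lora_param_count(d_in, d_out, r), "vs full layer:", d_in * d_out)
```

For this hypothetical square layer, a rank-256 adapter holds about two-thirds as many parameters as the layer itself, which is what makes the setting "high-rank" compared with the small ranks often used for LoRA.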

2. Stage 1: Foundational Instruction Tuning

Following CPT, the model underwent full supervised fine-tuning (SFT) to align it with human intent.

  • Dataset: metunlp/LlamaTurk-Instruction-Set
  • Objective: Introduce the model to a broad range of general instructions and establish basic response coherence.
  • Hyperparameters: 20 Epochs, Batch Size 16, AdamW optimizer (lr=1e-4), Max Sequence Length 256.
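The card does not spell out the SFT loss. A common recipe for masked diffusion models (used, for example, by LLaDA) keeps the prompt unmasked, masks each response token with probability t, and computes cross-entropy only on the masked response positions, reweighted by 1/t. The NumPy sketch below assumes that recipe; the function name, the shapes, and the normalization by response length are illustrative choices, not confirmed details of Diffutron's training code.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def mdlm_sft_loss(logits, targets, is_response, is_masked, t):
    """Masked-diffusion SFT loss for one sequence (assumed LLaDA-style recipe).

    logits:      (L, V) model outputs for the noised sequence
    targets:     (L,)   original token ids
    is_response: (L,)   True for answer tokens, False for prompt tokens
    is_masked:   (L,)   True where the input token was replaced by [MASK]
    t:           noise level in (0, 1] used to mask the response
    """
    logp = log_softmax(logits)[np.arange(len(targets)), targets]
    # Only masked response tokens contribute; prompt tokens are never masked.
    contrib = np.where(is_response & is_masked, -logp, 0.0)
    # 1/t reweighting, averaged over the response length.
    return contrib.sum() / (t * max(int(is_response.sum()), 1))


rng = np.random.default_rng(0)
L, V, t = 8, 50, 0.5
targets = rng.integers(0, V, size=L)
is_response = np.array([False] * 3 + [True] * 5)  # 3 prompt tokens, 5 answer tokens
is_masked = is_response & (rng.random(L) < t)     # mask answer tokens with probability t
logits = rng.normal(size=(L, V))
print(mdlm_sft_loss(logits, targets, is_response, is_masked, t))
```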

(Note: For the most advanced instruction-following capabilities, including complex reasoning, we recommend using the final DiffutronLM-0.3B-Instruct model, which includes a second stage of tuning on InstrucTurca.)

📊 Evaluation Results

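Despite being an intermediate checkpoint, the 1st-Stage model is competitive with much larger autoregressive baselines on the CETVEL Benchmark Suite: its average score (32.41) is close to TURNA (1.1B, 33.19) and above Aya-101 (13B, 31.78), and its STS_TR score (17.77) is higher than that of every non-Diffutron baseline. It still trails Kumru, Kanarya, Llama-3.2, and Trendyol on average, most visibly on News_Cat.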

| Benchmark | Diffutron-1st-Stage (0.3B) | Diffutron-2nd-Stage (0.3B) | TURNA (1.1B) | Kumru (2B) | Kanarya (2B) | Llama-3.2 (3B) | Trendyol (7B) | Aya-101 (13B) |
|---|---|---|---|---|---|---|---|---|
| Belebele_TR | 22.22 | 27.00 | 22.56 | 29.00 | 28.11 | 55.78 | 36.22 | 22.89 |
| EXAMS_TR | 25.95 | 27.74 | 23.66 | 30.03 | 30.03 | 26.21 | 28.50 | 22.90 |
| IronyTR | 50.67 | 52.00 | 48.33 | 51.00 | 50.00 | 50.17 | 50.00 | 52.17 |
| News_Cat | 23.20 | 32.40 | 32.80 | 26.40 | 66.80 | 64.00 | 81.20 | 20.00 |
| MNLI_TR | 33.29 | 32.81 | 34.94 | 36.42 | 33.40 | 34.76 | 35.19 | 27.90 |
| STS_TR | 17.77 | 18.78 | 14.21 | 11.75 | 12.91 | 12.91 | 15.52 | 16.97 |
| XCOPA_TR | 53.80 | 52.00 | 55.80 | 54.00 | 64.20 | 54.60 | 61.00 | 59.60 |
| Average | 32.41 | 34.68 | 33.19 | 34.09 | 40.78 | 42.63 | 43.95 | 31.78 |
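The Average row matches an unweighted mean of the seven task scores. The short check below recomputes it from the table values; reading the average as a plain arithmetic mean is an inference from the numbers, not something the card states.

```python
from statistics import mean

# Per-task scores from the table, in benchmark order:
# Belebele_TR, EXAMS_TR, IronyTR, News_Cat, MNLI_TR, STS_TR, XCOPA_TR
scores = {
    "Diffutron-1st-Stage (0.3B)": [22.22, 25.95, 50.67, 23.20, 33.29, 17.77, 53.80],
    "Diffutron-2nd-Stage (0.3B)": [27.00, 27.74, 52.00, 32.40, 32.81, 18.78, 52.00],
    "TURNA (1.1B)": [22.56, 23.66, 48.33, 32.80, 34.94, 14.21, 55.80],
    "Kumru (2B)": [29.00, 30.03, 51.00, 26.40, 36.42, 11.75, 54.00],
    "Kanarya (2B)": [28.11, 30.03, 50.00, 66.80, 33.40, 12.91, 64.20],
    "Llama-3.2 (3B)": [55.78, 26.21, 50.17, 64.00, 34.76, 12.91, 54.60],
    "Trendyol (7B)": [36.22, 28.50, 50.00, 81.20, 35.19, 15.52, 61.00],
    "Aya-101 (13B)": [22.89, 22.90, 52.17, 20.00, 27.90, 16.97, 59.60],
}
# Values from the table's Average row.
reported = {
    "Diffutron-1st-Stage (0.3B)": 32.41, "Diffutron-2nd-Stage (0.3B)": 34.68,
    "TURNA (1.1B)": 33.19, "Kumru (2B)": 34.09, "Kanarya (2B)": 40.78,
    "Llama-3.2 (3B)": 42.63, "Trendyol (7B)": 43.95, "Aya-101 (13B)": 31.78,
}

for model, vals in scores.items():
    print(f"{model:28s} computed={mean(vals):.2f} reported={reported[model]:.2f}")
```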

💻 Usage

Inference requires generating text via a discrete diffusion process rather than causal next-token prediction. We recommend using the dllm library.

Recommended Generation Parameters:

  • Steps: 64 to 128
  • Temperature: 0.1
  • Block Length: 32
  • Repetition Penalty: 1.2
  • Remask Strategy: low_conf
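These settings correspond to a block-wise sampler with low-confidence remasking, the scheme popularized by LLaDA. The NumPy sketch below is a simplified illustration under that assumption, not the dllm implementation: `predict_logits` is a hypothetical stand-in for a model forward pass, `MASK_ID` and `VOCAB` are placeholders, and the repetition-penalty rule is borrowed from the usual Hugging Face convention. The response starts fully masked and is split into blocks of `block_length` tokens filled left to right; within a block, each step commits the masked positions the model is most confident about and leaves the rest masked for later steps.

```python
import numpy as np

MASK_ID = 0  # hypothetical [MASK] token id
VOCAB = 10   # hypothetical vocabulary size for the toy model below


def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def apply_repetition_penalty(logits, seq, penalty):
    """HF-style rule: shrink positive logits and amplify negative ones for tokens already in seq."""
    out = logits.copy()
    used = np.unique(seq[seq != MASK_ID])
    vals = out[:, used]
    out[:, used] = np.where(vals > 0, vals / penalty, vals * penalty)
    return out


def generate(predict_logits, prompt, gen_length=64, steps=64, block_length=32,
             temperature=0.1, repetition_penalty=1.2, seed=0):
    """Block-wise masked-diffusion sampling with low-confidence remasking (sketch)."""
    assert gen_length % block_length == 0, "gen_length must be a multiple of block_length"
    num_blocks = gen_length // block_length
    assert steps % num_blocks == 0, "steps must divide evenly across blocks"
    steps_per_block = steps // num_blocks
    rng = np.random.default_rng(seed)
    seq = np.concatenate([np.asarray(prompt), np.full(gen_length, MASK_ID)])

    for b in range(num_blocks):  # blocks are filled left to right
        start = len(prompt) + b * block_length
        end = start + block_length
        # Spread the block's tokens as evenly as possible over its steps.
        counts = np.full(steps_per_block, block_length // steps_per_block)
        counts[: block_length % steps_per_block] += 1
        for k in counts:
            if k == 0:
                continue
            logits = apply_repetition_penalty(predict_logits(seq), seq, repetition_penalty)
            logits[:, MASK_ID] = -np.inf  # never predict the mask token itself
            probs = softmax(logits / max(temperature, 1e-6))
            # Sample a candidate for every position and record its probability.
            cand = np.array([rng.choice(len(p), p=p) for p in probs])
            conf = probs[np.arange(len(seq)), cand]
            # Only positions still masked in the current block are eligible.
            eligible = np.zeros(len(seq), dtype=bool)
            eligible[start:end] = seq[start:end] == MASK_ID
            conf = np.where(eligible, conf, -np.inf)
            # Commit the k most confident candidates; everything else stays masked.
            for i in np.argsort(-conf)[:k]:
                seq[i] = cand[i]
    return seq[len(prompt):]


def toy_predict_logits(seq):
    """Hypothetical stand-in for the model: favours token (pos % 9) + 1 at each position."""
    logits = np.zeros((len(seq), VOCAB))
    logits[np.arange(len(seq)), np.arange(len(seq)) % (VOCAB - 1) + 1] = 5.0
    return logits


print(generate(toy_predict_logits, prompt=[3, 4, 5], gen_length=64, steps=64))
```

With 64 steps and two 32-token blocks, each step commits one token; raising `steps` to 128 in this sketch only adds model calls that commit nothing, so the useful range depends on how steps are scheduled in the actual sampler.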

⚠️ Limitations

  • Intermediate State: This model has not undergone the final specialization phase and may struggle with highly complex or multi-turn instructions compared to the final Instruct model.
  • Context Window: Restricted to 256 tokens, which must hold both the prompt and the generated response.
  • Multilingual Backbone: Inherits representations from a multilingual encoder, not a natively trained Turkish foundation model.
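Because a masked diffusion model processes the prompt and the masked response as one sequence, the 256-token window has to cover both. The hypothetical helper below picks the longest response that still fits, assuming (as in block-wise samplers) that the response length should be a multiple of the recommended block length of 32.

```python
CONTEXT_LENGTH = 256  # context window stated in the card
BLOCK_LENGTH = 32     # recommended block length stated in the card


def max_response_length(prompt_tokens: int, context: int = CONTEXT_LENGTH,
                        block: int = BLOCK_LENGTH) -> int:
    """Largest multiple of `block` that fits after the prompt (0 if none fits)."""
    remaining = context - prompt_tokens
    if remaining < 0:
        raise ValueError("prompt alone exceeds the context window; truncate it first")
    return (remaining // block) * block


for n in (10, 100, 230):
    print(n, max_response_length(n))
```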

📝 Citation

@misc{diffutron2026,
  author = {Kocabay, Şuayp Talha and Akkuş, Talha Rüzgar},
  title = {Diffutron: A Masked Diffusion Language Model for Turkish Language},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/collections/diffutron/diffutronlm}}
}