πŸ€— Hugging Face   |   πŸ€– ModelScope    |   πŸ™ Experience Link Coming Soon~

Ling-2.5-1T, Inclusive Intelligence, Instant Impact.

Today, we launch Ling-2.5-1T and make it open source.

Thinking models raise the ceiling of intelligence, while instant models expand its reach by balancing efficiency and performanceβ€”making AGI not only more powerful, but also more accessible. As the latest flagship instant model in the Ling family, Ling-2.5-1T delivers comprehensive upgrades across model architecture, token efficiency, and preference alignment, designed to bring universally accessible AI to a new level of quality.

  • Ling-2.5-1T features 1T total parameters (with 63B active parameters). Its pre-training corpus has expanded from 20T to 29T tokens compared to the previous generation. Leveraging an efficient hybrid linear attention architecture and refined data strategy, the model delivers exceptionally high throughput while processing context lengths of up to 1M tokens.
  • By introducing a composite reward mechanism combining "Correctness" and "Process Redundancy" (a hypothetical sketch of such a reward follows this list), Ling-2.5-1T further pushes the frontier of the efficiency-performance balance in instant models. At comparable token-efficiency levels, Ling-2.5-1T's reasoning capabilities significantly outperform its predecessor, approaching the level of frontier "thinking models" that typically consume ~4x the output tokens.
  • Through refined alignment strategiesβ€”such as bidirectional RL feedback and Agent-based instruction constraint verificationβ€”Ling-2.5-1T achieves substantial improvements over the previous generation in preference alignment tasks, including creative writing and instruction following.
  • Trained with Agentic RL in large-scale high-fidelity interactive environments, Ling-2.5-1T is compatible with mainstream agent platforms such as Claude Code, OpenCode, and OpenClaw. It achieves leading open-source performance on the general tool-calling benchmark, BFCL-V4.
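
To make the composite reward concrete, here is a minimal, hypothetical Python sketch in which the correctness signal is binary and "Process Redundancy" is approximated by a length-based penalty on correct answers. The actual Ling-2.5-1T reward is not published in this card, so the function name, weighting, and token accounting below are illustrative assumptions only.

def composite_reward(response: str, is_correct: bool,
                     target_len: int = 512, redundancy_weight: float = 0.2) -> float:
    """Hypothetical composite reward: correctness minus a process-redundancy penalty."""
    # Correctness term: 1.0 for a verified-correct answer, 0.0 otherwise.
    correctness = 1.0 if is_correct else 0.0

    # Redundancy term: penalize output tokens beyond a target budget
    # (token count is approximated here by whitespace splitting).
    n_tokens = len(response.split())
    redundancy = max(0.0, (n_tokens - target_len) / target_len)

    # Only correct answers pay the verbosity penalty, so shorter-but-wrong
    # outputs are never rewarded over longer correct ones.
    return correctness - redundancy_weight * redundancy * correctness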

Evaluation

We have conducted a comprehensive evaluation of Ling-2.5-1T across multiple authoritative benchmarks, covering domains such as knowledge, reasoning, agentic performance, instruction following, and long-context processing. Compared to its predecessor, Ling-1T, Ling-2.5-1T delivers a holistic upgrade in capabilities, standing as the most powerful instant model in the Ling family to date. Furthermore, when compared to mainstream modelsβ€”including DeepSeek V3.2, Kimi K2.5, and GPT-5.2β€”Ling-2.5-1T demonstrates a distinct performance advantage in complex reasoning and instruction following.

Model Downloads

You can download Ling-2.5-1T from the following table. If you are located in mainland China, we also provide the model on ModelScope.cn to speed up the download process.

Model         Context Length       Download
Ling-2.5-1T   256K -> 1M (YaRN)    πŸ€— Hugging Face   πŸ€– ModelScope

Note: If you are interested in the previous version, please visit the past model collections on Hugging Face or ModelScope.
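
As a minimal sketch, the weights can be fetched with the huggingface_hub Python client (the repository id matches the Hugging Face link above; use the ModelScope SDK instead if you are downloading from ModelScope.cn):

from huggingface_hub import snapshot_download

# Download all model files (config, tokenizer, safetensors shards) to a local directory.
local_dir = snapshot_download(
    repo_id="inclusionAI/Ling-2.5-1T",
    local_dir="./Ling-2.5-1T",
)
print(f"Model files saved to: {local_dir}")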

Trillion-scale Hybrid Linear Attention Architecture and Million-Token Context Window

Building upon the Ling 2.0 architecture, Ling 2.5 introduces a Hybrid Linear Attention architecture. Through incremental training, we upgrade the GQA (Grouped Query Attention) layers of the Ling 2.0 architecture to a 1:7 mix of MLA (Multi-head Latent Attention) and Lightning Linear Attention. Specifically, building upon the previously released Ring-flash-linear-2.0 technical roadmap, we convert a subset of GQA layers into Lightning Linear Attention to significantly enhance throughput in long-horizon reasoning scenarios. To further compress the KV cache, we approximately convert the remaining GQA layers to MLA, with targeted adaptations for features such as QK Norm (query-key normalization) and Partial RoPE (rotary position embedding), thereby strengthening the expressiveness of the Ling 2.5 architecture.
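
As a rough illustration of the 1:7 interleaving described above, the hypothetical sketch below builds a layer schedule in which every eighth block uses MLA and the other seven use Lightning Linear Attention. The names, total depth, and exact placement are assumptions for illustration, not the actual Ling 2.5 implementation.

# Hypothetical 1:7 MLA / Lightning Linear layer schedule (illustrative only).
NUM_LAYERS = 80   # illustrative depth, not the real Ling-2.5-1T value
MLA_PERIOD = 8    # one MLA layer per group of eight -> 1:7 ratio

def build_layer_schedule(num_layers: int = NUM_LAYERS) -> list[str]:
    schedule = []
    for i in range(num_layers):
        # One full-attention (MLA) layer per period keeps global token mixing;
        # the remaining seven layers use linear attention for cheap decoding.
        if (i + 1) % MLA_PERIOD == 0:
            schedule.append("MLA")
        else:
            schedule.append("LightningLinear")
    return schedule

print(build_layer_schedule(16))  # seven 'LightningLinear', one 'MLA', repeated twice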

After this modification, the trillion-scale version of the Ling 2.5 architecture increases the active parameter count from 51B to 63B. Even so, thanks to the hybrid linear attention architecture, its inference efficiency still improves significantly over Ling 2.0. Even when benchmarked against the Kimi K2 architecture with only 32B active parameters, Ling 2.5 maintains a notable throughput advantage in long-horizon task execution, and the advantage becomes more pronounced as the generation length grows.
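
To see why the gap widens with generation length, here is a back-of-the-envelope sketch: each new token in softmax attention attends over the whole KV cache, so per-token decode cost grows with the tokens already generated, while a linear-attention layer updates a constant-size state. The unit costs below are arbitrary assumptions chosen only to show the scaling trend.

# Back-of-the-envelope decode-cost scaling (arbitrary units, illustrative only).
def softmax_decode_cost(gen_len: int) -> float:
    # Each new token reads the whole KV cache: sum_{t=1..L} t = L(L+1)/2.
    return gen_len * (gen_len + 1) / 2

def linear_decode_cost(gen_len: int, state_cost: float = 64.0) -> float:
    # Each new token updates a fixed-size recurrent state: constant cost per token.
    return gen_len * state_cost

for length in (1_000, 10_000, 100_000):
    ratio = softmax_decode_cost(length) / linear_decode_cost(length)
    print(f"gen_len={length:>7,}: softmax/linear cost ratio ~ {ratio:,.0f}")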

Decode throughput comparison across generation lengths on a single machine with 8 H20-3e GPUs (batch size = 64).

Decode throughput comparison across generation lengths on a single machine with 8 H200 GPUs (batch size = 64).

Following the architectural upgrades, we conducted continued pre-training on Ling-2.5-1T-base using 9T high-quality tokens. This phase focused on enhancing the model's world knowledge coverage and fundamental agent capabilities. Simultaneously, leveraging the exceptional computational efficiency and scalability of the Hybrid Linear Attention architecture for long-context processing, we extended the training context window to 256K tokens. Furthermore, via YaRN extrapolation, the model achieves stable support for ultra-long contexts of up to 1M tokens.
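
As a hedged example of what YaRN-style extension looks like in practice, the snippet below follows the common Hugging Face rope_scaling convention for extrapolating a 256K training window to 1M tokens. The exact field names and scaling factor shipped in Ling-2.5-1T's config.json may differ, so treat every value here as an assumption to verify against the released files.

import json

# Illustrative YaRN rope_scaling entry (verify against the released config.json;
# older transformers versions use "type" instead of "rope_type").
rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,                               # 256K -> 1M extrapolation
    "original_max_position_embeddings": 262144,  # 256K training window
}

with open("Ling-2.5-1T/config.json") as f:
    config = json.load(f)

config["rope_scaling"] = rope_scaling
config["max_position_embeddings"] = 1048576      # 1M-token context window

with open("Ling-2.5-1T/config.json", "w") as f:
    json.dump(config, f, indent=2)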

Previously, there has been some debate within the community regarding the efficacy of Hybrid Linear Attention for ultra-long context reasoning. To address this, we conducted a systematic evaluation of Ling-2.5-1T on ultra-long context benchmarks. The results indicate that Ling-2.5-1T demonstrates performance advantages across multiple ultra-long context tasks when compared to large instant models utilizing MLA and DSA architectures (such as Kimi K2.5 and DeepSeek V3.2). However, we also acknowledge that a gap remains when compared to leading closed-source API models (such as GPT-5.2 and Gemini 3 Pro). We are committed to further enhancing these capabilities in future iterations.

Ling-2.5-1T demonstrates superior needle-in-a-haystack (NIAH) performance within a 1M-token context window.

Performance Across 16K–1M Token Context Windows on RULER and MRCR

Long-context benchmark comparison (RULER and MRCR scores averaged over 16K–256K token windows)

Quickstart

πŸš€ Try Online

Coming Soon

πŸ”Œ API Usage

Coming Soon

Deployment

SGLang

Environment Preparation

We will submit our model to the official SGLang release later. For now, you can prepare the environment with the following steps:

git clone -b ling_2_5 git@github.com:antgroup/sglang.git
cd sglang

# Install the python packages
pip install --upgrade pip
pip install -e "python"

Run Inference

SGLang now supports both the BF16 and FP8 versions of the model; which one is used depends on the dtype of the checkpoint in ${MODEL_PATH}. Here is an example of running Ling-2.5-1T across multiple GPU nodes, where the master node IP is ${MASTER_IP} and the server port is ${PORT}:

  • Start server:
# Node 0:
python -m sglang.launch_server --model-path $MODEL_PATH --tp-size 8 --pp-size 4 --dp-size 1 --trust-remote-code --dist-init-addr $MASTER_IP:2345 --port $PORT --nnodes 4 --node-rank 0 
# Node 1:
python -m sglang.launch_server --model-path $MODEL_PATH --tp-size 8 --pp-size 4 --dp-size 1 --trust-remote-code --dist-init-addr $MASTER_IP:2345 --port $PORT --nnodes 4 --node-rank 1 
# Node 2:
python -m sglang.launch_server --model-path $MODEL_PATH --tp-size 8 --pp-size 4 --dp-size 1 --trust-remote-code --dist-init-addr $MASTER_IP:2345 --port $PORT --nnodes 4 --node-rank 2 
# Node 3:
python -m sglang.launch_server --model-path $MODEL_PATH --tp-size 8 --pp-size 4 --dp-size 1 --trust-remote-code --dist-init-addr $MASTER_IP:2345 --port $PORT --nnodes 4 --node-rank 3

# This is only an example. Please adjust arguments according to your actual environment.
  • Client:
curl -s http://${MASTER_IP}:${PORT}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "auto", "messages": [{"role": "user", "content": "What is the capital of France?"}]}'

More usage examples can be found in the SGLang documentation.

Limitations & Future Plans

Ling-2.5-1T achieves high-throughput decoding and leading capabilities in ultra-long context processing. With preliminary agentic interaction capabilities, it lays the groundwork for the era of general-purpose agents.

However, in complex agent interactions and long-horizon tasks, it still lags behind frontier models. The next version will focus on enhancing long-horizon execution and task completion for real-world applications, while continuously improving token efficiency to deliver a superior balance between efficiency and performance.

Hugging Face: https://huggingface.co/inclusionAI/Ling-2.5-1T

ModelScope: https://modelscope.cn/models/inclusionAI/Ling-2.5-1T

The chat experience page and API services on Ling studio and ZenMux will be launched in the near future.

License

This code repository is licensed under the MIT License.
