OpenAI is now open again! Check out OpenAI’s brand new gpt-oss-20b model hosted on ZeroGPU 🤗 merterbak/gpt-oss-20b-demo
Qwen 3 technical report released 🚀 Report: https://github.com/QwenLM/Qwen3/blob/main/Qwen3_Technical_Report.pdf
Papers
Attention Is All You Need • Paper • 1706.03762 • Published Jun 12, 2017 • 105
LoRA Learns Less and Forgets Less • Paper • 2405.09673 • Published May 15, 2024 • 90
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism • Paper • 2401.02954 • Published Jan 5, 2024 • 50
RAFT: Adapting Language Model to Domain Specific RAG • Paper • 2403.10131 • Published Mar 15, 2024 • 72
Qwen 3: Alibaba's Qwen 3 models
Qwen/Qwen3-0.6B • Text Generation • 0.8B • Updated Jul 26 • 7.6M • 879
Qwen/Qwen3-1.7B • Text Generation • 2B • Updated Jul 26 • 4.53M • 354
Qwen/Qwen3-4B • Text Generation • 4B • Updated Jul 26 • 3.2M • 493
Qwen/Qwen3-8B • Text Generation • 8B • Updated Jul 26 • 4.19M • 805
Seed Coder 8B Instruct 🚀 • Running on Zero • 6 • ByteDance Seed's coding-focused Seed-Coder-8B-Instruct model