Cahlen Humphreys PRO

cahlen

https://bigcompute.science

AI & ML interests

☠️💻

Recent Activity

liked 2 datasets 3 days ago

ScaleAI/SWE-bench_Pro

Benchmark • Updated Feb 23 • 731 • 59.3k • 100

NuTonic/sat-image-boundingbox-sft-full

Viewer • Updated 5 days ago • 531k • 1.65k • 11

New activity in RedHatAI/Qwen3.6-35B-A3B-NVFP4 3 days ago

Great quant!!

#6 opened 7 days ago by

tasticleeze

liked a model 3 days ago

RedHatAI/Qwen3.6-35B-A3B-NVFP4

Updated 8 days ago • 563k • 106

reacted to anakin87's post with ❤️🔥 4 days ago

A small model that struggled against a random opponent now beats GPT-5-mini at tic-tac-toe

I took LiquidAI/LFM2-2.6B and trained it through play.

🧑‍🍳 Here's how:

1️⃣ Build a solid RL env with Verifiers (Prime Intellect)
2️⃣ Generate synthetic data: <200 games sampled from GPT-5-mini playing in the env
3️⃣ SFT warm-up to teach format
4️⃣ Group-based RL (CISPO) against opponents making 20-70% random moves
5️⃣ RL again with stronger opponents (0-25% random moves) + 1.25 temperature to push exploration and shake off suboptimal strategies
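The opponent schedule in steps 4 and 5 can be sketched as a player that makes a uniformly random legal move with some probability and otherwise plays a strong move. This is only an illustrative sketch, not the Verifiers API or the author's environment code: the post does not say what the non-random moves are, so optimal minimax play is assumed here, and the names `opponent_move`, `best_move`, and `random_prob` are hypothetical.

```python
import random

# All eight winning lines on a 3x3 board indexed 0..8, row by row.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def legal_moves(board):
    """Indices of the empty cells."""
    return [i for i, cell in enumerate(board) if cell == " "]


def other(player):
    return "O" if player == "X" else "X"


def minimax(board, to_move, me):
    """Game value from `me`'s point of view: +1 win, 0 draw, -1 loss."""
    w = winner(board)
    if w is not None:
        return 1 if w == me else -1
    moves = legal_moves(board)
    if not moves:
        return 0
    scores = []
    for m in moves:
        board[m] = to_move          # try the move in place
        scores.append(minimax(board, other(to_move), me))
        board[m] = " "              # undo it
    return max(scores) if to_move == me else min(scores)


def best_move(board, player):
    """First legal move with the highest minimax value (assumed 'strong' play)."""
    best, best_score = None, -2
    for m in legal_moves(board):
        board[m] = player
        score = minimax(board, other(player), player)
        board[m] = " "
        if score > best_score:
            best, best_score = m, score
    return best


def opponent_move(board, player, random_prob, rng=random):
    """With probability `random_prob` play a random legal move, else the best one."""
    if rng.random() < random_prob:
        return rng.choice(legal_moves(board))
    return best_move(board, player)
```

Under these assumptions, the first RL stage would draw `random_prob` from the 0.2 to 0.7 range and the second stage from 0.0 to 0.25, for example `opponent_move(board, "O", random.uniform(0.0, 0.25))`.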

Done! Beats GPT-5-mini 🏆

---

🎮 Play against the model: anakin87/LFM2-2.6B-mr-tictactoe

🤗 Model: anakin87/LFM2-2.6B-mr-tictactoe

📚 Walkthrough/course: https://github.com/anakin87/llm-rl-environments-lil-course

🤗 Dataset and checkpoints: https://huggingface.co/collections/anakin87/lfm2-26b-mr-tic-tac-toe