
Kartik

kartikagg98

AI & ML interests

None yet

Recent Activity


liked a model about 2 months ago

Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled

Image-Text-to-Text • 28B • Updated Apr 6 • 233k • 2.84k

liked a dataset 2 months ago

markov-ai/computer-use-large

Updated Mar 16 • 15k • 174

liked a model 5 months ago

OpenOranje/TweeTaal-nl-en-0.6B

Translation • 0.6B • Updated Dec 7, 2025 • 8 • 4

upvoted 2 articles 6 months ago

Article

Smol2Operator: Post-Training GUI Agents for Computer Use

A-Mahla, merve, sergiopaniego, reach-vb, lewtun • Sep 23, 2025 • 138

Article

Evaluate Your Own RAG: Why Best Practices Failed Us

charles-azam • Nov 5, 2025 • 14

liked a dataset 7 months ago

neulab/agent-data-collection

Preview • Updated Mar 9 • 4.97k • 112

updated a dataset 7 months ago

OpenOranje/squad-en-nl-gemini-translations

Viewer • Updated Oct 24, 2025 • 48 • 12

published a dataset 7 months ago

OpenOranje/squad-en-nl-gemini-translations

Viewer • Updated Oct 24, 2025 • 48 • 12

updated a dataset 7 months ago

OpenOranje/ReOpus-ApolloBooks-EN-NL-1M

Viewer • Updated Oct 24, 2025 • 1.02M • 9

published a dataset 7 months ago

OpenOranje/ReOpus-ApolloBooks-EN-NL-1M

Viewer • Updated Oct 24, 2025 • 1.02M • 9

published a dataset 9 months ago

kartikagg98/codemix_hindi_english_4M

Updated Aug 24, 2025 • 8

updated a dataset 9 months ago

kartikagg98/HINMIX_hi-en

Viewer • Updated Aug 18, 2025 • 25.2M • 230 • 6

upvoted a paper 9 months ago

Synthetic Data Generation and Joint Learning for Robust Code-Mixed Translation

Paper • 2403.16771 • Published Mar 25, 2024 • 1

updated a model 11 months ago

kartikagg98/Qwen2-0.5B-GRPO-test

Updated Jun 20, 2025

published a model 11 months ago

kartikagg98/Qwen2-0.5B-GRPO-test

Updated Jun 20, 2025

liked 4 datasets over 1 year ago

reacted to m-ric's post with 🔥 over 1 year ago

Post

2086

🌟🌎 Cohere releases Aya 8B & 32B: SOTA multilingual models for 23 languages!

How did they manage to beat top contenders while also adding 23 languages?

🔄 𝗧𝗿𝗮𝗶𝗻 𝗼𝗻 𝘀𝘆𝗻𝘁𝗵𝗲𝘁𝗶𝗰 𝗱𝗮𝘁𝗮:
• Synthetic data has been reported to cause model collapse when a model is trained on too much of it
• Cohere introduced "data arbitrage" to prevent this, strategically sampling from a pool of several teacher models instead of a single teacher
• First, train a model pool for each group of languages, then use an internal reward model named "Arbiter" to evaluate the generations and select the best one. Only that best generation is kept as the final completion for each prompt
➡️ This process is particularly effective in multilingual settings, where no single teacher model performs well in all languages: here "Multilingual Arbitrage" on its own improves the 8B model's win rate against Gemma-2-9B by 10 points!
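The arbitrage step described above can be sketched as a best-of-n selection over a per-language teacher pool. This is a minimal illustration under stated assumptions, not Cohere's implementation: the teacher and reward functions are hypothetical placeholders (the post does not describe the Arbiter's interface), and the language-group keys are made up.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interfaces: a teacher maps a prompt to a completion, and the
# reward model (the "Arbiter" role) scores a (prompt, completion) pair.
Teacher = Callable[[str], str]
RewardModel = Callable[[str, str], float]


def arbitrage(prompts: List[Tuple[str, str]],
              pools: Dict[str, List[Teacher]],
              reward: RewardModel) -> List[Tuple[str, str]]:
    """For each (language_group, prompt), query every teacher in that group's
    pool and keep only the completion with the highest reward."""
    selected = []
    for group, prompt in prompts:
        candidates = [teacher(prompt) for teacher in pools[group]]
        # max() returns the first of several equally scored candidates,
        # so ties go to the teacher listed earlier in the pool.
        best = max(candidates, key=lambda c: reward(prompt, c))
        selected.append((prompt, best))
    return selected


# Toy usage: string functions stand in for teacher models and
# completion length stands in for the reward model's score.
pools = {"nl": [str.upper, lambda p: p + "!!"], "hi": [str.title]}
result = arbitrage([("nl", "hallo"), ("hi", "namaste")], pools,
                   lambda prompt, completion: len(completion))
print(result)  # [('hallo', 'hallo!!'), ('namaste', 'Namaste')]
```

Only the winning completion per prompt survives into the training set, which is the property the post credits for avoiding reliance on any single teacher.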

🧩 𝗨𝘀𝗲 𝗺𝗼𝗱𝗲𝗹 𝗺𝗲𝗿𝗴𝗶𝗻𝗴: Rather than struggling to find the right data mix for training a single multilingual model, just train language-specific models, then merge them!
• Maximize diversity between the merged checkpoints by training each one on a different language family.
• They experimented with fancier techniques (SLERP, TIES, DARE-TIES) but found weighted averaging to be the most consistent!
➡️ Merging gave 3x larger gains at the 35B scale than at the 8B scale, consistent with findings in the literature that merging is more effective at scale
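Weighted averaging, the merging method the post reports as most consistent, amounts to a parameter-wise weighted mean of checkpoints that share one architecture. A minimal sketch, assuming each checkpoint is a plain dict mapping parameter names to flat lists of floats (a stand-in for a real state dict); the checkpoint names and merge weights are illustrative, since the post does not say which weights Cohere used.

```python
from typing import Dict, List

# Toy stand-in for a model state dict: parameter name -> flattened weights.
Checkpoint = Dict[str, List[float]]


def weighted_average(checkpoints: List[Checkpoint], weights: List[float]) -> Checkpoint:
    """Merge same-architecture checkpoints with a parameter-wise weighted mean.

    The weights are normalized to sum to 1, so [3, 1] and [0.75, 0.25] are equivalent.
    """
    if not checkpoints or len(checkpoints) != len(weights):
        raise ValueError("need one weight per checkpoint")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    norm = [w / total for w in weights]

    merged: Checkpoint = {}
    for name in checkpoints[0]:
        vectors = [ckpt[name] for ckpt in checkpoints]
        if len({len(v) for v in vectors}) != 1:
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Average element by element across all checkpoints.
        merged[name] = [sum(w * x for w, x in zip(norm, column))
                        for column in zip(*vectors)]
    return merged


# Toy usage: two hypothetical language-family checkpoints.
ckpt_family_a = {"layer.w": [1.0, 2.0], "layer.b": [0.0]}
ckpt_family_b = {"layer.w": [3.0, 4.0], "layer.b": [1.0]}
merged = weighted_average([ckpt_family_a, ckpt_family_b], [0.75, 0.25])
print(merged)  # {'layer.w': [1.5, 2.5], 'layer.b': [0.25]}
```

Because the weights are normalized, passing `[3, 1]` yields the same merge as `[0.75, 0.25]`.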

⚡️ 𝗚𝗿𝗲𝗮𝘁 𝗽𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲: Automatic evaluations on the Arena-Hard-Auto dataset:
➡️ Aya Expanse 8B beats models from its weight class such as Gemma 2 9B, Llama 3.1 8B, and the recent Ministral 8B, with win rates ranging from 60.4% to 70.6%
➡️ Aya Expanse 32B outperforms Gemma 2 27B, Mixtral 8x22B, and Llama 3.1 70B (2x its size)
• ⚠️ But this performance eval comes from only one benchmark! Let's wait for the Open LLM Leaderboard evals.

🔒 CC-BY-NC license

Blog post here: https://huggingface.co/blog/aya-expanse