Sunny Sanyal
Sunny111
AI & ML interests
Efficient training recipes for large models (mostly LLMs)
Recent Activity
posted an update about 2 hours ago
Are you familiar with reverse residual connections or looping in language models?
Excited to share my Looped-GPT blog post and codebase!
https://github.com/sanyalsunny111/Looped-GPT
TL;DR: looping during pre-training improves generalization (a minimal sketch of the idea follows below).
The plot shows GPT-2 LMs pre-trained on 15.73B OpenWebText (OWT) tokens.
P.S. This is my first post here; I have ~4 followers and zero expectations for reach.
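For readers new to the idea, here is a minimal sketch of what "looping" typically means: a single weight-tied transformer block unrolled several times in the forward pass, so effective depth grows without adding parameters. This is an illustration under my own assumptions, not the recipe from the linked repo; the class name LoopedBlock, the loop count, and the placement of the cross-loop residual (one plausible reading of "reverse residual connections") are all hypothetical.

```python
import torch
import torch.nn as nn

class LoopedBlock(nn.Module):
    """A weight-tied transformer block applied num_loops times per forward pass.
    Hypothetical sketch; names and residual placement are assumptions."""

    def __init__(self, d_model: int, n_heads: int, num_loops: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln1 = nn.LayerNorm(d_model)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.num_loops = num_loops

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Causal mask: True entries are blocked, so each token attends only backward.
        T = x.size(1)
        causal = torch.triu(
            torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1
        )
        x0 = x  # keep the loop input around for a cross-loop residual
        for _ in range(self.num_loops):
            h = self.ln1(x)
            attn_out, _ = self.attn(h, h, h, attn_mask=causal, need_weights=False)
            x = x + attn_out
            x = x + self.mlp(self.ln2(x))
            x = x + x0  # re-inject the loop input (assumed "reverse residual")
        return x

# Usage: same parameter count as one block, but num_loops times the compute.
block = LoopedBlock(d_model=768, n_heads=12, num_loops=4)
x = torch.randn(2, 16, 768)  # (batch, seq_len, d_model)
print(block(x).shape)        # torch.Size([2, 16, 768])
```

The trade-off this sketch shows: looping buys extra effective depth and compute at a fixed parameter budget, which is why it is interesting as a pre-training recipe for small models.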
upvoted a paper 29 days ago
Pre-training Small Base LMs with Fewer Tokens
liked a model about 1 month ago
GuminiResearch/Gumini-1.5B-Base