Wonderful open-source Italian dataset from @manalog and @ruggsea:
https://huggingface.co/datasets/manalog/UsenetArchiveIT
The dataset contributes to the mii-community project, which aims to advance the creation of Italian open-source Large Language Models (LLMs). 🇮🇹 🤗 With about 10-20 billion tokens, it is probably the best open-source conversational dataset in the Italian language. 🇮🇹🇮🇹🇮🇹🇮🇹🇮🇹🇮🇹🇮🇹