@giux78 on Hugging Face: "Wonderful open source Italian dataset from @manalog and @ruggsea:…"

posted an update Mar 13, 2024

Wonderful open-source Italian dataset from @manalog and @ruggsea:

https://huggingface.co/datasets/manalog/UsenetArchiveIT

The dataset contributes to the mii-community project, which aims to advance the creation of Italian open-source Large Language Models (LLMs). 🇮🇹 🤖 At about 10-20 billion tokens, it is probably the best open-source conversational dataset in the Italian language. 🇮🇹🇮🇹🇮🇹🇮🇹🇮🇹🇮🇹🇮🇹

ruggsea

Mar 15, 2024

AFAIK, the dataset could be the biggest Italian-language dataset on Hugging Face and probably one of the biggest Italian text datasets ever (excluding Common Crawl-based datasets).

In this post

giux78 Alessandro Ercolani
ruggsea Ruggero Marino Lazzaroni