The LLaMA language model, which is officially available only on request, is circulating as a torrent. The leaked model has its own GitHub repository, and the link can even be found in Meta’s official repository. Officially, the Facebook parent company grants access to LLaMA only to selected target groups after registration.
Meta launched LLaMA (Large Language Model Meta AI) at the end of February as a competitor to language models such as OpenAI’s GPT-3 and Google’s PaLM (Pathways Language Model). According to the AI team at Meta, which published a paper on LLaMA, it performs significantly better than these competitors in many benchmarks.
Restricted access circumvented
Officially, anyone who wants access to the model has to fill out a form. For now, Meta wants to restrict access to certain target groups, such as government and civil-society research laboratories and academic organizations.
On the Facebook AI blog, Meta justifies the limited access with the risks that language models entail. OpenAI had long cited the same reason for not making GPT-3 publicly available. When trained on texts written by humans, language models absorb not only knowledge but also prejudices. This bias is a general problem in machine learning applications. A prominent example is Microsoft’s chatbot Tay, which picked up racist and sexist prejudices in 2016. In its early phase, GPT-3 also adopted prejudices against Muslims from its training data.
Torrent with links on GitHub
However, the language model has now been leaked: someone has distributed it, including the associated weights, as a torrent. In addition to posts about it on Reddit and Twitter, there is a separate repository on GitHub that reached 1,100 stars within a few days: llama-dl offers download instructions and information on settings that are supposed to deliver improved results.
The link to the torrent can even be found in Meta’s official LLaMA repository. GitHub user ChristopherKing42 submitted it as pull request #73 with the brazen title “Save bandwidth by using a torrent to distribute more efficiently”. In the associated “code” change, he added the torrent link alongside the official instruction to fill out the Google form.
However, it is never advisable to use a torrent link to download software that is not officially available. Torrents from unknown sources are generally untrustworthy. Even if, according to information on Twitter, Reddit and GitHub, the link really does lead to the language model (or at least to a language model), the download may contain additional malicious code.
(rme)