
Phi3, how to try (and why) the smallest Microsoft artificial intelligence

by admin

A mini language model that performs well despite its limitations: interesting to try, and a useful window onto the frontier of so-called small language models, which we will increasingly find embedded in smartphones, Internet of Things objects, home automation, ATMs, and more.
These are the impressions left by a short test of Phi-3, the smallest artificial intelligence model designed by Microsoft.
The test was done in LM Studio, software that lets anyone try open-source models through a simple interface.

The peculiarities of Phi3

Microsoft is now making available to the public the first of this family of small but capable language models: Phi-3-mini, which has 3.8 billion parameters and, according to the company's tests, outperforms models twice its size at language understanding, coding, and mathematics. Microsoft has also announced the imminent arrival of other models in the Phi-3 family, Phi-3-small (7 billion parameters) and Phi-3-medium (14 billion parameters), to offer greater choice in terms of quality and cost. The underlying idea is to train the models on text simple enough for a four-year-old child to understand, selecting only high-quality data to optimize training.
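Parameter counts translate fairly directly into download size and memory footprint, which is why quantized builds matter for these models. A back-of-the-envelope sketch (it counts only the weight file and ignores runtime overhead such as activations and context cache):

```python
def model_size_gb(params_billions: float, bits_per_weight: int) -> float:
    """Rough weight-file size: parameter count times bits per weight, in gigabytes."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9

# Phi-3-mini (3.8 billion parameters) at two common precisions:
print(round(model_size_gb(3.8, 4), 1))   # ~1.9 GB when 4-bit quantized (Q4)
print(round(model_size_gb(3.8, 16), 1))  # ~7.6 GB at 16-bit floats (F16)
```

This is the arithmetic behind the Q4 and F16 file choices discussed in the test below: the 4-bit build fits comfortably on modest hardware, while the 16-bit build needs roughly four times the memory.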

Small language models are designed to perform simple tasks well, and they are more accessible and easier to use for organizations with limited resources. Their advantages: they run even on low-powered devices, without internet access (or with a poor connection), and with total privacy of the data entered (everything stays on your device). On powerful computers these models are even faster than large ones like OpenAI's GPT; they can give immediate answers.

The tests carried out by Microsoft and by various researchers show the limits of these models on cultural and scientific questions; on complex reasoning and innovative applications (in medicine, for example); and when a large amount of information must be analyzed. In all of those cases the classic large models are more suitable. Unfortunately, another known limitation of Phi-3, which we found in our test, is multilingual use: it copes much better in English. These limits come from the restricted amount of information used for training. The model knows fewer things in factual terms and, because of its weaker linguistic interpretation, is more prone to hallucination. These defects can be partly compensated by combining the model with a web search or by providing data in the input to be processed (so-called few-shot prompting), such as a text to summarize or analyze.

Among the most common uses of a model like Phi-3: summarizing a long document; extracting relevant information and industry trends from research or market reports; generating texts for marketing or sales, social media posts, or product descriptions for e-commerce. It can also become a chatbot for customer support, if the company feeds it the most frequently asked questions and the answers to give.
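The "provide data in the input" pattern can be sketched in code. LM Studio can expose a local, OpenAI-compatible server; the `http://localhost:1234/v1` address and the `phi-3-mini` model name below are assumptions based on LM Studio's defaults, not details from the article, so adjust them to your setup:

```python
import json
import urllib.request

LM_STUDIO_URL = "http://localhost:1234/v1/chat/completions"  # assumed local endpoint

def build_summary_request(document: str, model: str = "phi-3-mini") -> dict:
    """Build an OpenAI-style chat payload that puts the document in the prompt,
    so the small model only has to summarize text it was given, not recall facts."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You summarize documents concisely."},
            {"role": "user", "content": f"Summarize in three bullet points:\n\n{document}"},
        ],
        "temperature": 0.2,  # low temperature to reduce hallucination
    }

def summarize(document: str) -> str:
    """Send the request to the local server and return the model's reply."""
    payload = json.dumps(build_summary_request(document)).encode("utf-8")
    req = urllib.request.Request(
        LM_STUDIO_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the document travels inside the prompt, nothing leaves the machine: the privacy advantage described above is a direct consequence of this architecture.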


The Phi3 test

For our test we used LM Studio. After installing it, type Phi3 in the home bar or in the search (the magnifying glass in the left-hand menu). You will see a choice of files to download. There is the Q4 version, which is smaller, and the F16 version. The first is quantized to 4 bits, i.e. compressed; F16 instead stores the weights as 16-bit floating-point numbers, so it is larger and requires more GPU power, but on a reasonably fast computer it is the better choice. Then click the speech-bubble icon on the left (the chat) and, in the drop-down menu at the top, select the model to load. Then start asking questions, as you would with the free ChatGPT (version 3.5).

We asked both versions of Phi-3 how to make pasta carbonara (a factual-knowledge test); the Q4 version dared to suggest, unprompted, a "vegetarian" variant with cooked ham. F16, however, did well. On a classic linguistic-interpretation test (a brick weighs one kilo plus half a brick: how much does the brick weigh?), it often made mistakes across repeated runs in Q4, and less often in F16. Only GPT-4 (not 3.5) always solved it in our tests. As often happens with chatbots, the output improves if we ask the model to think step by step, i.e. force it to break the problem down (the right answer is 2 kilos).

Then a test of social common sense, at which Phi-3 should excel according to Microsoft: Lucia says to Franco, "I have to tell you a secret"; Franco moves closer to Lucia. Why did he do it? The right answer, i.e. the most reasonable one given the premises, which both GPT-3.5 and GPT-4 offer, is "to hear the secret while keeping the conversation private and confidential". Phi-3 dances around it, offering various explanations, some fanciful (hallucinations), such as "Franco is perhaps looking for a romantic approach". On a text-summarization test, however, there was nothing to fault.
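The brick riddle that trips up the smaller models reduces to one line of algebra; a minimal sketch of the step-by-step breakdown (the riddle restated in code, not taken from any model's output):

```python
import math

def brick_weight_algebraic(base_kg: float = 1.0) -> float:
    """b = base + b/2  ->  b - b/2 = base  ->  b/2 = base  ->  b = 2 * base."""
    return 2 * base_kg

def brick_weight_iterative(base_kg: float = 1.0, steps: int = 60) -> float:
    """Refine a guess step by step, the way we nudge a chatbot to break the problem down."""
    b = 0.0
    for _ in range(steps):
        b = base_kg + b / 2  # plug the current guess back into the riddle
    return b

assert math.isclose(brick_weight_iterative(), brick_weight_algebraic())
print(brick_weight_algebraic())  # 2.0 kilos, the answer the models should reach
```

The iterative version mirrors what "think step by step" prompting does: each pass substitutes the current estimate back into the statement, converging on the 2-kilo answer instead of jumping to the intuitive (and wrong) 1.5.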

In the end, trying Phi-3 is worthwhile for many: for those who want to evaluate whether this model could be adopted in a corporate environment, and for anyone who wants to understand the potential of today's small models amid the continuous evolution of generative artificial intelligence.
