Introduction
Many new companies have been popping up and releasing open-source Large Language Models in recent years. As time progresses, these models are getting closer and closer to the paid closed-source models. These companies release their models in various sizes and keep their licenses permissive so that anyone can use them commercially. One such group of models is Qwen. Its earlier models proved to be among the best open-source models, alongside Mistral and Zephyr, and the team has now announced a second generation, called Qwen2.
Learning Objectives
- Learn about Qwen, Alibaba Cloud's open-source language models.
- Discover Qwen2's new features.
- Review Qwen2's performance benchmarks.
- Try out Qwen2 with the HuggingFace Transformers library.
- Recognize Qwen2's commercial and open-source potential.
This article was published as a part of the Data Science Blogathon.
What is Qwen?
Qwen refers to a family of Large Language Models backed by Alibaba Cloud, a firm based in China. It has made a great contribution to the AI space by releasing many open-source models that are on par with the top models on the HuggingFace leaderboard. Qwen has released its models in different sizes, ranging from a 7 Billion Parameter model to a 72 Billion Parameter model. They have not just released the models but finetuned them in a way that placed them at the top of the leaderboard when they were released.
But Qwen did not stop there. It has released Chat-finetuned models and LLMs heavily trained on Mathematics and Code. It has even released vision-language models, and the Qwen team is moving into the audio space with text-to-speech models. Qwen is trying to create an ecosystem of open-source models readily available for everyone to start building applications with, without restrictions and for commercial purposes.
What is Qwen2?
Qwen received much appreciation from the open-source community when it was released, and numerous derivatives were created from it. Recently, the Qwen team announced a series of successor models to the previous generation, called Qwen2, with more models and more finetuned versions compared to earlier generations.
Qwen2 was released in five different sizes: 0.5B, 1.5B, 7B, 57B-A14B (a Mixture of Experts model), and 72B. These models were pretrained on data covering more than 27 languages and are significantly improved in the areas of code and mathematics compared to the earlier generation of models. A notable point is that even the 0.5B and 1.5B models come with a 32k context length, while the 7B and 72B come with a 128k context length.
All these models use Grouped Query Attention, which greatly speeds up attention and reduces the memory required to store the intermediate key/value results during inference.
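To make the idea concrete, here is a minimal NumPy sketch of Grouped Query Attention (an illustration of the head-sharing scheme only, not Qwen2's actual implementation): with 8 query heads sharing 2 key/value heads, the KV cache shrinks by a factor of 4.

```python
import numpy as np

def grouped_query_attention(q, k, v):
    # q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d)
    n_q_heads, n_kv_heads = q.shape[0], k.shape[0]
    group = n_q_heads // n_kv_heads   # query heads per shared KV head
    out = []
    for h in range(n_q_heads):
        kv = h // group               # each group of query heads reuses one KV head
        scores = q[h] @ k[kv].T / np.sqrt(q.shape[-1])
        weights = np.exp(scores - scores.max(-1, keepdims=True))
        weights /= weights.sum(-1, keepdims=True)
        out.append(weights @ v[kv])
    return np.stack(out)

seq, d = 4, 8
q = np.random.randn(8, seq, d)   # 8 query heads
k = np.random.randn(2, seq, d)   # only 2 KV heads -> 4x smaller KV cache
v = np.random.randn(2, seq, d)
print(grouped_query_attention(q, k, v).shape)  # (8, 4, 8)
```

Only the keys and values need to be cached during generation, so sharing them across query heads directly reduces inference memory.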
Performance and Benchmarks
Coming to the base model comparisons, the Qwen2 72B Large Language Model outperforms the newly released Llama3 70B model and the Mixture of Experts Mixtral 8x22B model. We can see the benchmark scores in the picture below. The Qwen2 model outperforms both Llama3 and Mixtral on many benchmarks like MMLU, MMLU-Pro, TheoremQA, HumanEval, GSM8K, and many more.
Coming to the smaller model, the Qwen2 7B Instruct model, it also outperforms newly released SOTA (State-Of-The-Art) models like the Llama3 8B model and the GLM4 9B model. Despite Qwen2 being the smallest model of the three, it outperforms both of them, and the results for all the benchmarks can be seen in the picture below.
Qwen2 in Action
We will be working with Google Colab to try out the Qwen2 model.
Step 1: Download Libraries
To get started, we need to download a few helper libraries. For this, we work with the code below:
!pip install -U -q transformers accelerate
- transformers: This is a popular Python package from HuggingFace, with which we can download deep learning models and work with them.
- accelerate: This is also a package developed by HuggingFace. It helps increase the inference speed of Large Language Models when they are running on a GPU.
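Before loading the model, it is worth checking that the Colab runtime actually has a GPU attached (torch is installed alongside transformers in Colab; on a CPU-only runtime everything still works, just slower):

```python
import torch

# Choose the GPU if one is visible, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Running on: {device}")
```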
Step 2: Download the Qwen Model
Now we will write the code to download the Qwen model and test it. The code for this will be:
from transformers import pipeline

device = "cuda"
pipe = pipeline("text-generation",
    model="Qwen/Qwen2-1.5B-Instruct",
    device=device,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
)
- We start by importing the pipeline function from the transformers library.
- Then we set the device to which the model should be mapped. Here, we set it to cuda, which means the model will be sent to the GPU if available.
- model="Qwen/Qwen2-1.5B-Instruct": This specifies the pretrained model to work with.
- device=device: This tells the pipeline which device to run the model on.
- max_new_tokens=512: Here, we give the maximum number of new tokens to be generated.
- do_sample=True: This enables sampling during generation for increased diversity in the output.
- temperature=0.7: This controls the randomness of the generated text. Higher values lead to more creative and unpredictable outputs.
- top_p=0.95: This sets the probability mass to be considered for the next token during generation.
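To build intuition for the temperature and top_p knobs, here is a small standalone sketch (plain Python, not the transformers sampling internals): temperature rescales the logits before the softmax, and top-p (nucleus) filtering keeps only the most probable tokens whose cumulative probability reaches the threshold.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature < 1 sharpens the distribution; > 1 flattens it
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p=0.95):
    # Keep the most probable tokens until their cumulative mass reaches top_p
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    return kept

probs = softmax([2.0, 1.0, 0.5, -1.0], temperature=0.7)
print(top_p_filter(probs, 0.95))  # → [0, 1, 2]: the least likely token is dropped
```

The model then samples the next token only from the surviving candidates, which trims off low-probability, often nonsensical continuations.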
Step 3: Giving a List of Messages to the Model
Now, let us try giving the model a list of messages as input and see the output that it generates for the given list of messages.
messages = [
    {"role": "system",
     "content": "You are a funny assistant. You must respond to user questions in a funny way"},
    {"role": "user", "content": "What is life?"},
]
response = pipe(messages)
print(response[0]['generated_text'][-1]['content'])
- Here, the first message is a system message that instructs the assistant to be funny.
- The second message is a user message that asks "What is life?".
- We put both of these messages as dictionaries in a list.
- Then we give this list of messages to the pipeline object, that is, to our model.
- The model then processes these messages and generates a response.
- Finally, we extract the content of the last generated message from the response.
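For chat-style input, the pipeline returns the whole conversation with the model's reply appended as the last message. The indexing on the last line can be seen with a mock of that return structure (the assistant content below is a placeholder; the real text comes from the model):

```python
# Mock of the structure the text-generation pipeline returns for chat input
response = [{
    "generated_text": [
        {"role": "system", "content": "You are a funny assistant. ..."},
        {"role": "user", "content": "What is life?"},
        {"role": "assistant", "content": "<model's funny answer here>"},
    ]
}]

# The last element of generated_text is the newly generated assistant turn
print(response[0]["generated_text"][-1]["content"])
```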
Running this code produced the following output:
We see that the model indeed tried to generate a funny answer.
Step 4: Testing the Model with Mathematics Questions
Now let us test the model with a few mathematics questions. The code for this will be:
messages = [
    {"role": "user",
     "content": "If a car travels at a constant speed of 60 miles per hour, how far will it travel in 45 minutes?"},
    {"role": "assistant",
     "content": "To find the distance, use the formula: distance = speed × time. Here, speed = 60 miles per hour and time = 45 minutes = 45/60 hours. So, distance = 60 × (45/60) = 45 miles."},
    {"role": "user",
     "content": "How far will it travel in 2.5 hours? Explain step by step"},
]
response = pipe(messages)
print(response[0]['generated_text'][-1]['content'])
- Here again, we are creating a list of messages.
- The first message is a user message that asks how far a car will travel in 45 minutes at a constant speed of 60 miles per hour.
- The second message is an assistant message that provides the solution to the user's question using the formula distance = speed × time.
- The third message is again a user message asking the assistant another question.
- Then we give this list of messages to the pipeline.
- The model will then process these messages and generate a response.
The output generated by running the code can be seen below:
We can see that the Qwen2 1.5B model started thinking step by step to answer the user's question. It first started by defining the formula to calculate distance. Following that, it wrote down the information it had about the speed and time. Then it finally put these pieces together to arrive at the final answer. Despite being just a 1.5 Billion Parameter model, the model performs remarkably well.
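We can verify the model's arithmetic ourselves with two lines of Python (distance = speed × time, with time in hours):

```python
speed = 60                 # miles per hour
print(speed * (45 / 60))   # first turn: 45 minutes -> 45.0 miles
print(speed * 2.5)         # follow-up: 2.5 hours -> 150.0 miles
```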
Testing with More Examples
Let us test the model with a few more examples:
messages = [
    {"role": "user",
     "content": "A clock shows 12:00 p.m. now. How many degrees will the minute hand move in 15 minutes?"},
    {"role": "assistant",
     "content": "The minute hand moves 360 degrees in one hour (60 minutes). Therefore, in 15 minutes, it will move (15/60) * 360 degrees = 90 degrees."},
    {"role": "user",
     "content": "How many degrees does the hour hand move in 15 minutes?"},
]
response = pipe(messages)
print(response[0]['generated_text'][-1]['content'])
messages = [
    {"role": "user",
     "content": "Convert 100 degrees Fahrenheit to Celsius."},
    {"role": "assistant",
     "content": "To convert Fahrenheit to Celsius, use the formula: C = (F - 32) × 5/9. So, for 100 degrees Fahrenheit, C = (100 - 32) × 5/9 = 37.78 degrees Celsius."},
    {"role": "user",
     "content": "What is 0 degrees Celsius in Fahrenheit?"},
]
response = pipe(messages)
print(response[0]['generated_text'][-1]['content'])
messages = [
    {"role": "user",
     "content": "What gets wetter as it dries?"},
    {"role": "assistant",
     "content": "A towel gets wetter as it dries because it absorbs the water from the body, becoming wetter itself."},
    {"role": "user",
     "content": "What has keys but can't open locks?"},
]
response = pipe(messages)
print(response[0]['generated_text'][-1]['content'])
Here we have additionally tested the model with three more examples. The first two examples are again on mathematics. We see that Qwen2 1.5B was able to understand the questions well and generate satisfying answers.
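The expected answers for the two mathematics examples are easy to check by hand, which makes them good test prompts:

```python
# Clock: the minute hand covers 360 degrees in 60 minutes,
# the hour hand covers 360 degrees in 12 hours (720 minutes)
minute_deg = (15 / 60) * 360   # 90.0, as stated in the assistant turn
hour_deg = 15 * (360 / 720)    # 7.5, the expected follow-up answer

# Temperature conversions
c = (100 - 32) * 5 / 9         # 100 F -> ~37.78 C
f = 0 * 9 / 5 + 32             # 0 C -> 32.0 F
print(minute_deg, hour_deg, round(c, 2), f)
```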
But in the riddle example, it failed. The answer to the question is a piano: a piano has keys but cannot open locks. The model failed to answer this and came up with a different answer; it answered "a keychain" and even gave a supporting statement for it. We cannot say it failed entirely, because technically a keychain itself does not open locks, but the keys on the keychain do.
Overall, we see that despite being a 1.5 Billion Parameter model, Qwen2 1.5B answered the mathematical questions correctly and was able to provide good reasoning around the answers it generated. This suggests that the larger models like Qwen2 7B and 72B can perform extremely well on different tasks.
Conclusion
Qwen2, a new series of open-source models from Alibaba Cloud, represents a great advancement in the field of large language models (LLMs). Building on the success of its predecessor, Qwen2 offers a range of models from 0.5B to 72B parameters, excelling in performance across various benchmarks. The models are designed to be versatile and commercially accessible, supporting multiple languages and featuring improved capabilities in code, mathematics, and more. Qwen2's impressive performance and open accessibility position it as a formidable competitor to closed-source alternatives, fostering innovation and application development in AI.
Key Takeaways
- Qwen2 continues the trend of high-quality open-source LLMs, providing strong alternatives to closed-source models.
- The Qwen2 series includes models from 0.5 billion to 72 billion parameters, catering to diverse computational needs and use cases.
- Qwen2 models are pretrained on data covering over 27 languages, enhancing their applicability in global contexts.
- Licenses that allow commercial use promote widespread adoption of and innovation with Qwen2 models.
- Developers and researchers can easily integrate and utilize the models via popular tools like HuggingFace's transformers library, making them accessible.
Frequently Asked Questions
Q. What is Qwen?
A. Qwen is a family of large language models created by Alibaba Cloud. They release open-source models in various sizes that are competitive with paid models.
Q. What is Qwen2?
A. Qwen2 is the latest generation of Qwen models with improved performance and more features. It comes in different sizes, ranging from 0.5 billion to 72 billion parameters.
Q. How can I generate text with Qwen2?
A. You can use the pipeline function from the Transformers library to generate text with Qwen2. The code example in the article shows how to do this.
Q. How does Qwen2 perform on benchmarks?
A. Qwen2 outperforms other leading models on many benchmarks, including language understanding, code generation, and mathematical reasoning.
Q. Can Qwen2 answer math questions?
A. Yes, Qwen2, especially the larger models, can answer math questions and provide explanations for the answers.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author's discretion.