Passionategeekz, June 27: Alibaba Cloud's Tongyi Qianwen team has announced the launch of Qwen VLo, a unified multimodal understanding and generation model. Users can try it in Qwen Chat (chat.qwen.ai).
This newly upgraded model can not only "understand" the world, but also produce high-quality re-creations based on that understanding, making the leap from perception to generation.
According to the announcement, Qwen VLo uses a progressive generation method, building up the whole picture step by step from left to right and from top to bottom.
During the generation process, the model will continuously adjust and optimize the predicted content to ensure that the final results are more harmonious and consistent. This generation mechanism not only improves the visual effect, but also brings users a more flexible and controllable creative experience.
According to the official announcement, Qwen VLo is trained with dynamic resolution and supports dynamic-resolution generation. On both the input and the output side, the model handles images of arbitrary resolution and aspect ratio.
This means users are no longer limited to a fixed format and can generate images suited to different scenarios as needed. Posters, illustrations, web banners and social media covers can all be handled with ease.
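To make the idea of arbitrary aspect ratios concrete, here is a minimal sketch of how a client might pick output dimensions for different formats. The announcement does not say how Qwen VLo chooses sizes internally; the function name `fit_resolution`, the pixel budget, the patch size of 16 and the example ratios are all assumptions made for illustration.

```python
import math


def fit_resolution(aspect_w, aspect_h, max_pixels=1024 * 1024, patch=16):
    """Pick a (width, height) near the requested aspect ratio.

    max_pixels and patch are illustrative assumptions, not Qwen VLo settings.
    """
    ratio = aspect_w / aspect_h
    # Solve width * height = max_pixels with width = ratio * height.
    height = math.sqrt(max_pixels / ratio)
    width = height * ratio
    # Round down to the patch grid (at least one patch per side).
    w = max(patch, int(width) // patch * patch)
    h = max(patch, int(height) // patch * patch)
    return w, h


# Hypothetical ratios for the use cases named in the article.
formats = {"poster": (2, 3), "web banner": (4, 1), "social media cover": (16, 9)}
for name, (aw, ah) in formats.items():
    print(name, fit_resolution(aw, ah))
```

Rounding down keeps each side aligned to the assumed patch grid, at the cost of a small deviation from the exact ratio.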
In addition, Qwen VLo introduces a new generation mechanism: the image becomes progressively clearer from top to bottom and from left to right. According to the announcement, this mechanism not only improves generation efficiency, but is also particularly well suited to tasks that involve long passages of text and require fine control. For example, when generating an ad design or a comic storyboard with a lot of text, Qwen VLo builds the image up gradually and refines it as it goes. This progressive method lets users watch the generation process in real time and adjust it as needed to get the best creative result.
Alibaba Cloud cautions that Qwen VLo is still in the preview stage and has many shortcomings. Generated results may contain factual errors or may not fully match the original image, and the development team is still iterating.
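The top-to-bottom, left-to-right order can be pictured with a toy sketch. This is not Qwen VLo's actual algorithm, which the announcement does not describe; `raster_order`, `generate_progressively` and `dummy_predictor` are hypothetical names, and the sketch leaves out the refinement of earlier content that the announcement mentions.

```python
def raster_order(rows, cols):
    """Yield patch coordinates top to bottom, left to right."""
    for r in range(rows):
        for c in range(cols):
            yield r, c


def generate_progressively(rows, cols, predict_patch):
    """Fill a canvas patch by patch, yielding a snapshot after every step."""
    canvas = [[None] * cols for _ in range(rows)]
    for r, c in raster_order(rows, cols):
        # The predictor sees everything generated so far.
        canvas[r][c] = predict_patch(canvas, r, c)
        yield [row[:] for row in canvas]


def dummy_predictor(canvas, r, c):
    """Stand-in for a model: labels each patch with its position."""
    return f"{r}{c}"


for step, snapshot in enumerate(generate_progressively(2, 3, dummy_predictor), 1):
    filled = sum(cell is not None for row in snapshot for cell in row)
    print(f"step {step}: {filled} of 6 patches filled")
```

Because a snapshot is available after every step, a user interface could show the partial picture as it fills in, which is the kind of real-time observation the article describes.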
Qwen VLo comprehensively upgrades the original multimodal understanding and generation capabilities, significantly deepening its understanding of image content and, on that basis, producing more accurate and consistent results.
Here are the core highlights of Qwen VLo:
01 More accurate content understanding and recreation
Previous multimodal models were prone to semantic inconsistencies during generation, such as mistakenly turning a car into another type of object or failing to preserve the key structural features of the original image. With stronger detail capture, Qwen VLo maintains a high degree of semantic consistency throughout generation. For example, when a user uploads a photo of a car and asks to "change the color", Qwen VLo can accurately identify the car model, retain its original structure, and apply the color change naturally, so that the result matches expectations without losing realism.
02 Support for open-ended instruction-based editing and generation
Users can give all kinds of creative instructions in natural language, such as "change this picture to Van Gogh's style", "make this photo look like an old photo from the 19th century" or "add a clear sky to this picture." Qwen VLo responds flexibly to these open-ended instructions and generates results that meet users' expectations. Whether the task is artistic style transfer, scene reconstruction or detail modification, the model handles it with ease. Even traditional visual perception tasks, such as predicting depth maps, segmentation maps, detection maps and edge information, can be completed through editing instructions. Going further, the model can also carry out more complex instructions, such as a single command that modifies objects, edits text and changes the background at the same time.
03 Multilingual instruction support
Qwen VLo supports instructions in multiple languages, including Chinese and English, breaking down language barriers and giving users around the world a unified and convenient interactive experience. No matter which language you use, simply describe your needs and the model can quickly understand them and produce the desired result.