Microsoft drops Florence-2, a unified model to handle a variety of vision tasks

By admin · June 19, 2024 · 5 min read



Today, Microsoft's Azure AI team released a new vision foundation model called Florence-2 on Hugging Face.

Available under a permissive MIT license, the model can handle a variety of vision and vision-language tasks using a unified, prompt-based representation. It comes in two sizes, 232M and 771M parameters, and already excels at tasks such as captioning, object detection, visual grounding and segmentation, performing on par with or better than many large vision models out there.

While the model's real-world performance is yet to be tested, the work is expected to give enterprises a single, unified approach to handling different types of vision applications. This would save them from investing in separate task-specific vision models that fail beyond their primary function without extensive fine-tuning.

What makes Florence-2 unique?

Today, large language models (LLMs) sit at the heart of enterprise operations. A single model can provide summaries, write marketing copy and even handle customer service in many cases. The level of adaptability across domains and tasks has been remarkable. But this success has also left researchers wondering: can vision models, which have been largely task-specific, do the same?




At their core, vision tasks are more complex than text-based natural language processing (NLP). They demand comprehensive perceptual ability. Essentially, to achieve a universal representation of diverse vision tasks, a model must be capable of understanding spatial data across different scales, from broad image-level concepts such as object location down to fine-grained pixel details, as well as semantic granularity ranging from high-level captions to detailed descriptions.

When Microsoft tried solving this, it found two key roadblocks: the scarcity of comprehensively annotated visual datasets and the absence of a unified pretraining framework with a singular network architecture that integrated the ability to understand both spatial hierarchy and semantic granularity.

To address this, the company first used specialized models to generate a visual dataset called FLD-5B. It included a total of 5.4 billion annotations for 126 million images, covering details from high-level descriptions to specific regions and objects. Then, using this data, it trained Florence-2, which uses a sequence-to-sequence architecture (a type of neural network designed for tasks involving sequential data) integrating an image encoder and a multi-modality encoder-decoder. This allows the model to handle various vision tasks without requiring task-specific architectural modifications.
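The key idea behind that unified architecture is that every annotation, including boxes and regions, is serialized into plain text so a single seq2seq decoder can cover all tasks. A minimal sketch of that idea, assuming the quantized location-token scheme described for Florence-2-style region outputs (the function name and exact token format here are illustrative, not the model's API):

```python
def box_to_tokens(box, width, height, bins=1000):
    """Quantize an (x1, y1, x2, y2) pixel bounding box into discrete
    location tokens, so a region can be expressed as ordinary text.
    Coordinates are binned onto a fixed 0..bins-1 grid."""
    x1, y1, x2, y2 = box
    qx1 = min(int(x1 / width * bins), bins - 1)
    qy1 = min(int(y1 / height * bins), bins - 1)
    qx2 = min(int(x2 / width * bins), bins - 1)
    qy2 = min(int(y2 / height * bins), bins - 1)
    return f"<loc_{qx1}><loc_{qy1}><loc_{qx2}><loc_{qy2}>"

# An object-detection target then becomes a plain text string the
# decoder can emit: a class label followed by its region tokens.
target = "car" + box_to_tokens((64, 32, 512, 256), width=640, height=480)
print(target)  # car<loc_100><loc_66><loc_800><loc_533>
```

Because boxes, captions and labels all reduce to token sequences like this, the same decoder and the same loss function can serve detection, grounding and captioning alike.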

"All annotations in the dataset, FLD-5B, are uniformly standardized into textual outputs, facilitating a unified multi-task learning approach with consistent optimization with the same loss function as the objective," the researchers wrote in the paper detailing the model. "The result is a versatile vision foundation model capable of performing a variety of tasks… all within a single model governed by a uniform set of parameters. Task activation is achieved through textual prompts, reflecting the approach used by large language models."
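That prompt-based task activation is visible in how the released checkpoints are driven: the task is chosen purely by a special token prepended to the text input. A small sketch, using task tokens of the kind listed on the model's Hugging Face card (the helper function itself is an illustrative wrapper, not part of the library):

```python
# Florence-2 selects a task via the text prompt alone; no separate
# task-specific head is needed. These task tokens follow the style
# documented for the released checkpoints.
TASK_PROMPTS = {
    "caption": "<CAPTION>",
    "detailed_caption": "<DETAILED_CAPTION>",
    "object_detection": "<OD>",
    "phrase_grounding": "<CAPTION_TO_PHRASE_GROUNDING>",
}

def build_prompt(task, text=None):
    """Return the textual prompt that activates a given task.
    Grounding-style tasks append free-text input after the token."""
    token = TASK_PROMPTS[task]
    return token + text if text else token

print(build_prompt("object_detection"))               # <OD>
print(build_prompt("phrase_grounding", "a red car"))  # <CAPTION_TO_PHRASE_GROUNDING>a red car
```

Swapping tasks thus means swapping a prompt string, the same interaction pattern LLM users already know.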

Performance better than larger models

When prompted with image and text inputs, Florence-2 handles a variety of tasks, including object detection, captioning, visual grounding and visual question answering. More importantly, it delivers quality on par with or better than many larger models.

For instance, in a zero-shot captioning test on the COCO dataset, both the 232M and 771M versions of Florence-2 outperformed DeepMind's 80B-parameter Flamingo visual language model with scores of 133 and 135.6, respectively. They even did better than Microsoft's own visual grounding-specific Kosmos-2 model.

When fine-tuned with public human-annotated data, Florence-2, despite its compact size, was able to compete closely with several larger specialist models across tasks like visual question answering.

"The pre-trained Florence-2 backbone enhances performance on downstream tasks, e.g. COCO object detection and instance segmentation, and ADE20K semantic segmentation, surpassing both supervised and self-supervised models," the researchers noted. "Compared to pre-trained models on ImageNet, ours improves training efficiency by 4X and achieves substantial improvements of 6.9, 5.5, and 5.9 points on COCO and ADE20K datasets."

As of now, both pre-trained and fine-tuned versions of Florence-2 232M and 771M are available on Hugging Face under a permissive MIT license that allows unrestricted distribution and modification for commercial or private use.

It will be interesting to see how developers put it to use and offload the need for separate vision models for different tasks. Small, task-agnostic models can not only save developers the hassle of working with different models but also cut compute costs by a significant margin.
