The AI-based tool can create “cutouts,” or segments, of different parts of an image. This comes in handy when editing photos or when analyzing imagery for biological or security purposes.
The announcement comes as the social media giant increasingly diverts its attention from building a virtual reality-based Metaverse to embedding AI features across its platforms, including Instagram, Facebook, Messenger and WhatsApp.
Editing photos, analyzing surveillance footage and understanding the parts of a cell all have one thing in common: you need to be able to identify and separate different objects within an image. Traditionally, researchers have had to start from scratch each time they wanted to analyze a new part of an image.
Meta aims to change this laborious process by becoming the one-stop shop for researchers and web developers working on such problems. On Wednesday, the company released an AI model called “Segment Anything Model,” or “SAM,” through which users can create “cutouts,” or segments, of any item in an image by clicking on a point or drawing a box around the object. By making it faster and more efficient to carve up different parts of an image, the tool can be used for research purposes, for creative editing or even to make sense of objects while wearing a VR headset.
The tech company launched the browser-based tool to the public and also open-sourced its computer vision model, which it claims is trained on “the largest segmentation dataset”: 1.1 billion segmentation masks (“masks” are the different parts of an image) across 11 million images licensed from a large photo company. Meta did not disclose which company it licensed the images from. Meta AI, the social media giant’s artificial intelligence research arm, worked with 130 human annotators based in Kenya to build the dataset through a combination of manual and automatic labeling.
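To make the terminology concrete, a segmentation “mask” is just a per-pixel map marking which pixels belong to one object, and a “cutout” keeps those pixels while discarding the rest. The toy numpy sketch below is a generic illustration of that idea, not Meta’s code or the SAM model itself:

```python
import numpy as np

# A tiny 4x4 grayscale "image": a bright 2x2 object on a dark background.
image = np.array([
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10,  10,  10],
    [10, 10,  10,  10],
])

# A segmentation "mask" is a per-pixel boolean map marking one object.
# Here we fake one with a brightness threshold; SAM instead predicts
# masks from a user's click or box prompt.
mask = image > 100

# A "cutout" keeps only the masked pixels and zeroes out the background.
cutout = np.where(mask, image, 0)

print(mask.sum())    # number of pixels assigned to the object
print(cutout[0, 2])  # an object pixel, preserved
print(cutout[2, 0])  # a background pixel, removed
```

A dataset like Meta’s pairs millions of images with roughly a billion such boolean maps, one per labeled object part.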
Object recognition and computer vision technologies have been around for years and are already integrated into various devices such as surveillance cameras and drones. Amazon stores, for example, use object recognition to detect the items you put into your basket, and autonomous vehicles use it to perceive their surroundings. Newer startups like Runway and incumbents like Adobe have commercialized the ability to use AI to detect and select different objects within an image for their creative users. As snazzy generative AI chatbots have emerged, the goal for AI researchers at Meta was to merge the advances in AI foundation models with the dormant realm of computer vision.
“I wouldn’t say that this is a new area of technology. Object segmentation already exists so I wouldn’t say this is a new capability. Fundamentally, I think their approach of using foundational models is new and the size of the dataset they’re training on could be novel,” says Paul Powers, CEO and founder of Physna, a search engine for 3D objects.
But Meta hopes that by releasing these tools broadly, it will encourage users to build on top of its generalized model for more specific use cases in fields like biology and agriculture.
The announcement comes as Meta reportedly plans to use generative AI for advertisements across Instagram and Facebook. Not wanting to miss out on the buzz around AI, CEO Mark Zuckerberg announced in late February that he is creating a new product team solely focused on building generative AI tools such as artificial personas, Instagram filters and chat-based features in WhatsApp and Instagram. Zuckerberg reportedly spends most of his time with the new AI team.
The SAM tool is built for those who don’t have the AI infrastructure or the data capacity to create their own models to “segment,” or identify, different components of an image, say Meta AI researchers Alexander Kirillov and Nikhila Ravi. “This is happening in real time in the browser and that makes this model much more accessible to so many more people because they don’t need to be able to run a lot of stuff on GPU…We can enable a lot more edge use cases that some other methods might not allow,” Ravi says.
But a computer vision model trained on a database of two-dimensional images has its limitations, says Powers. For example, for the tool to detect and select a remote control held upside down, it would need to be trained on different orientations of the same object. Models trained on 2D images also struggle with objects that are only partially visible, he says. This means the tool might not accurately identify non-standardized objects seen through an AR/VR headset, or might fail to detect partially covered objects in public spaces if used by an autonomous vehicle manufacturer.
For the company, which rebranded itself from Facebook to Meta in late 2021 to cement its commitment to the Metaverse, the most obvious use for this object detection tool is in its virtual reality spaces, such as its online VR game Horizon Worlds. Kirillov and Ravi say their tool can be used for “gaze-based” detection of objects through virtual reality and augmented reality headsets.
The model can detect unfamiliar objects and works across domains, from underwater and microscopic images to aerial and agricultural ones. Kirillov says he was inspired to create a generalized image segmentation model while talking to PhD researchers. “I was giving a presentation about segmentation to some natural scientists in Berkeley and people were like ‘Okay sure, this is all cool, but I need to like count and identify trees in the photos I’ve collected for my research about fires in California,’ and so this model can do that for them,” Kirillov tells Forbes.