Apple isn’t one of the top players in the AI game today, but the company’s new open-source AI model for image editing shows what it could contribute to the space. The model, called MLLM-Guided Image Editing (MGIE), uses multimodal large language models (MLLMs) to interpret text-based commands when manipulating images. In other words, the tool can edit photos based on text the user enters. While it’s not the first tool to do this, “human instructions are sometimes too brief for current methods to capture and follow,” the project’s paper (PDF) reads.
The company developed MGIE in collaboration with researchers at the University of California, Santa Barbara. MLLMs have the power to turn simple or vague text instructions into more detailed and clear instructions that the photo editor can follow. For example, if a user wants to edit a photo of a pepperoni pizza to “make it healthier,” MLLMs can interpret that as “add veggies” and edit the photo that way.
All through text prompts, MGIE can make basic changes to images: cropping, resizing, and rotating photos, and improving their brightness, contrast, and color balance. It can also edit specific areas of a photo, for example changing the subject’s hair, eyes, and clothing, or removing elements in the background.
As VentureBeat notes, Apple released the model through GitHub, but those who are interested can also try a demo currently hosted on Hugging Face Spaces. Apple has yet to say whether it plans to use what it learns from this project in a tool or feature that it could incorporate into any of its products.