Google’s teaser at I/O 2024 gave us an idea of where AI assistants are headed. Project Astra is a multimodal feature that combines the smarts of Gemini with the image recognition abilities of Google Lens, plus powerful natural language responses. But while the promotional video is slick, after trying it in person it’s clear that something like Astra has a long way to go before it lands on your phone. Here are a few takeaways from our first experience with Google’s next-generation AI.
Sam’s take:
Currently, most people interact with digital assistants using their voice, so Astra’s multimodality (i.e., using vision and sound in addition to text and speech) as a way of communicating with an AI is relatively novel. In theory, this should let computer-based entities work and behave more like a real assistant or agent (one of Google’s big buzzwords for the show) rather than something more robotic that simply responds to spoken commands.
In our demo, we had the option of asking Astra to tell a story based on some objects we placed in front of the camera, after which it told us a lovely tale about a dinosaur and its trusty baguette trying to escape an ominous red light. It was fun, the story was cute, and the AI worked about as well as you’d expect. But at the same time, it was far from the omniscient assistant we saw in Google’s teaser. Aside from entertaining a child with an original bedtime story, Astra didn’t feel like it was doing as much with the information as you might want.
Then my colleague Karissa drew a bucolic scene on a touchscreen, and Astra correctly identified the flower and the sun she was drawing. But the most compelling demo came when we went back for a second session with Astra running on a Pixel 8 Pro. This let us point the phone’s camera at a collection of objects while Astra tracked and remembered each one’s location. It was even smart enough to recall where I had left my clothes and sunglasses, even though those objects weren’t part of the initial demo.
In some ways, our experience highlighted the potential highs and lows of AI. Just the ability for a digital assistant to tell you where you might have left your keys, or how many apples are in your fruit bowl before you head to the grocery store, could save you real time. But after speaking with some of the researchers behind Astra, it’s clear there are still plenty of hurdles to overcome.
Unlike many of Google’s new AI features, Astra (which Google describes as a “research preview”) still relies on the cloud rather than running on-device. And while it does support some level of object persistence, those “memories” currently last only for a single session, which spans just a few minutes. Even if Astra could remember things for longer, there are factors like retention and latency to consider: every object Astra remembers risks slowing the AI down, resulting in a laggier experience. So while it’s clear that Astra has a lot of potential, my excitement is tempered by the knowledge that it will be a while before we get more full-featured functionality.
Karissa’s take:
Of all the advances in generative AI, multimodal AI intrigues me the most. As powerful as the latest models are, I find it hard to get excited about iterative updates to text-based chatbots. But the idea of an AI that can recognize and answer questions about your surroundings in real time feels like something out of a science fiction movie. It also gives a much clearer picture of how the latest wave of AI development will find its way into new devices like smart glasses.
Google has offered a hint of this with Project Astra, which may one day have a glasses component, but for now it’s mostly experimental. (The glasses shown in the video during the I/O keynote were apparently a “research prototype.”) In person, though, Project Astra still didn’t feel like something out of a science fiction movie.
Yes, it was able to accurately recognize objects placed around the room and answer nuanced questions about them, such as “Which of these toys should a 2-year-old play with?” It could recognize what was in my doodle and make up stories about the different toys we showed it.
But most of Astra’s capabilities seemed on par with what Meta has already made available with its smart glasses. Meta’s multimodal AI can also recognize your surroundings and do a bit of creative writing on your behalf. And while Meta also considers those features experimental, they are at least widely available.
What sets Astra apart in Google’s approach is its built-in “memory.” After scanning a bunch of objects, it could still “remember” where specific items were placed. For now, Astra’s memory appears to be limited to a relatively short window of time, but members of the research team told us it could theoretically be expanded. That would obviously open up more possibilities for the technology and make Astra seem more like a true assistant. I don’t need to know where I put my glasses 30 seconds ago, but if Astra could remember where I left them last night, that would actually feel like science fiction come to life.
But, as with so much of generative AI, the most exciting possibilities are the ones that haven’t quite materialized yet. Astra may get there eventually, but right now it feels like Google still has a long way to go.