If AI is going to take over the world, why can’t it solve the Spelling Bee?


My challenge to our AI overlords was simple: help me crack the New York Times Spelling Bee.

I had spent a good part of Saturday evening turning the letters G, Y, A, L, P, O and N into as many words as possible. But after three hours, 141 points and 37 words (including “nonapology,” “lagoon” and “analogy”), I hit a wall. A few more words were all I needed to catapult myself into the “genius” echelon of the Spelling Bee, reserved for those who unscramble 70 percent of all the possible words using the given letters, the point at which the puzzle considers itself effectively solved. (Those who find 100 percent of the words earn the rank of “queen bee,” but I have my limits.)

My human mind was clearly struggling, but the task seemed like child’s play to the AI, so I fired up ChatGPT, told it I was trying to win the Spelling Bee, gave it my letters, and laid out the rules. “Give me a list of every word that can be made with them,” I said. “Repetition of letters is allowed, but every word must have an ‘N’.”

Here’s what ChatGPT spat back when I hit enter:

Here is a list of words formed with the letters “GYALPON” that contain the letter “N”:
GLNPAYO
YPNL
PGNOYL
ONAPYLG
LGNAPYO
ANLY
NAYGOPL
GYPON
AOGPLNY
PNAOGLY
Due to the large number of possible permutations, this list is just an example.

Not a great start.

“The words should be in the dictionary,” I wrote back. “Of course,” came the reply. “I will provide a list of valid English words based on the given letters.” ENGLY, GALLON, LANG.

Obviously, that didn’t work, so I turned to Microsoft’s Copilot (YANG, PLAY, PLANNING, MUSCLES), Google’s Gemini (GAPON, GON, GIAN) and Anthropic’s Claude (MANGO, CONTINUED, LAWN, LAY). Meta AI helpfully assured me that it would include only words recognized by dictionaries, then offered up NALYP and NAGY, while Perplexity, a chatbot with ambitions of killing Google Search, simply wrote GAL hundreds of times before abruptly freezing.


Perplexity, a chatbot with ambitions to kill Google Search, fell apart when asked to create words from a set of letters. (Screenshot by Pranav Dixit / Engadget)

Artificial intelligence can now generate images, video and audio as fast as you can type in a description of what you want. It can write poetry, essays and term papers. It can also be a pale imitation of your lover, your therapist and your personal assistant. And plenty of people think it’s poised to put humans out of work and change the world in ways we can scarcely imagine. So why is it so bad at solving a simple word puzzle?

The answer lies in how large language models, the core technology powering the modern AI frenzy, work. Computer programming is traditionally logical and rule-based: you type out commands that a computer follows according to a set of instructions, and it produces valid output. But machine learning, of which generative AI is a subset, is different.
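To see the contrast, here is a minimal sketch of how a traditional, rule-based program could tackle the Spelling Bee itself. It assumes nothing more than a plain-text word list (the path below is an assumption; any one-word-per-line dictionary will do):

```python
# Rule-based Spelling Bee solver: deterministic filters, no statistics.
# The word-list path is an assumption; substitute any plain word list.
ALLOWED = set("gyalpon")   # the seven puzzle letters
REQUIRED = "n"             # every answer must contain this letter
MIN_LEN = 4                # Spelling Bee's minimum word length

with open("/usr/share/dict/words") as f:
    words = (line.strip().lower() for line in f)
    solutions = sorted(
        w for w in words
        if len(w) >= MIN_LEN      # long enough to count
        and REQUIRED in w         # uses the required letter
        and set(w) <= ALLOWED     # repeats allowed, but only these letters
    )

print(len(solutions), "candidates, for example:", solutions[:10])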

“It’s purely statistical,” said Noah Giansiracusa, a professor of mathematics and data science at Bentley University. “It’s really extracting patterns from the data and then pushing out new data that matches those patterns.”

OpenAI declined to comment on the record, but a company spokesperson told me that this kind of “feedback” helps OpenAI improve the model’s comprehension of problems and its responses to them. Microsoft and Meta declined to comment. Google, Anthropic and Perplexity did not respond by the time of publication.

At the heart of large language models are “transformers,” a technical breakthrough made by Google researchers in 2017. Once you enter a prompt, a large language model breaks your words, or fractions of words, into mathematical units called “tokens.” Transformers are able to analyze each token in the context of the larger dataset the model was trained on to see how they relate to one another. Once a transformer has captured those relationships, it can respond to your prompt by predicting the next likely token in the sequence. The Financial Times has a great animated explainer that breaks all of this down, if you’re curious.
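To make the token idea concrete, here is a small illustration using OpenAI’s open-source tiktoken library (an assumption for demonstration purposes; none of the chatbots above necessarily uses this exact encoding). The point is that a word reaches the model as a few multi-character chunks, not as seven individual letters:

```python
# Illustration: how a GPT-style tokenizer splits words into tokens.
# Requires `pip install tiktoken`; splits vary between tokenizers.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for word in ["nonapology", "lagoon", "GYALPON"]:
    token_ids = enc.encode(word)
    pieces = [enc.decode_single_token_bytes(t).decode("utf-8", "replace")
              for t in token_ids]
    # The model sees these chunks, never the individual letters,
    # which is one reason letter-level puzzles are such a poor fit.
    print(f"{word!r} -> {pieces}")
```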

I thought I was giving the chatbots precise instructions for generating my Spelling Bee words, but all they were doing was converting my words into tokens and using transformers to spit back plausible responses. “It’s not the same as computer programming or typing a command into a DOS prompt,” Giansiracusa said. “Your words got translated into numbers, and those were processed statistically.” A query based purely on logic, it seems, is the worst possible application of AI’s skills, like trying to turn a screw with a resource-intensive hammer.

The success of an AI model also depends on the data it’s trained on, which is why AI companies are feverishly striking deals with news publishers right now: the fresher the training data, the better the responses. Generative AI is bad at suggesting chess moves, for example, but it’s at least marginally better at that task than at solving word puzzles. Giansiracusa points out that the glut of chess games available on the internet almost certainly feeds into the training data of existing AI models. “I suspect there just aren’t enough annotated Spelling Bee games online for AI to train on the way there are chess games,” he said.

“If your chatbot seems more confused by a word game than a cat with a Rubik’s cube, it’s because it wasn’t specifically trained to play complex word games,” said Sandi Bensen, an artificial intelligence researcher at Neudesic, an IBM company. “Word games have specific rules and constraints that a model will struggle to follow unless specifically instructed to during training, fine-tuning or prompting.”


None of this has stopped the world’s leading AI companies from marketing the technology as a panacea, often with grossly exaggerated claims about its capabilities. In April, both OpenAI and Meta boasted that their new AI models would be capable of “reasoning” and “planning.” In an interview with the Financial Times, OpenAI COO Brad Lightcap said the next generation of GPT, the AI model powering ChatGPT, would make progress on solving “hard problems” like reasoning. Meta’s vice president of AI research, Joelle Pineau, told the publication that the company was “working hard on figuring out how to get these models to not just talk, but to actually think, plan…have memory.”

My repeated attempts to get GPT-4o and Llama 3 to crack the Spelling Bee failed spectacularly. When I told ChatGPT that GALLON, LANG and ANGLY weren’t in the dictionary, the chatbot agreed with me and suggested GALVANOPY instead. When I mistyped the word “sure” as “sur” in response to Meta AI’s offer to come up with more words, the chatbot told me that “sur” was indeed another word that can be formed with the letters G, Y, A, L, P, O and N.

It’s clear that we’re still a long way from Artificial General Intelligence, the nebulous concept describing the moment when machines can perform most tasks as well as or better than humans. Some experts, like Meta’s chief AI scientist Yann LeCun, have been outspoken about the limitations of large language models, arguing that they will never reach human-level intelligence because they don’t really use logic. Speaking at an event in London last year, LeCun said the current generation of AI models “do not understand how the world works. They cannot plan. They’re not capable of real reasoning,” adding: “We don’t have fully autonomous self-driving cars, even though a 17-year-old can learn to drive in about 20 hours of training.”

Giansiracusa, however, strikes a more cautious tone. “We don’t really know how humans think, do we? We don’t really know what intelligence is. For all I know, my brain is just a big statistical calculator, kind of like a more efficient version of a large language model.”

Perhaps the key to living with generative AI without succumbing to either the hype or the anxiety is simply understanding its inherent limitations. “These tools are not really designed for a lot of the things people are using them for,” said Chirag Shah, a professor of artificial intelligence and machine learning at the University of Washington, who co-wrote a high-profile research paper in 2022 critiquing the use of large language models in search engines. Tech companies, Shah thinks, could do a better job of being transparent about what AI can and can’t do before foisting it on us, but that ship may have sailed: over the past few months, the world’s largest technology companies, including Microsoft, Meta, Samsung, Apple and Google, have announced plans to weave AI deeply into their products, services and operating systems.

“Bots fail at this because they were never meant to do it,” Shah said of my word puzzle. Whether they can handle all the other tasks tech companies are throwing at them remains to be seen.

How else have AI chatbots failed you? Send me an email at pranav.dixit@engadget.com and let me know!


