xAI, an OpenAI competitor founded by Elon Musk, has introduced the first version of Grok that can process visual data. Grok-1.5V is the company’s first-generation multimodal AI model, which can process not only text but also “documents, diagrams, charts, screenshots and photos.” In its announcement, xAI gave several examples of how the model’s capabilities can be used in the real world. You can, for example, show it a picture of a flowchart and ask Grok to translate it into Python code, have it write a story based on a drawing, or even get it to explain a meme you don’t understand. Hey, not everyone can keep up with everything the internet is spitting out.
The new version comes a few weeks after the company introduced Grok-1.5. That model was designed to be better at coding and math than its predecessor, and to process longer contexts so it can examine data from more sources to better understand certain queries. xAI said its early testers and existing users will be able to use Grok-1.5V soon, though it did not give an exact timeline for its release.
Along with introducing Grok-1.5V, the company also released a benchmark data set it calls RealWorldQA. You can use any of RealWorldQA’s 700 images to evaluate AI models: Each item comes with a question and an answer that you can easily verify, but that can be difficult for multimodal models like Grok. xAI claimed that its technology received the highest score when the company tested it with RealWorldQA against competitors such as OpenAI’s GPT-4V and Google Gemini Pro 1.5.
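
For readers curious what scoring a model on a question-and-answer benchmark of this kind might look like, here is a minimal Python sketch. It is not xAI's evaluation code, and it does not reflect the actual file format of RealWorldQA: the `QAItem` fields, the `normalize` helper and the `predict` callback are all hypothetical names chosen for illustration, and exact-match scoring is an assumption.

```python
from typing import Callable, Iterable, NamedTuple


class QAItem(NamedTuple):
    """One benchmark item (hypothetical layout, not the real data set schema)."""
    image_path: str
    question: str
    answer: str


def normalize(text: str) -> str:
    """Lowercase, trim, drop a trailing period, and collapse whitespace."""
    return " ".join(text.strip().lower().rstrip(".").split())


def accuracy(items: Iterable[QAItem], predict: Callable[[QAItem], str]) -> float:
    """Return the fraction of items whose predicted answer matches exactly.

    `predict` stands in for a call to whatever multimodal model is under
    test; it receives the item and returns the model's answer as text.
    """
    items = list(items)
    if not items:
        return 0.0
    correct = sum(
        normalize(predict(item)) == normalize(item.answer) for item in items
    )
    return correct / len(items)
```

To use it, you would pass a list of items and a function that sends each image and question to the model and returns its reply. Exact-match scoring only works because the answers are meant to be short and easy to verify; free-form answers would need a more forgiving comparison.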