NVIDIA has debuted a new experimental generative AI model it describes as a “Swiss Army knife for audio.” The model, called Foundational Generative Audio Transformer Opus 1, or Fugatto, can take commands from text queries and use them to create audio or to modify existing music, voice and sound files. It was developed by artificial intelligence researchers from around the world, and NVIDIA says this has “enhanced the multi-accent and multilingual capabilities” of the model.
“We wanted to create a model that understands and produces sound like humans,” said Rafael Valle, one of the researchers behind the project and manager of applied audio research at NVIDIA. In its announcement, the company listed some real-world scenarios in which Fugatto could be used. Music producers, it suggested, could use the technology to quickly prototype a song idea, then easily edit it to try out different styles, sounds and instruments.
People could use it to create materials for audio language-learning tools of their choice. Video game developers could use it to create variations of pre-recorded assets to adapt to changes in gameplay driven by players’ choices and actions. In addition, the researchers found that with some fine-tuning, the model could perform tasks that were not part of its prior training. It can combine instructions it was taught separately, such as an angry-sounding speech in a particular accent, or birds singing in a thunderstorm. The model can also generate sounds that change over time, such as the pounding of a thunderstorm as it moves over land.
NVIDIA hasn’t said whether it will provide public access to Fugatto, but the model isn’t the first generative AI technology that can produce sounds from text queries. Meta previously released an open-source AI suite that can generate sounds from text descriptions. Google has its own text-to-music AI, MusicLM, which people can access through the company’s AI Test Kitchen website.