ChatGPT will now be able to see, hear, and speak.
On Sept. 25, OpenAI announced that it had begun rolling out new voice and image capabilities in ChatGPT. Customers will now be able to make their requests by voice or show ChatGPT the image they’re talking about.
The company plans to introduce voice and images to ChatGPT Plus and Enterprise users over the next two weeks. Voice capabilities will be available on iOS and Android, and images will be supported on all platforms.
Users can start a conversation by voice and receive a spoken answer rather than text. To hear the AI bot’s answer, they choose their preferred voice from five available options.
The capability is powered by a new text-to-speech model that can generate human-like audio from text and a few seconds of sample speech. The speech samples were provided by professional voice actors. OpenAI also uses Whisper, its open-source speech recognition system, to transcribe spoken words into text.
In addition, customers will be able to show ChatGPT one or more images to illustrate their request. The images may range from a picture of a textbook page to the contents of a fridge or a complex graph with work-related data. To draw attention to a specific part of an image, users can use the drawing tool in the ChatGPT mobile app. Images can either be captured on the spot with a smartphone’s camera or uploaded from an existing gallery.
Despite the enthusiasm surrounding the new features, OpenAI has decided to roll them out gradually. This phased approach will enable the company to make improvements based on initial user experience and refine its risk mitigations over time. OpenAI also believes the public and industry players need more time to prepare for more advanced and powerful AI models involving voice and vision.
In particular, voice-generating systems may be used by malicious actors to impersonate other people. Preventing that requires effective cybersecurity and legal tools, which are not currently in place. Vision-based AI models also present their creators with new challenges around responsible use.
For instance, OpenAI has taken technical measures to limit ChatGPT’s ability to analyze and make direct statements about people, since the AI bot is not always accurate and may show bias. Such AI systems should also respect individuals’ privacy.
To mitigate the risks presented by AI technology, OpenAI has recently created a new team led by Ilya Sutskever, the company’s chief scientist and one of its co-founders. The team’s main goal is to find ways to control superintelligent AI systems and to limit the capabilities of this advanced technology in areas where it may not be benevolent toward humans.