
OpenAI Enhances Voice and Vision Capabilities

OpenAI’s recent updates significantly enhance ChatGPT’s ability to interpret voice and visual information.


OpenAI has launched a set of new features for the AI models powering its popular chatbot ChatGPT, including the ability to interpret and respond to real-time audio and video inputs for more human-like interaction.

OpenAI’s latest model, GPT-4o, now supports live streaming interaction: users can converse with the AI more naturally, while the model recognizes emotions and breathing patterns and handles complex inputs across text, audio, and visuals simultaneously.
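For developers, the same multimodal handling is exposed through the API. The snippet below is a minimal sketch of sending a mixed text-and-image request to GPT-4o via the Chat Completions API; the image URL is a placeholder, and live audio/video streaming goes through the Realtime API described next rather than this endpoint.

```python
# A minimal sketch of a mixed text-and-image request to GPT-4o through the
# Chat Completions API. Assumptions: the OPENAI_API_KEY environment variable
# is set, and the image URL is a placeholder.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is happening in this picture?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/frames/street.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```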

One of the newest updates is the Realtime API, designed to process real-time data and enable natural conversations similar to ChatGPT’s Advanced Voice Mode. It lets developers build applications that generate up-to-the-moment responses, which is particularly useful for apps requiring immediate processing of live data, such as stock market updates, weather information, or sports scores. The tool is being gradually rolled out to all paid developers.
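A minimal sketch of a Realtime API session is shown below. It assumes the WebSocket endpoint, model name, and event shapes documented at launch (wss://api.openai.com/v1/realtime, the response.create client event, and the response.text.delta / response.done server events); check the current documentation, as these details may have changed.

```python
# A hedged sketch of a Realtime API session over a raw WebSocket.
# Assumptions: endpoint, headers, and event names follow the launch docs
# and may have changed; the prompt is a placeholder.
import asyncio
import json
import os

import websockets  # pip install websockets (v14+ renames extra_headers to additional_headers)

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
HEADERS = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "OpenAI-Beta": "realtime=v1",
}

async def main():
    async with websockets.connect(URL, extra_headers=HEADERS) as ws:
        # Ask the model for an immediate text response to a live-data style prompt.
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {
                "modalities": ["text"],
                "instructions": "Summarize the latest ACME stock movement.",
            },
        }))
        # Stream server events until the response is complete.
        async for raw in ws:
            event = json.loads(raw)
            if event.get("type") == "response.text.delta":
                print(event.get("delta", ""), end="", flush=True)
            elif event.get("type") == "response.done":
                break

asyncio.run(main())
```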

Another update offers fine-tuning tools for developers so that they can improve AI responses generated from images and text inputs. This new feature enables customization for visual search, object detection, and medical image analysis. With as few as 100 images, developers can now improve the model’s ability to handle vision-related tasks, such as identifying UI elements in software or analyzing street-level imagery for mapping purposes.
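Below is a hedged sketch of what that workflow could look like with the official Python SDK: assembling a small JSONL file of chat-formatted examples with image URLs and submitting a fine-tuning job. The file format, example image URL, and the vision-capable model snapshot name are assumptions used to illustrate the flow, not values confirmed by the article.

```python
# A hedged sketch of vision fine-tuning: build a tiny JSONL dataset of
# chat-formatted examples with image URLs, upload it, and start a job.
# Assumptions: file format and model snapshot name follow the launch docs;
# the image URL and labels are placeholders.
import json
from openai import OpenAI

client = OpenAI()

examples = [
    {
        "messages": [
            {"role": "system", "content": "Identify the UI element shown."},
            {"role": "user", "content": [
                {"type": "text", "text": "What is this control?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/screens/button.png"}},
            ]},
            {"role": "assistant", "content": "A primary submit button."},
        ]
    },
    # ...around 100 such examples are reportedly enough to see improvement.
]

with open("vision_train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

training_file = client.files.create(
    file=open("vision_train.jsonl", "rb"), purpose="fine-tune"
)
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-2024-08-06",  # assumed vision-capable snapshot
)
print(job.id, job.status)
```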

Developers can start testing the above-mentioned feature immediately, with a limited free offering available until October 31, 2024. After that, standard pricing applies to both training and inference.

In addition, OpenAI rolled out “model distillation,” which lets smaller AI models learn from the outputs of larger ones, and “prompt caching,” which reuses recently processed prompt prefixes. Both features may significantly reduce development costs and time.
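Prompt caching is applied automatically by the platform when a prompt prefix repeats, so it requires no code changes. Distillation, by contrast, is a workflow; the sketch below shows the general idea of generating answers with a larger “teacher” model and fine-tuning a smaller “student” model on them. OpenAI’s hosted distillation tooling works through stored completions in its dashboard, so this local variant is an assumption about the workflow rather than the exact product flow, and the prompts and model snapshot names are placeholders.

```python
# A conceptual sketch of model distillation: collect a larger "teacher"
# model's answers and fine-tune a smaller "student" model on them.
# Assumptions: prompts and model snapshot names are placeholders, and this
# local workflow stands in for OpenAI's dashboard-based distillation flow.
import json
from openai import OpenAI

client = OpenAI()

prompts = ["Summarize this support ticket: ...", "Classify the sentiment: ..."]
rows = []
for p in prompts:
    teacher = client.chat.completions.create(
        model="gpt-4o",  # larger teacher model
        messages=[{"role": "user", "content": p}],
    )
    rows.append({"messages": [
        {"role": "user", "content": p},
        {"role": "assistant", "content": teacher.choices[0].message.content},
    ]})

with open("distill.jsonl", "w") as f:
    for r in rows:
        f.write(json.dumps(r) + "\n")

# Fine-tune the smaller student model on the teacher's outputs.
student_file = client.files.create(
    file=open("distill.jsonl", "rb"), purpose="fine-tune"
)
job = client.fine_tuning.jobs.create(
    training_file=student_file.id,
    model="gpt-4o-mini-2024-07-18",  # assumed student snapshot
)
print(job.status)
```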

With the new functionality, OpenAI continues to expand its platform’s versatility, supporting dynamic user interactions and new opportunities for innovation.

Constant expansion and updates of its product portfolio have brought OpenAI over 1 million paid users subscribing to its business products, including ChatGPT Enterprise, Team, and Edu. It was also reported that the Vision Fund, a subsidiary of the Japanese telecom giant SoftBank, intends to invest $500 million in OpenAI’s latest funding round, further acknowledging the firm’s potential in the tech segment.

Nina Bobro


https://payspacemagazine.com/

Nina is passionate about financial technologies and environmental issues, reporting on the industry news and the most exciting projects that build their offerings around the intersection of fintech and sustainability.