GPTBot web crawler will collect publicly available data from the world wide web to improve future ChatGPT models
Artificial intelligence firm OpenAI has introduced “GPTBot” — a new web crawling tool that could be used to improve future ChatGPT models.
Web crawlers are bots that index the content of various websites across the internet. They are often used by search engines to display relevant content in search results.
GPTBot is designed to collect publicly available data from the internet, excluding sources that sit behind paywalls, gather personally identifiable information, or contain text that violates the firm’s policies.
In addition, website owners can block the crawler from accessing their sites by adding a “disallow” rule for it to the standard robots.txt file on their server.
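Per OpenAI’s published guidance, the crawler identifies itself with the user-agent token “GPTBot,” so the opt-out uses the standard Robots Exclusion Protocol. A minimal robots.txt sketch (the directory paths are illustrative, not from the source):

```
# Block GPTBot from the entire site
User-agent: GPTBot
Disallow: /

# Alternatively, permit only selected sections (example paths)
# User-agent: GPTBot
# Allow: /public-articles/
# Disallow: /members-area/
```

Because robots.txt compliance is voluntary on the crawler’s part, this relies on OpenAI honoring the directive, which the company has said GPTBot does.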
The scraped data will be used to train future generative AI models, such as the anticipated GPT-5, or to enhance the capabilities of OpenAI’s existing tools.
At the same time, the launch of new versions of the popular ChatGPT tool faces legal obstacles. The US Federal Trade Commission (FTC) is currently investigating OpenAI’s personal-data handling practices, the likelihood that the tool provides inaccurate answers to requests, and the risks of harm to consumers, including reputational damage.
Earlier, in June, a class-action lawsuit was filed against OpenAI in California, alleging that the firm scraped private user information from the internet to train its artificial intelligence (AI) chatbot, ChatGPT.
Other regulators across the globe, such as the Dutch Data Protection Authority, have also requested clarification on how the company processes consumers’ personal data when training its underlying system.
Facing the new challenges presented by generative AI, legislators worldwide are being urged to create a regulatory framework for overseeing the operation of ChatGPT-like chatbots and their handling of sensitive data.