Bot intelligence record

GPTBot

Review first

GPTBot is an AI training crawler from OpenAI used for AI model training and dataset discovery. It appears in server logs as `GPTBot`.

AI | AI Training | Official Documented | Confidence: High | Verified: Yes | robots.txt: Yes

Operator: OpenAI
Family: OpenAI
Type: AI
Source type: Official
Last checked: 2026-05-20

User-Agent Pattern

OpenAI
GPTBot
Verification note

User-agent strings are identification signals, not proof of identity. Confirm important allow, block, or rate-limit decisions with logs, DNS or IP evidence, request behavior, or operator documentation when available.
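As a minimal sketch of the identification step described above, the following Python snippet looks for the `GPTBot` token in a raw User-Agent header and extracts the version if one is present. The function name `match_gptbot` and the regular expression are illustrative assumptions, not part of any OpenAI tooling, and a match is only a signal that a request claims to be GPTBot.

```python
import re

# Hypothetical pattern: the GPTBot product token with an optional
# version suffix such as "GPTBot/1.3". Case-insensitive on purpose.
GPTBOT_TOKEN = re.compile(r"\bGPTBot(?:/(\d+(?:\.\d+)*))?", re.IGNORECASE)


def match_gptbot(user_agent):
    """Return (claims_gptbot, version) for a raw User-Agent header value.

    A True result means the header *claims* to be GPTBot. It does not
    prove the request came from OpenAI.
    """
    found = GPTBOT_TOKEN.search(user_agent or "")
    if found is None:
        return False, None
    return True, found.group(1)


# Representative user-agent string taken from this record.
SAMPLE_UA = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); "
    "compatible; GPTBot/1.3; +https://openai.com/gptbot"
)

if __name__ == "__main__":
    print(match_gptbot(SAMPLE_UA))
    print(match_gptbot("Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0"))
```

Because the header is trivially copied, a positive match should feed into a source check rather than directly into an allow rule.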

Robots.txt Snippet

User-agent: GPTBot
Disallow: /
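
Before deploying the rule, it can be sanity-checked locally. The sketch below uses Python's standard `urllib.robotparser` to confirm that the snippet above disallows the `GPTBot` token while leaving other agents unaffected. The helper name `can_fetch` and the `example.com` URL are placeholders chosen for illustration, and the standard-library parser only approximates how any particular crawler interprets robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt snippet from this record.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def can_fetch(robots_txt, agent_token, url):
    """Return True if `agent_token` may fetch `url` under `robots_txt`.

    Pass the short product token (for example "GPTBot"), not the full
    User-Agent header: the parser matches on the token before any "/".
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent_token, url)


if __name__ == "__main__":
    # example.com is a placeholder host.
    print(can_fetch(ROBOTS_TXT, "GPTBot", "https://example.com/articles/1"))
    print(can_fetch(ROBOTS_TXT, "Googlebot", "https://example.com/articles/1"))
```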


Handling Guidance

Depends

Use this record as bot intelligence, then verify the request source and behavior before allowing, blocking, or rate limiting.

GPTBot is used for AI model training, dataset discovery, and collection of public web content for model-development pipelines.
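
One way to make the "verify first, then act" guidance concrete is a small decision function. The sketch below is only an illustration: the `Action` enum, the `decide_action` name, and the default of rate limiting unverified claims are all assumptions made for this example, not recommendations from OpenAI or from this record.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    RATE_LIMIT = "rate_limit"


def decide_action(
    claims_gptbot: bool,
    source_verified: bool,
    site_policy: Action,
    unverified_policy: Action = Action.RATE_LIMIT,  # assumed default
) -> Optional[Action]:
    """Pick a handling action for a single request.

    claims_gptbot   -- the User-Agent header contains the GPTBot token
    source_verified -- the source passed an IP, DNS, or log-based check
    site_policy     -- what the site owner wants for genuine GPTBot traffic
    Returns None when the request does not claim to be GPTBot at all.
    """
    if not claims_gptbot:
        return None
    if source_verified:
        return site_policy
    # The header says GPTBot but the source could not be confirmed.
    return unverified_policy


if __name__ == "__main__":
    print(decide_action(True, True, Action.BLOCK))
    print(decide_action(True, False, Action.ALLOW))
```

Keeping the unverified case separate means a spoofed user-agent never inherits an allow rule intended for the real crawler.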

Record Details

Structured data
Operator: OpenAI
Family: OpenAI
Type: AI
Purpose: AI Training
Identity type: Official Documented
Confidence: High
Last verified: 2026-05-20
Last checked: 2026-05-20
Source type: Official
Verification: Verify GPTBot by matching `GPTBot` to OpenAI evidence, then checking reverse DNS, source-network ownership, signed request data, or published crawler documentation when available.
Spoofing risk: GPTBot has medium spoofing risk because the user-agent can be copied, even when the bot has strong source or documentation support.
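
The source-network check mentioned under Verification can be sketched with Python's standard `ipaddress` module. This is a hedged illustration: the CIDR blocks below are RFC 5737 documentation ranges used as placeholders, not real OpenAI ranges, and the function names and result labels are hypothetical. In practice the ranges would come from whatever the operator currently publishes.

```python
import ipaddress

# PLACEHOLDER ranges (RFC 5737 documentation blocks), NOT OpenAI's ranges.
# Substitute the operator's currently published ranges before real use.
EXAMPLE_PUBLISHED_RANGES = ["192.0.2.0/24", "198.51.100.0/28"]


def ip_in_ranges(ip, cidr_ranges):
    """Return True if `ip` falls inside any of the given CIDR ranges."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        # Malformed address: treat as not verified.
        return False
    return any(
        addr in ipaddress.ip_network(cidr, strict=False)
        for cidr in cidr_ranges
    )


def classify_request(user_agent, source_ip, cidr_ranges):
    """Label a request by combining the user-agent claim and the source IP."""
    if "gptbot" not in user_agent.lower():
        return "not-gptbot"
    if ip_in_ranges(source_ip, cidr_ranges):
        return "verified"
    return "unverified-claim"


if __name__ == "__main__":
    ua = "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.3; +https://openai.com/gptbot"
    print(classify_request(ua, "192.0.2.44", EXAMPLE_PUBLISHED_RANGES))
    print(classify_request(ua, "203.0.113.9", EXAMPLE_PUBLISHED_RANGES))
```

A request labelled `unverified-claim` is the spoofing case described above: the header matches, but the source network does not.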

Notes

  • GPTBot is an AI training crawler from OpenAI used for AI model training, dataset discovery, and collection of public web content for model-development pipelines.
  • Its primary user-agent pattern is `GPTBot`; related patterns include `OpenAI training crawler`. A representative HTTP user-agent is `Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.3; +https://openai.com/gptbot`.
  • GPTBot is verified with High confidence. The identity type is Official Documented, and the evidence basis is official operator documentation.
  • GPTBot is marked as respecting robots.txt directives for crawler access control.
  • GPTBot should be handled according to the site owner’s AI crawler policy, with allow, block, or rate-limit rules applied deliberately.

Evidence and Source

  • GPTBot traffic is primarily detected by the `GPTBot` user-agent pattern; related patterns include `OpenAI training crawler`. A representative HTTP user-agent is `Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.3; +https://openai.com/gptbot`. Before trusting the traffic, compare its source IPs, reverse DNS, request paths, and crawl cadence with what is known about OpenAI infrastructure.
