Bot intelligence record

CCBot

Review first

Use the CCBot identifier to separate Common Crawl traffic from normal visitor requests in server logs.

Security | Official Documented | Confidence: High | Verified: Yes | robots.txt: Yes

Operator: Common Crawl
Family: Common Crawl
Type: Security
Source type: Official
Last checked: 2026-05-20

User-Agent Pattern

Common Crawl
CCBot

Verification note

User-agent strings are identification signals, not proof of identity. Confirm important allow, block, or rate-limit decisions with logs, DNS or IP evidence, request behavior, or operator documentation when available.

Robots.txt Snippet

User-agent: CCBot
Disallow: /

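
To check how the rule above behaves before deploying it, the snippet can be parsed with Python's standard `urllib.robotparser`. This is a minimal sketch: the helper name `is_allowed`, the `OtherBot` agent, and the `example.com` URL are illustrative assumptions, not part of the record.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule from the robots.txt snippet in this record.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper for local testing; it only evaluates the rules
    and says nothing about whether a client actually obeys them.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com and OtherBot are placeholders for illustration.
    print(is_allowed(ROBOTS_TXT, "CCBot", "https://example.com/page"))
    print(is_allowed(ROBOTS_TXT, "OtherBot", "https://example.com/page"))
```

The rule only affects clients that choose to honor robots.txt, so it complements, rather than replaces, the source verification described in this record.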

Handling Guidance

Depends

Use this record as bot intelligence, then verify the request source and behavior before allowing, blocking, or rate limiting.

Security scanning, malware checks, abuse prevention, compliance review, or vulnerability monitoring.

Record Details

Structured data
Operator: Common Crawl
Family: Common Crawl
Type: Security
Purpose: Security
Identity type: Official Documented
Confidence: High
Last verified: 2026-04-01
Last checked: 2026-05-20
Source type: Official
Verification: Confirm that reverse DNS for the source IP resolves to a hostname under crawl.commoncrawl.org, and match the IP against Common Crawl's published JSON IP ranges.
Spoofing risk: User-agent strings for CCBot can be spoofed. Treat user-agent detection as a classification signal, then verify with published IP ranges, reverse DNS, signatures, operator documentation, or other verified identity signals before allow-listing.

Notes

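CCBot is listed in the Botcrawl directory as a security scanner from Common Crawl. Common Crawl itself describes CCBot as the web crawler that collects pages for its open web-crawl dataset, so the Security label is this directory's classification rather than the operator's description. The primary identifier for log review is CCBot.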

Identification

  • User-agent pattern: CCBot
  • Family: Common Crawl
  • Type: Security
  • Kind: Scanner

Common use

Security scanning, malware checks, abuse prevention, compliance review, or vulnerability monitoring.

Verification and handling

Confirm that reverse DNS for the source IP resolves to a hostname under crawl.commoncrawl.org, and match the IP against Common Crawl's published JSON IP ranges.

Directory guidance marks the risk level as Neutral and the blocking decision as Depends. Do not rely on the user-agent string alone because user-agent strings can be copied or spoofed.
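
The two checks described above (a reverse DNS name under crawl.commoncrawl.org and membership in the published IP ranges) can be sketched as follows. This is an illustration under stated assumptions, not an official verifier: the function names are hypothetical, the CIDR list is expected to have been extracted from Common Crawl's published JSON beforehand (its schema is not described in this record), and `reverse_dns_confirms` needs live DNS access.

```python
import ipaddress
import socket

# Domain taken from this record's verification note.
CRAWL_DOMAIN = "crawl.commoncrawl.org"


def hostname_in_domain(hostname: str, domain: str = CRAWL_DOMAIN) -> bool:
    """True if hostname equals domain or is a subdomain of it."""
    host = hostname.rstrip(".").lower()
    return host == domain or host.endswith("." + domain)


def ip_in_ranges(ip: str, cidr_ranges: list[str]) -> bool:
    """True if ip falls inside any CIDR range in cidr_ranges.

    cidr_ranges is assumed to be a plain list of CIDR strings that the
    caller has already pulled out of the operator's published JSON.
    """
    addr = ipaddress.ip_address(ip)
    for cidr in cidr_ranges:
        net = ipaddress.ip_network(cidr, strict=False)
        if addr.version == net.version and addr in net:
            return True
    return False


def reverse_dns_confirms(ip: str, domain: str = CRAWL_DOMAIN) -> bool:
    """Forward-confirmed reverse DNS check (requires network access).

    Looks up the PTR name for ip, checks it is under domain, then
    resolves that name forward and checks it maps back to ip.
    """
    try:
        target = ipaddress.ip_address(ip)
        hostname, _aliases, _addrs = socket.gethostbyaddr(ip)
        if not hostname_in_domain(hostname, domain):
            return False
        infos = socket.getaddrinfo(hostname, None)
        return any(ipaddress.ip_address(info[4][0]) == target for info in infos)
    except (OSError, ValueError):
        return False
```

Checking the domain with a leading dot avoids accepting look-alike names such as `fakecrawl.commoncrawl.org` or `crawl.commoncrawl.org.evil.example`. The forward lookup matters because a PTR record alone is controlled by whoever owns the IP block.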

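Robots.txt handling: Yes. Common Crawl states that CCBot obeys robots.txt rules, so robots.txt directives apply to genuine CCBot traffic but not to clients that merely copy the user-agent string.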

Evidence and Source

  • Confirm that reverse DNS for the source IP resolves to a hostname under crawl.commoncrawl.org, and match the IP against Common Crawl's published JSON IP ranges.
  • Match `CCBot` as a case-insensitive substring in HTTP user-agent logs. Review bot_aliases for alternate names or product labels. Use bot_http_agent for full user-agent examples when the client sends a longer browser-like string. Do not treat a user-agent match alone as proof of identity for allow-listing.
  • Security scanning, malware checks, abuse prevention, compliance review, or vulnerability monitoring.
  • User-agent strings for CCBot can be spoofed. Treat user-agent detection as a classification signal, then verify with published IP ranges, reverse DNS, signatures, operator documentation, or other verified identity signals before allow-listing.
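
The case-insensitive substring match described in the list above can be sketched like this. It assumes access logs in the common "combined" format, where the user-agent is the last double-quoted field; the function names and the sample log lines are hypothetical and exist only for illustration.

```python
import re

# Case-insensitive substring match for the identifier in this record.
CCBOT_PATTERN = re.compile(r"ccbot", re.IGNORECASE)

# Assumption: combined log format, user-agent is the last quoted field.
LAST_QUOTED_FIELD = re.compile(r'"([^"]*)"\s*$')


def extract_user_agent(log_line: str) -> str:
    """Return the last double-quoted field of log_line, or '' if absent."""
    match = LAST_QUOTED_FIELD.search(log_line)
    return match.group(1) if match else ""


def matches_ccbot(user_agent: str) -> bool:
    """True if the user-agent contains 'CCBot' in any letter case."""
    return CCBOT_PATTERN.search(user_agent) is not None


def filter_ccbot_lines(log_lines):
    """Keep only log lines whose user-agent field matches CCBot.

    A match is a classification signal only; it does not prove the
    request came from Common Crawl.
    """
    return [line for line in log_lines if matches_ccbot(extract_user_agent(line))]


if __name__ == "__main__":
    # Hypothetical log lines using documentation IP addresses.
    sample = [
        '192.0.2.10 - - [20/May/2026:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "CCBot/2.0 (https://commoncrawl.org/faq/)"',
        '198.51.100.7 - - [20/May/2026:10:00:01 +0000] "GET /ccbot-info HTTP/1.1" 200 128 "-" "Mozilla/5.0"',
    ]
    for line in filter_ccbot_lines(sample):
        print(line)
```

Matching on the extracted user-agent field rather than the whole line avoids false positives from request paths or referrers that happen to contain the string.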
