Diffbot-User is listed in the Botcrawl directory as a Diffbot crawler used for scraping, SEO analysis, or data collection. The primary identifier for log review is the user-agent token Diffbot-User.

Identification

  • User-agent pattern: Diffbot-User
  • Family: Diffbot
  • Type: Scraper
  • Kind: Fetcher

Common use

Public web data collection, SEO analysis, content extraction, or third-party crawling activity.

Verification and handling

Confirm the user-agent against server logs and use published operator documentation, IP ranges, reverse DNS, or other trust signals when available.

Directory guidance marks the risk level as Neutral and the blocking decision as Depends. Do not rely on the user-agent string alone because user-agent strings can be copied or spoofed.

Robots.txt handling: Yes. The crawler is listed as respecting robots.txt under the token Diffbot-User.

Identification

  • Aliases: Diffbot user fetcher
  • Company: Diffbot
  • Purpose: scraping
  • Identity Type: official-documented
  • Source Type: official
  • HTTP Agent: Diffbot-User

Verification And Behavior

  • Verification Method: Compare the observed user-agent against the documented Diffbot-User pattern. Where available, confirm with operator documentation, published IP ranges, reverse DNS, signed-agent metadata, signatures, or other trust signals.
  • Last Verified: 2026-04-29
  • Last Checked: 2026-05-20
  • Robots Token: Diffbot-User
  • Respects Robots: yes
  • Spoofing Risk: User-agent strings can be spoofed. For allow-listing or low-friction rules, pair the published identifier with operator documentation or reverse DNS/IP verification when available.
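The reverse DNS check mentioned above is commonly done as a forward-confirmed lookup: resolve the client IP to a hostname, check the hostname against a trusted domain, then resolve that hostname back and confirm it returns the same IP. The following is a minimal sketch of that general technique, not a documented Diffbot procedure. The function names are hypothetical, and the trusted suffix must come from operator documentation; this record does not publish one, so the usage example uses a placeholder domain.

```python
import socket
from typing import Callable, Iterable, List


def reverse_lookup(ip: str) -> str:
    """PTR lookup: return the primary hostname reported for an IP address."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname: str) -> List[str]:
    """Return the IPv4 addresses that a hostname resolves to."""
    return socket.gethostbyname_ex(hostname)[2]


def forward_confirmed(
    ip: str,
    trusted_suffixes: Iterable[str],
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """Hypothetical helper: forward-confirmed reverse DNS check.

    trusted_suffixes is an assumption supplied by the caller and should be
    taken from the operator's own documentation.
    """
    suffixes = [s.lower().lstrip(".") for s in trusted_suffixes]
    try:
        hostname = reverse(ip).rstrip(".").lower()
    except OSError:
        # No PTR record or resolver failure: treat as unverified.
        return False
    # The hostname must equal a trusted domain or be a subdomain of one.
    if not any(hostname == s or hostname.endswith("." + s) for s in suffixes):
        return False
    try:
        # The forward lookup must map back to the original IP.
        return ip in forward(hostname)
    except OSError:
        return False
```

For example, `forward_confirmed(client_ip, ["crawler.example.net"])` would perform live lookups, with `crawler.example.net` standing in for whatever domain the operator actually documents. The resolver functions are injectable so the logic can be exercised without network access.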

Detection Notes

Match `Diffbot-User` as a case-insensitive substring in HTTP user-agent logs. Review the listed aliases (for example, "Diffbot user fetcher") for alternate names or product labels. Do not treat a user-agent match alone as proof of identity for allow-listing.
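The substring match described above could be sketched as follows. This is a minimal illustration: the function name `claims_diffbot_user` is hypothetical, and the input is assumed to be the raw user-agent string already extracted from a log line.

```python
import re

# Case-insensitive substring pattern for the documented token.
DIFFBOT_USER = re.compile(r"diffbot-user", re.IGNORECASE)


def claims_diffbot_user(user_agent: str) -> bool:
    """Return True if the user-agent string contains the Diffbot-User token.

    A match only means the request claims to be Diffbot-User; it is not
    proof of identity, because the string can be spoofed.
    """
    return bool(DIFFBOT_USER.search(user_agent or ""))
```

A positive result is best treated as a candidate for further verification rather than as a basis for allow-listing.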

Rules And Blocking Notes

User-agent: Diffbot-User
Disallow: /
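Before deploying a rule like this, it can help to confirm that it blocks the intended token and nothing else. The sketch below uses Python's standard `urllib.robotparser` to evaluate the rule locally; the helper name `can_fetch` and the example URL are hypothetical. It only shows how a compliant parser would read the rule and says nothing about how any particular crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# The blocking rule from this record, one directive per line.
ROBOTS_TXT = """\
User-agent: Diffbot-User
Disallow: /
"""


def can_fetch(agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return whether the given robots.txt text permits agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

With this rule, `can_fetch("Diffbot-User", "https://example.com/page")` evaluates to `False`, while an agent with no matching group, such as a hypothetical `SomeOtherBot`, is still allowed.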

Identification Note

User-agent strings can be spoofed. Use this record as an identification signal and confirm sensitive allow or block decisions with logs, DNS, IP ranges, request behavior, or operator documentation when available.