Arquivo Web Crawler is listed in the Botcrawl directory as a feed retrieval bot from Arquivo. The primary identifier for log review is Arquivo-web-crawler.

Identification

  • User-agent pattern: Arquivo-web-crawler
  • Family: Arquivo
  • Type: Feed
  • Kind: Fetcher

Common use

Feed fetching, subscription updates, podcast retrieval, or content syndication checks.

Verification and handling

Confirm the user-agent in your server logs, then corroborate it with published operator documentation, IP ranges, reverse DNS, or other trust signals when available.

Directory guidance marks the risk level as Neutral and the blocking decision as Depends. Do not rely on the user-agent string alone because user-agent strings can be copied or spoofed.
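One way to apply the reverse DNS signal mentioned above is a forward-confirmed reverse DNS check. The sketch below is illustrative only: the function name `forward_confirmed_rdns` and the `trusted_suffixes` parameter are hypothetical, and this record does not publish a verification domain, so any suffix you pass in (for example one derived from the operator's documentation) is your own assumption.

```python
import socket


def forward_confirmed_rdns(ip, trusted_suffixes,
                           reverse=socket.gethostbyaddr,
                           forward=socket.gethostbyname_ex):
    """Return True if `ip` reverse-resolves to a trusted hostname that
    resolves forward to the same `ip` again.

    `trusted_suffixes` is caller-supplied (an assumption, not taken from
    this record). Use a leading dot, e.g. ".example.org", so that a
    look-alike such as "evilexample.org" does not match.
    `reverse` and `forward` are injectable so the logic can be tested
    without network access.
    """
    try:
        # Step 1: PTR lookup, IP address -> hostname.
        hostname = reverse(ip)[0].rstrip(".").lower()
    except OSError:
        return False

    # Step 2: the hostname must end with one of the trusted suffixes.
    if not any(hostname.endswith(s.lower()) for s in trusted_suffixes):
        return False

    try:
        # Step 3: forward lookup, hostname -> list of IPv4 addresses.
        addresses = forward(hostname)[2]
    except OSError:
        return False

    # Step 4: the original IP must appear in the forward result.
    return ip in addresses
```

A request would then be treated as verified only when both the user-agent matches and this check passes for the connecting IP address. Note that `socket.gethostbyname_ex` covers IPv4 only; an IPv6 deployment would need `socket.getaddrinfo` instead.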

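Robots.txt handling: Yes. The directory lists the crawler as respecting robots.txt rules addressed to the `Arquivo-web-crawler` token.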

Identification

  • Company: Arquivo
  • Purpose: feed-fetch
  • Identity Type: verified-bot
  • Source Type: verified-directory
  • HTTP Agent: Arquivo-web-crawler (compatible; heritrix/3.4.0-20200304 +https://arquivo.pt/faq-crawling)

Verification And Behavior

  • Verification Method: Validate the identifying user-agent or signature against the operator documentation before creating hard allow rules.
  • Last Verified: 2026-04-01
  • Last Checked: 2026-05-20
  • Robots Token: Arquivo-web-crawler
  • Respects Robots: yes
  • Spoofing Risk: User-agent strings can be spoofed. For allow-listing or low-friction rules, pair the published identifier with operator documentation or cryptographic verification when available.

Detection Notes

Match `Arquivo-web-crawler` as a case-insensitive substring of the User-Agent header in HTTP logs. When the client sends a longer browser-like string, compare it with the full example in the HTTP Agent field (`bot_http_agent`). Do not treat a user-agent match alone as proof of identity for allow-listing.
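The substring match described above can be sketched in a few lines. The helper names `matches_token` and `filter_matching` are hypothetical, and the sample string is the HTTP Agent value from this record. A positive result is only an identification signal, not verification.

```python
TOKEN = "Arquivo-web-crawler"


def matches_token(user_agent: str, token: str = TOKEN) -> bool:
    """Case-insensitive substring match of `token` in a User-Agent value."""
    return token.lower() in user_agent.lower()


def filter_matching(user_agents, token: str = TOKEN):
    """Return the User-Agent values from an iterable that contain `token`."""
    return [ua for ua in user_agents if matches_token(ua, token)]


# Sample value copied from the HTTP Agent field of this record.
SAMPLE_UA = ("Arquivo-web-crawler (compatible; heritrix/3.4.0-20200304 "
             "+https://arquivo.pt/faq-crawling)")
```

Calling `matches_token(SAMPLE_UA)` returns `True`, as does an upper-cased copy of the same string, while an unrelated browser string returns `False`.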

Rules And Blocking Notes

User-agent: Arquivo-web-crawler
Disallow: /
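To see how the robots.txt rule above would be evaluated, it can be fed to Python's standard `urllib.robotparser`. This is a minimal sketch: the helper name `is_allowed` and the `example.com` URL are placeholders, and the result shows only how a compliant parser reads the rule, not how the crawler itself behaves.

```python
from urllib.robotparser import RobotFileParser

# The blocking rule from this record, one directive per line.
RULES = [
    "User-agent: Arquivo-web-crawler",
    "Disallow: /",
]


def is_allowed(user_agent: str, url: str, rules=RULES) -> bool:
    """Return True if `rules` permit `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(rules)  # parse in-memory lines; no network fetch
    return parser.can_fetch(user_agent, url)
```

With these rules, `is_allowed("Arquivo-web-crawler", "https://example.com/feed.xml")` returns `False`, while a client with a different token is unaffected because no default (`*`) group is defined.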

Identification Note

User-agent strings can be spoofed. Use this record as an identification signal and confirm sensitive allow or block decisions with logs, DNS, IP ranges, request behavior, or operator documentation when available.