Bot intelligence record

Semantic Scholar Bot

Usually allow

Semantic Scholar crawler used to discover academic PDFs and scholarly documents.

Search | Academic Search Crawling | Official Documented | Confidence: High | Verified: Yes | robots.txt: Yes
Operator
Allen Institute for AI
Type
Search
Source type
Official
Last checked
2026-06-02

User-Agent Pattern

Allen Institute for AI
SemanticScholarBot
Verification note

User-agent strings are identification signals, not proof of identity. Confirm important allow, block, or rate-limit decisions with logs, DNS or IP evidence, request behavior, or operator documentation when available.

Robots.txt Snippet

User-agent: SemanticScholarBot
Disallow: /
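
As written, the snippet disallows every path for the SemanticScholarBot token and leaves other agents unaffected. The following is a minimal sketch of how to check that locally with Python's standard `urllib.robotparser`. The helper name `is_allowed`, the `example.org` URL, and the `OtherBot` agent are illustrative assumptions, not part of the directory record.

```python
from urllib.robotparser import RobotFileParser

# The directory's snippet, one directive per line.
ROBOTS_TXT = """\
User-agent: SemanticScholarBot
Disallow: /
"""


def is_allowed(robots_txt: str, agent_token: str, url: str) -> bool:
    """Return True if robots_txt permits agent_token to fetch url.

    agent_token is the bare product token (for example "SemanticScholarBot"),
    not a full User-Agent header value.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent_token, url)


if __name__ == "__main__":
    # Hypothetical URL, used only to exercise the rules.
    url = "https://example.org/papers/sample.pdf"
    print(is_allowed(ROBOTS_TXT, "SemanticScholarBot", url))  # False
    print(is_allowed(ROBOTS_TXT, "OtherBot", url))            # True
```

A robots.txt rule is a request to the crawler, not an enforcement mechanism, so this check only shows what a compliant crawler would do.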


Handling Guidance

Block: No

This bot is usually safe to allow when the request source is verified and the traffic matches your site policy.

Academic PDF discovery and scholarly indexing for Semantic Scholar.

Record Details

Structured data
Operator
Allen Institute for AI
Type
Search
Purpose
Academic Search Crawling
Identity type
Official Documented
Confidence
High
Last verified
2026-06-02
Last checked
2026-06-02
Source type
Official
Verification
Validate the identifying user-agent against operator documentation, reverse DNS, published IP ranges, signatures, or other trust signals before creating hard allow rules.
Spoofing risk
User-agent strings can be spoofed. Pair the claimed identifier with operator documentation, IP verification, reverse DNS, signatures, or other available trust signals before creating low-friction allow rules.

Notes

Semantic Scholar Bot is listed in the Botcrawl directory as a search bot from Semantic Scholar. The primary identifier for log review is SemanticScholarBot.

Identification

  • User-agent pattern: SemanticScholarBot
  • Family: Semantic Scholar
  • Type: Search
  • Kind: Crawler

Common use

Academic PDF discovery and scholarly indexing for Semantic Scholar.

Verification and handling

Confirm the user-agent against server logs and use published operator documentation, IP ranges, reverse DNS, signatures, or other trust signals when available.

Directory guidance marks the risk level as Safe and the blocking decision as No. Do not rely on the user-agent string alone because user-agent strings can be copied or spoofed.
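
One of the trust signals named above, reverse DNS, is often checked as a forward-confirmed lookup: resolve the client IP to a hostname, require the hostname to fall under a domain you trust, then resolve that hostname back and require the original IP to appear. The sketch below assumes this generic technique. The record does not say which hostnames or IP ranges the operator publishes, so `allowed_suffixes` is a hypothetical parameter you would fill in from operator documentation, and the resolver functions are injectable so the logic can be exercised without network access.

```python
import socket
from typing import Callable, Iterable, List


def reverse_lookup(ip: str) -> str:
    """Return the PTR hostname for an IP address (raises OSError on failure)."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname: str) -> List[str]:
    """Return the addresses a hostname resolves to (raises OSError on failure)."""
    return sorted({info[4][0] for info in socket.getaddrinfo(hostname, None)})


def is_forward_confirmed(
    ip: str,
    allowed_suffixes: Iterable[str],
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """Forward-confirmed reverse DNS check.

    allowed_suffixes is an assumption: domain suffixes taken from operator
    documentation. Addresses are compared in their textual form.
    """
    suffixes = [s.strip(".").lower() for s in allowed_suffixes]
    try:
        hostname = reverse(ip).rstrip(".").lower()
    except OSError:
        return False  # no PTR record or lookup failure
    # The hostname must equal a trusted suffix or be a subdomain of one.
    if not any(hostname == s or hostname.endswith("." + s) for s in suffixes):
        return False
    try:
        return ip in forward(hostname)  # the forward lookup must include the IP
    except OSError:
        return False
```

A request would pass only if both the user-agent match and this check succeed. When the operator publishes IP ranges instead of hostnames, a range check would replace the DNS step.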

Robots.txt handling: Yes.

Evidence and Source

  • Match `SemanticScholarBot` as a case-insensitive substring in HTTP user-agent logs. Do not treat a user-agent match alone as proof of identity for allow-listing.
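
The case-insensitive substring match on `SemanticScholarBot` can be sketched as below. The function names and the sample user-agent strings are made up for illustration and are not the crawler's real header values. A match only means the request claims to be the bot.

```python
from typing import Iterable, List

BOT_TOKEN = "SemanticScholarBot"


def claims_semantic_scholar_bot(user_agent: str) -> bool:
    """True if the User-Agent value contains the token, ignoring case."""
    return BOT_TOKEN.casefold() in user_agent.casefold()


def filter_claimed(user_agents: Iterable[str]) -> List[str]:
    """Keep only the User-Agent values that claim to be the bot."""
    return [ua for ua in user_agents if claims_semantic_scholar_bot(ua)]


if __name__ == "__main__":
    # Hypothetical sample values, not real log data.
    samples = [
        "ExampleAgent/1.0 (compatible; semanticscholarbot)",
        "ExampleBrowser/2.0",
    ]
    # Prints only the first sample; treat it as a claim to verify, not proof.
    print(filter_claimed(samples))
```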
