Bot intelligence record
Semantic Scholar Bot
Usually allow
Semantic Scholar crawler used to discover academic PDFs and scholarly documents.
- Operator: Allen Institute for AI
- Family: Semantic Scholar
- Type: Search
- Source type: Official
- Last checked: 2026-06-02
User-Agent Pattern
SemanticScholarBot (Allen Institute for AI)
User-agent strings are identification signals, not proof of identity. Confirm important allow, block, or rate-limit decisions with logs, DNS or IP evidence, request behavior, or operator documentation when available.
Robots.txt Snippet
User-agent: SemanticScholarBot
Disallow: /
This snippet blocks the bot from the entire site. Use it only if your site policy is to disallow this crawler.
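As a rough check, the snippet above can be parsed with Python's standard `urllib.robotparser` to confirm which agents it affects. This is a minimal sketch: the example URL and the `OtherBot` agent name are placeholders chosen for illustration, not values from the directory.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules from the record, written one directive per line.
ROBOTS_TXT = """\
User-agent: SemanticScholarBot
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.org/papers/sample.pdf"  # placeholder URL
    print("SemanticScholarBot allowed:", is_allowed("SemanticScholarBot", url))
    print("OtherBot allowed:", is_allowed("OtherBot", url))  # hypothetical agent
```

Because the rules name only `SemanticScholarBot`, agents without a matching group are unaffected. robots.txt is advisory, so it governs only crawlers that choose to honor it.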
Handling Guidance
Blocking decision: No
This bot is usually safe to allow when the request source is verified and the traffic matches your site policy.
Academic PDF discovery and scholarly indexing for Semantic Scholar.
Record Details
Structured data
- Operator: Allen Institute for AI
- Family: Semantic Scholar
- Type: Search
- Purpose: Academic Search Crawling
- Identity type: Official Documented
- Confidence: High
- Last verified: 2026-06-02
- Last checked: 2026-06-02
- Source type: Official
- Verification: Validate the identifying user-agent against operator documentation, reverse DNS, published IP ranges, signatures, or other trust signals before creating hard allow rules.
- Spoofing risk: User-agent strings can be spoofed. Pair the claimed identifier with operator documentation, IP verification, reverse DNS, signatures, or other available trust signals before creating low-friction allow rules.
Notes
Semantic Scholar Bot is listed in the Botcrawl directory as a search bot from Semantic Scholar. The primary identifier for log review is SemanticScholarBot.
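For log review, the identifier can be matched as a case-insensitive substring of the user-agent header. The sketch below assumes the user-agent values have already been extracted from the logs; the function names and the sample strings are made up for illustration and are not the crawler's documented user-agent.

```python
import re

# Case-insensitive substring match on the directory's primary identifier.
BOT_PATTERN = re.compile(r"SemanticScholarBot", re.IGNORECASE)


def claims_semantic_scholar_bot(user_agent: str) -> bool:
    """Return True if the user-agent string claims to be SemanticScholarBot.

    A match is only a claim: user-agent strings can be copied or spoofed.
    """
    return bool(BOT_PATTERN.search(user_agent or ""))


def filter_claimed_hits(user_agents):
    """Keep only the user-agent strings that claim the bot's identifier."""
    return [ua for ua in user_agents if claims_semantic_scholar_bot(ua)]


if __name__ == "__main__":
    # Hypothetical sample values, not real crawler strings.
    samples = [
        "Mozilla/5.0 (compatible; semanticscholarbot/1.0)",
        "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
    ]
    print(filter_claimed_hits(samples))
```

Treat the result as a list of candidates for further verification, not as a list of confirmed crawler requests.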
Identification
- User-agent pattern: SemanticScholarBot
- Family: Semantic Scholar
- Type: Search
- Kind: Crawler
Common use
Academic PDF discovery and scholarly indexing for Semantic Scholar.
Verification and handling
Confirm the user-agent against server logs and use published operator documentation, IP ranges, reverse DNS, signatures, or other trust signals when available.
Directory guidance marks the risk level as Safe and the blocking decision as No. Do not rely on the user-agent string alone because user-agent strings can be copied or spoofed.
Robots.txt handling: Yes.
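One way to apply the reverse DNS advice above is a forward-confirmed lookup: resolve the request IP to a hostname, check the hostname against a suffix you trust, then resolve that hostname back and confirm it returns the same IP. The sketch below is an assumption-laden outline, not the operator's documented procedure. The trusted suffix must come from operator documentation; `bots.example.org` and the IP addresses in the usage example are placeholders.

```python
import socket
from typing import Callable, Iterable, List


def _reverse_lookup(ip: str) -> str:
    """Resolve an IP address to its primary hostname (PTR lookup)."""
    return socket.gethostbyaddr(ip)[0]


def _forward_lookup(hostname: str) -> List[str]:
    """Resolve a hostname to its IPv4 addresses."""
    return socket.gethostbyname_ex(hostname)[2]


def is_forward_confirmed(
    ip: str,
    trusted_suffixes: Iterable[str],
    reverse_lookup: Callable[[str], str] = _reverse_lookup,
    forward_lookup: Callable[[str], List[str]] = _forward_lookup,
) -> bool:
    """Return True if ip reverse-resolves to a trusted host that resolves back to ip."""
    suffixes = [s.lower().strip(".") for s in trusted_suffixes]
    try:
        hostname = reverse_lookup(ip).rstrip(".").lower()
    except OSError:  # covers socket.herror and socket.gaierror
        return False
    # Require an exact match or a match on a full domain label boundary.
    if not any(hostname == s or hostname.endswith("." + s) for s in suffixes):
        return False
    try:
        return ip in forward_lookup(hostname)
    except OSError:
        return False


if __name__ == "__main__":
    # Placeholder values: replace the suffix with one from operator documentation.
    print(is_forward_confirmed("203.0.113.7", ["bots.example.org"]))
```

The lookup functions are passed in as parameters so the check can be exercised without network access. The default forward lookup covers IPv4 only, so IPv6 traffic would need a different resolver.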
Evidence and Source
- Match `SemanticScholarBot` as a case-insensitive substring in HTTP user-agent logs. Do not treat a user-agent match alone as proof of identity for allow-listing.
