Quick Start

The LibreCrawl API is a RESTful HTTP API that provides programmatic access to all crawling functionality. All endpoints return JSON responses and use session-based authentication.

# Base URL
http://localhost:5000/api

# Example: Start a crawl
curl -X POST http://localhost:5000/api/start_crawl \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com"}' \
--cookie-jar cookies.txt

# Check crawl status
curl http://localhost:5000/api/crawl_status \
--cookie cookies.txt

Authentication: All API endpoints require session-based authentication. See the Authentication guide for details.
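
The quick-start flow above can be sketched in Python using only the standard library. A cookie-aware opener plays the role of curl's `--cookie-jar`, keeping the session cookie across calls. The `username`/`password` field names in the login payload are an assumption for illustration, not confirmed by this page; see the Authentication guide for the actual schema.

```python
import json
import urllib.request
from http.cookiejar import CookieJar

BASE_URL = "http://localhost:5000/api"

def make_json_request(path, payload=None):
    """Build a request against the API: GET when no payload, POST with a JSON body otherwise."""
    url = f"{BASE_URL}{path}"
    if payload is None:
        return urllib.request.Request(url, method="GET")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def build_session():
    """An opener with a cookie jar preserves the session cookie between requests."""
    return urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar())
    )

# Usage (requires a running LibreCrawl instance; field names are assumed):
#   opener = build_session()
#   opener.open(make_json_request("/login", {"username": "me", "password": "secret"}))
#   opener.open(make_json_request("/start_crawl", {"url": "https://example.com"}))
```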

API Overview

The LibreCrawl API is organized into six main categories:

API Endpoints at a Glance

Authentication

Method  Endpoint           Description
------  -----------------  --------------------------------------
POST    /api/register      Create a new user account
POST    /api/login         Authenticate and create session
POST    /api/guest-login   Create guest session (limited access)
POST    /api/logout        End current session
GET     /api/user/info     Get current user information

Crawl Control

Method  Endpoint            Description
------  ------------------  -------------------------
POST    /api/start_crawl    Start a new website crawl
POST    /api/stop_crawl     Stop the active crawl
POST    /api/pause_crawl    Pause the current crawl
POST    /api/resume_crawl   Resume a paused crawl
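
The four control endpoints share one shape (POST, with a JSON body only for `start_crawl`, per the quick start above), so a thin helper can drive the whole lifecycle. This is a sketch using the standard library, not an official client:

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000/api"

# The crawl-control endpoints documented above.
CRAWL_ACTIONS = {"start_crawl", "stop_crawl", "pause_crawl", "resume_crawl"}

def crawl_request(action, payload=None):
    """Build a POST request for one of the crawl-control endpoints."""
    if action not in CRAWL_ACTIONS:
        raise ValueError(f"unknown crawl action: {action}")
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    return urllib.request.Request(
        f"{BASE_URL}/{action}",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Typical lifecycle, each request sent through an authenticated opener:
#   crawl_request("start_crawl", {"url": "https://example.com"})
#   crawl_request("pause_crawl")
#   crawl_request("resume_crawl")
#   crawl_request("stop_crawl")
```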

Status & Data

Method  Endpoint                 Description
------  -----------------------  -----------------------------------
GET     /api/crawl_status        Get real-time crawl status and data
GET     /api/visualization_data  Get graph visualization data

Settings & Configuration

Method  Endpoint                      Description
------  ----------------------------  --------------------------------
GET     /api/get_settings             Retrieve current user settings
POST    /api/save_settings            Save user settings
POST    /api/reset_settings           Reset settings to defaults
POST    /api/update_crawler_settings  Apply settings to active crawler

Export & Filtering

Method  Endpoint            Description
------  ------------------  -------------------------------------
POST    /api/export_data    Export crawl data in multiple formats
POST    /api/filter_issues  Filter issues by exclusion patterns

Debug & Monitoring

Method  Endpoint                   Description
------  -------------------------  ------------------------------------------
GET     /api/debug/memory          Get memory stats for all crawler instances
GET     /api/debug/memory/profile  Get detailed memory breakdown by component

Common Patterns

Request Format

All POST requests should include the Content-Type: application/json header and send data as JSON in the request body:

POST /api/start_crawl HTTP/1.1
Host: localhost:5000
Content-Type: application/json
Cookie: session=...

{
  "url": "https://example.com"
}

Response Format

All API responses return JSON with a consistent structure:

{
  "success": true,
  "message": "Operation completed successfully",
  "data": {
    // Response data
  }
}

Error responses include an error message:

{
  "success": false,
  "error": "Error message describing what went wrong"
}
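
Because every response carries a `success` flag, a client can normalize both shapes with one small check. A minimal sketch (the `"unknown error"` fallback text is my own, not part of the API):

```python
class ApiError(Exception):
    """Raised when the API reports success: false."""

def unwrap(response_json):
    """Return the response payload, or raise ApiError for a failed response."""
    if response_json.get("success"):
        # Successful responses carry message/data alongside success: true.
        return response_json.get("data")
    raise ApiError(response_json.get("error", "unknown error"))
```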

HTTP Status Codes

  • 200 OK - Request successful
  • 400 Bad Request - Invalid request data or validation error
  • 401 Unauthorized - Authentication required or session invalid
  • 500 Internal Server Error - Server error occurred

Rate Limiting & Access Control

Tier System

LibreCrawl uses a tier-based access control system:

  • Guest Tier - Limited to 3 crawls per 24 hours (IP-based tracking), read-only access
  • User Tier - Unlimited crawls, basic settings access, data export
  • Extra Tier - All User features plus JavaScript rendering, custom filters, CSS customization
  • Admin Tier - Full access to all features including advanced settings (concurrency, memory limits, proxy configuration)

Guest Rate Limiting

Guest users are limited to 3 crawls per 24-hour period, tracked by IP address. The API checks the following headers in order:

  1. CF-Connecting-IP (Cloudflare)
  2. X-Forwarded-For (Proxy)
  3. X-Real-IP (Nginx)
  4. REMOTE_ADDR (Direct connection)
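
Server-side, that precedence amounts to a first-match lookup over the headers, falling back to the socket address. A minimal sketch, shown against a plain dict rather than any specific framework (whether LibreCrawl itself splits `X-Forwarded-For` chains this way is an assumption):

```python
# Precedence documented above: Cloudflare, generic proxy, nginx, then direct.
IP_HEADERS = ["CF-Connecting-IP", "X-Forwarded-For", "X-Real-IP"]

def client_ip(headers, remote_addr):
    """Return the first populated IP header value, else the direct address."""
    for name in IP_HEADERS:
        value = headers.get(name)
        if value:
            # X-Forwarded-For may hold a comma-separated chain;
            # the original client is conventionally the first entry.
            return value.split(",")[0].strip()
    return remote_addr
```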

Polling Pattern

LibreCrawl uses HTTP polling instead of WebSockets for real-time updates. Your application should poll the /api/crawl_status endpoint at regular intervals (recommended: 1 second) during an active crawl:

async function pollCrawlStatus() {
  const response = await fetch('/api/crawl_status');
  const data = await response.json();
  
  // Update UI with crawl data
  updateCrawlUI(data);
  
  // Continue polling if crawl is still running
  if (data.status !== 'completed') {
    setTimeout(pollCrawlStatus, 1000);
  }
}

Next Steps

Ready to start building with the LibreCrawl API? Check out these resources: