The Settings API allows you to configure every aspect of the crawler's behavior. Settings are persisted per user and automatically applied to new crawls. Different user tiers have access to different settings categories.
Tier-Based Access: Settings are grouped by user tier. Guest users cannot modify settings. The User tier gets basic settings, the Extra tier adds JavaScript rendering, filters, and UI customization, and the Admin tier gets all settings, including advanced configuration.
Endpoints
GET /api/get_settings
Retrieve the current user's saved settings. Returns default values for any settings not explicitly configured.
Authentication
Requires valid session cookie (user tier or above).
Example Request
curl http://localhost:5000/api/get_settings \
-b cookies.txt
Success Response (200 OK)
{
"success": true,
"settings": {
// Crawler Settings
"maxDepth": 5,
"maxUrls": 10000,
"crawlDelay": 0.5,
"followRedirects": true,
"crawlExternalLinks": false,
// Request Settings
"userAgent": "LibreCrawl/1.0",
"timeout": 30,
"retries": 3,
"acceptLanguage": "en-US,en;q=0.9",
"respectRobotsTxt": true,
"allowCookies": true,
"discoverSitemaps": true,
"enablePageSpeed": false,
"googleApiKey": "",
// Filter Settings (Extra tier)
"includeExtensions": "html,htm,php,asp,aspx",
"excludeExtensions": "jpg,jpeg,png,gif,pdf,zip",
"includePatterns": "",
"excludePatterns": "",
"maxFileSize": 10,
// JavaScript Settings (Extra tier)
"enableJavaScript": false,
"jsWaitTime": 2,
"jsTimeout": 30,
"jsBrowser": "chromium",
"jsHeadless": true,
"jsUserAgent": "",
"jsViewportWidth": 1920,
"jsViewportHeight": 1080,
"jsMaxConcurrentPages": 5,
// Export Settings
"exportFormat": "csv",
"exportFields": ["url", "status_code", "title"],
// Advanced Settings (Admin tier)
"concurrency": 10,
"memoryLimit": 2048,
"logLevel": "INFO",
"saveSession": true,
"enableProxy": false,
"proxyUrl": "",
"customHeaders": "{}",
// Custom UI (Extra tier)
"customCSS": "",
// Issue Management
"issueExclusionPatterns": ""
}
}
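For scripting, the response can be piped through jq to read a single value (a quick sketch; assumes jq is installed):
curl -s http://localhost:5000/api/get_settings -b cookies.txt | jq '.settings.maxDepth'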
POST /api/save_settings
Save user settings. Only settings accessible to the user's tier level are saved. Values are validated and applied immediately to the crawler instance if one exists.
Authentication
Requires valid session cookie (user tier or above).
Request Body
Send a JSON object with any or all settings fields. Only provided fields will be updated.
Example Request
curl -X POST http://localhost:5000/api/save_settings \
-H "Content-Type: application/json" \
-b cookies.txt \
-d '{
"maxDepth": 3,
"maxUrls": 5000,
"crawlDelay": 1.0,
"respectRobotsTxt": true,
"enableJavaScript": true,
"jsWaitTime": 3
}'
Success Response (200 OK)
{
"success": true,
"message": "Settings saved successfully"
}
Error Response (400 Bad Request)
{
"success": false,
"error": "Invalid setting value: maxDepth must be between 1 and 5000000"
}
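In scripts, it can be useful to check the success flag and surface validation errors. A minimal bash sketch (assumes jq is installed; the field values are illustrative):
response=$(curl -s -X POST http://localhost:5000/api/save_settings \
  -H "Content-Type: application/json" \
  -b cookies.txt \
  -d '{"maxDepth": 3, "crawlDelay": 1.0}')
if [ "$(echo "$response" | jq -r '.success')" != "true" ]; then
  echo "Save failed: $(echo "$response" | jq -r '.error')" >&2
fi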
POST /api/reset_settings
Reset all settings to their default values.
Authentication
Requires valid session cookie (user tier or above).
Example Request
curl -X POST http://localhost:5000/api/reset_settings \
-b cookies.txt
Success Response (200 OK)
{
"success": true,
"message": "Settings reset to defaults"
}
POST /api/update_crawler_settings
Apply saved settings to the active crawler instance. Useful for updating settings during a paused crawl.
Authentication
Requires valid session cookie (user tier or above).
Example Request
curl -X POST http://localhost:5000/api/update_crawler_settings \
-b cookies.txt
Success Response (200 OK)
{
"success": true,
"message": "Crawler settings updated"
}
Note: Some settings (like maxDepth, maxUrls) cannot be changed during an active crawl. Pause the crawl first, update settings, then resume.
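In practice the flow could look like the sketch below. The pause and resume calls belong to the Crawl Control API and are only indicated as comments here; the setting values are illustrative.
# 1. Pause the active crawl (Crawl Control API, not shown here).
# 2. Save the new settings.
curl -X POST http://localhost:5000/api/save_settings \
  -H "Content-Type: application/json" \
  -b cookies.txt \
  -d '{"crawlDelay": 2.0, "timeout": 60}'
# 3. Push the saved settings to the paused crawler instance.
curl -X POST http://localhost:5000/api/update_crawler_settings \
  -b cookies.txt
# 4. Resume the crawl (Crawl Control API).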
Settings Reference
Crawler Settings (All tiers)
| Setting | Type | Default | Description |
|---|---|---|---|
| maxDepth | number | 5 | Maximum crawl depth (1-5000000) |
| maxUrls | number | 10000 | Maximum URLs to discover (1-5000000) |
| crawlDelay | number | 0.5 | Delay between requests in seconds (0-60) |
| followRedirects | boolean | true | Follow HTTP redirects (301, 302, etc.) |
| crawlExternalLinks | boolean | false | Crawl links to external domains |
Request Settings (User tier+)
| Setting | Type | Default | Description |
|---|---|---|---|
| userAgent | string | "LibreCrawl/1.0" | User-Agent header for requests |
| timeout | number | 30 | Request timeout in seconds (1-300) |
| retries | number | 3 | Number of retries for failed requests (0-10) |
| acceptLanguage | string | "en-US,en;q=0.9" | Accept-Language header value |
| respectRobotsTxt | boolean | true | Respect robots.txt directives |
| allowCookies | boolean | true | Enable cookie handling |
| discoverSitemaps | boolean | true | Auto-discover and crawl sitemaps |
| enablePageSpeed | boolean | false | Enable Google PageSpeed analysis |
| googleApiKey | string | "" | Google API key for PageSpeed (required if enabled) |
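For example, enabling PageSpeed analysis only works when a Google API key is supplied in the same settings payload (the key below is a placeholder):
"enablePageSpeed": true,
"googleApiKey": "YOUR_GOOGLE_API_KEY"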
Filter Settings (Extra tier+)
| Setting | Type | Default | Description |
|---|---|---|---|
| includeExtensions | string | "html,htm,php,asp,aspx" | Comma-separated file extensions to include |
| excludeExtensions | string | "jpg,jpeg,png,gif,pdf,zip" | Comma-separated file extensions to exclude |
| includePatterns | string | "" | Regex patterns for URLs to include (one per line) |
| excludePatterns | string | "" | Regex patterns for URLs to exclude (one per line) |
| maxFileSize | number | 10 | Maximum file size in MB (1-1000) |
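Because includePatterns and excludePatterns take one regex per line, multi-line values are sent as newline-separated strings when posted as JSON. A sketch (the patterns are illustrative and the \n encoding is an assumption):
"excludePatterns": "^https://example\\.com/tag/.*\n^https://example\\.com/page/\\d+$",
"maxFileSize": 25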
JavaScript Rendering (Extra tier+)
| Setting | Type | Default | Description |
|---|---|---|---|
| enableJavaScript | boolean | false | Enable JavaScript rendering with headless browser |
| jsWaitTime | number | 2 | Seconds to wait after page load (0-60) |
| jsTimeout | number | 30 | Page load timeout in seconds (1-300) |
| jsBrowser | string | "chromium" | Browser engine (chromium, firefox, webkit) |
| jsHeadless | boolean | true | Run browser in headless mode |
| jsUserAgent | string | "" | Custom User-Agent for JS rendering (empty = default) |
| jsViewportWidth | number | 1920 | Browser viewport width (320-3840) |
| jsViewportHeight | number | 1080 | Browser viewport height (240-2160) |
| jsMaxConcurrentPages | number | 5 | Max concurrent browser pages (1-20) |
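For example, an Extra-tier user could switch rendering to Firefox with a smaller viewport and fewer concurrent pages (values are illustrative):
"enableJavaScript": true,
"jsBrowser": "firefox",
"jsWaitTime": 3,
"jsViewportWidth": 1366,
"jsViewportHeight": 768,
"jsMaxConcurrentPages": 3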
Export Settings (All tiers)
| Setting | Type | Default | Description |
|---|---|---|---|
| exportFormat | string | "csv" | Default export format (csv, json, xml) |
| exportFields | array | ["url", "status_code", "title"] | Default fields to include in exports |
Advanced Settings (Admin tier)
| Setting | Type | Default | Description |
|---|---|---|---|
| concurrency | number | 10 | Max concurrent requests (1-100) |
| memoryLimit | number | 2048 | Memory limit in MB (128-16384) |
| logLevel | string | "INFO" | Logging level (DEBUG, INFO, WARNING, ERROR) |
| saveSession | boolean | true | Save crawl state for resumption |
| enableProxy | boolean | false | Use proxy for requests |
| proxyUrl | string | "" | Proxy URL (http://host:port or socks5://host:port) |
| customHeaders | string | "{}" | JSON string of custom HTTP headers |
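For example, routing requests through a local SOCKS proxy while adding a custom header could look like this (the proxy address and header name are placeholders; note that customHeaders is itself a JSON string, so inner quotes are escaped):
"enableProxy": true,
"proxyUrl": "socks5://127.0.0.1:1080",
"customHeaders": "{\"X-Crawl-Source\": \"librecrawl\"}"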
UI Customization (Extra tier+)
| Setting | Type | Default | Description |
|---|---|---|---|
| customCSS | string | "" | Custom CSS for UI customization |
Issue Management (All tiers)
| Setting | Type | Default | Description |
|---|---|---|---|
| issueExclusionPatterns | string | "" | Wildcard patterns for URLs to exclude from issue detection (one per line) |
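For example, to keep admin and staging URLs out of issue reports (the patterns are illustrative; multi-line values are assumed to be sent as newline-separated strings):
"issueExclusionPatterns": "*/wp-admin/*\n*/staging/*"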
Best Practices
1. Set Reasonable Limits
Configure maxDepth and maxUrls based on your target site size (see the example request after this list):
- Small sites (<1K pages): maxDepth: 5, maxUrls: 5000
- Medium sites (1K-50K pages): maxDepth: 10, maxUrls: 50000
- Large sites (50K+ pages): maxDepth: 15, maxUrls: 500000+
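For example, the medium-site profile above could be applied with a single request:
curl -X POST http://localhost:5000/api/save_settings \
  -H "Content-Type: application/json" \
  -b cookies.txt \
  -d '{"maxDepth": 10, "maxUrls": 50000}'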
2. Be Respectful with Crawl Delay
Set appropriate crawlDelay to avoid overwhelming target servers:
- Own sites: 0.1-0.5 seconds
- Small sites: 1-2 seconds
- Large/production sites: 2-5 seconds
3. JavaScript Rendering Performance
JavaScript rendering is resource-intensive. Optimize with:
- jsMaxConcurrentPages: Start with 5 and increase only if your system has spare resources
- jsWaitTime: Use the minimum wait time needed (2-3 seconds is typical)
- jsHeadless: Keep this set to true in production for better performance
4. Filter Efficiently
Use excludeExtensions to skip binary files and media:
"excludeExtensions": "jpg,jpeg,png,gif,svg,webp,pdf,zip,exe,dmg,mp4,mp3,avi,mov,css,js"
5. Custom Headers for Authentication
For crawling authenticated sections, use customHeaders. The value is itself a JSON-encoded string, so inner quotes must be escaped:
"customHeaders": "{\"Authorization\": \"Bearer your-token-here\", \"X-Custom-Header\": \"value\"}"
Next Steps
- Crawl Control API - Start crawls with configured settings
- Export API - Export data with configured format
- Getting Started Guide - Complete integration tutorial