The Settings API allows you to configure every aspect of the crawler's behavior. Settings are persisted per user and automatically applied to new crawls. Different user tiers have access to different settings categories.

Tier-Based Access: Settings are grouped by user tier. Guest users cannot modify settings. The User tier gets basic settings, the Extra tier adds JavaScript and filter settings, and the Admin tier gets all settings, including advanced configuration.

Endpoints

GET /api/get_settings

Retrieve the current user's saved settings. Returns default values for any settings not explicitly configured.

Authentication

Requires valid session cookie (user tier or above).

Example Request

curl http://localhost:5000/api/get_settings \
  -b cookies.txt

Success Response (200 OK)

{
  "success": true,
  "settings": {
    // Crawler Settings
    "maxDepth": 5,
    "maxUrls": 10000,
    "crawlDelay": 0.5,
    "followRedirects": true,
    "crawlExternalLinks": false,
    
    // Request Settings
    "userAgent": "LibreCrawl/1.0",
    "timeout": 30,
    "retries": 3,
    "acceptLanguage": "en-US,en;q=0.9",
    "respectRobotsTxt": true,
    "allowCookies": true,
    "discoverSitemaps": true,
    "enablePageSpeed": false,
    "googleApiKey": "",
    
    // Filter Settings (Extra tier)
    "includeExtensions": "html,htm,php,asp,aspx",
    "excludeExtensions": "jpg,jpeg,png,gif,pdf,zip",
    "includePatterns": "",
    "excludePatterns": "",
    "maxFileSize": 10,
    
    // JavaScript Settings (Extra tier)
    "enableJavaScript": false,
    "jsWaitTime": 2,
    "jsTimeout": 30,
    "jsBrowser": "chromium",
    "jsHeadless": true,
    "jsUserAgent": "",
    "jsViewportWidth": 1920,
    "jsViewportHeight": 1080,
    "jsMaxConcurrentPages": 5,
    
    // Export Settings
    "exportFormat": "csv",
    "exportFields": ["url", "status_code", "title"],
    
    // Advanced Settings (Admin tier)
    "concurrency": 10,
    "memoryLimit": 2048,
    "logLevel": "INFO",
    "saveSession": true,
    "enableProxy": false,
    "proxyUrl": "",
    "customHeaders": "{}",
    
    // Custom UI (Extra tier)
    "customCSS": "",
    
    // Issue Management
    "issueExclusionPatterns": ""
  }
}

POST /api/save_settings

Save user settings. Only settings accessible to the user's tier are saved. Values are validated and, if a crawler instance exists, applied to it immediately.

Authentication

Requires valid session cookie (user tier or above).

Request Body

Send a JSON object with any or all settings fields. Only provided fields will be updated.

Example Request

curl -X POST http://localhost:5000/api/save_settings \
  -H "Content-Type: application/json" \
  -b cookies.txt \
  -d '{
    "maxDepth": 3,
    "maxUrls": 5000,
    "crawlDelay": 1.0,
    "respectRobotsTxt": true,
    "enableJavaScript": true,
    "jsWaitTime": 3
  }'

Success Response (200 OK)

{
  "success": true,
  "message": "Settings saved successfully"
}

Error Response (400 Bad Request)

{
  "success": false,
  "error": "Invalid setting value: maxDepth must be between 1 and 5000000"
}
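
Settings above your tier are not saved. A hedged sketch of this behavior (it assumes out-of-tier fields are silently filtered rather than rejected):

# As a User-tier account: concurrency (Admin tier) is assumed to be
# dropped, while maxDepth (available to all tiers) is saved normally.
curl -X POST http://localhost:5000/api/save_settings \
  -H "Content-Type: application/json" \
  -b cookies.txt \
  -d '{"maxDepth": 3, "concurrency": 50}'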

POST /api/reset_settings

Reset all settings to their default values.

Authentication

Requires valid session cookie (user tier or above).

Example Request

curl -X POST http://localhost:5000/api/reset_settings \
  -b cookies.txt

Success Response (200 OK)

{
  "success": true,
  "message": "Settings reset to defaults"
}
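
To confirm the reset, fetch the settings again; the response should match the defaults listed in the Settings Reference below:

curl http://localhost:5000/api/get_settings \
  -b cookies.txt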

POST /api/update_crawler_settings

Apply saved settings to the active crawler instance. Useful for updating settings during a paused crawl.

Authentication

Requires valid session cookie (user tier or above).

Example Request

curl -X POST http://localhost:5000/api/update_crawler_settings \
  -b cookies.txt

Success Response (200 OK)

{
  "success": true,
  "message": "Crawler settings updated"
}

Note: Some settings (like maxDepth, maxUrls) cannot be changed during an active crawl. Pause the crawl first, update settings, then resume.
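
A typical mid-crawl reconfiguration might look like the sequence below. The /api/pause_crawl and /api/resume_crawl paths are assumptions for illustration; substitute the pause and resume endpoints of your deployment.

# Pause the crawl (hypothetical endpoint path)
curl -X POST http://localhost:5000/api/pause_crawl -b cookies.txt

# Save the new values, then push them to the crawler instance
curl -X POST http://localhost:5000/api/save_settings \
  -H "Content-Type: application/json" \
  -b cookies.txt \
  -d '{"crawlDelay": 2.0, "jsWaitTime": 3}'

curl -X POST http://localhost:5000/api/update_crawler_settings -b cookies.txt

# Resume the crawl (hypothetical endpoint path)
curl -X POST http://localhost:5000/api/resume_crawl -b cookies.txt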

Settings Reference

Crawler Settings (All tiers)

Setting | Type | Default | Description
maxDepth | number | 5 | Maximum crawl depth (1-5000000)
maxUrls | number | 10000 | Maximum URLs to discover (1-5000000)
crawlDelay | number | 0.5 | Delay between requests in seconds (0-60)
followRedirects | boolean | true | Follow HTTP redirects (301, 302, etc.)
crawlExternalLinks | boolean | false | Crawl links to external domains

Request Settings (User tier+)

Setting | Type | Default | Description
userAgent | string | "LibreCrawl/1.0" | User-Agent header for requests
timeout | number | 30 | Request timeout in seconds (1-300)
retries | number | 3 | Number of retries for failed requests (0-10)
acceptLanguage | string | "en-US,en;q=0.9" | Accept-Language header value
respectRobotsTxt | boolean | true | Respect robots.txt directives
allowCookies | boolean | true | Enable cookie handling
discoverSitemaps | boolean | true | Auto-discover and crawl sitemaps
enablePageSpeed | boolean | false | Enable Google PageSpeed analysis
googleApiKey | string | "" | Google API key for PageSpeed (required if enabled)

Filter Settings (Extra tier+)

Setting | Type | Default | Description
includeExtensions | string | "html,htm,php,asp,aspx" | Comma-separated file extensions to include
excludeExtensions | string | "jpg,jpeg,png,gif,pdf,zip" | Comma-separated file extensions to exclude
includePatterns | string | "" | Regex patterns for URLs to include (one per line)
excludePatterns | string | "" | Regex patterns for URLs to exclude (one per line)
maxFileSize | number | 10 | Maximum file size in MB (1-1000)
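
includePatterns and excludePatterns take one regex per line, so in a JSON payload the line breaks are encoded as \n. A sketch (the patterns themselves are illustrative):

curl -X POST http://localhost:5000/api/save_settings \
  -H "Content-Type: application/json" \
  -b cookies.txt \
  -d '{
    "excludeExtensions": "jpg,jpeg,png,gif,pdf,zip",
    "excludePatterns": "/wp-admin/.*\n.*\\?print=1",
    "maxFileSize": 5
  }'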

JavaScript Rendering (Extra tier+)

Setting | Type | Default | Description
enableJavaScript | boolean | false | Enable JavaScript rendering with headless browser
jsWaitTime | number | 2 | Seconds to wait after page load (0-60)
jsTimeout | number | 30 | Page load timeout in seconds (1-300)
jsBrowser | string | "chromium" | Browser engine (chromium, firefox, webkit)
jsHeadless | boolean | true | Run browser in headless mode
jsUserAgent | string | "" | Custom User-Agent for JS rendering (empty = default)
jsViewportWidth | number | 1920 | Browser viewport width (320-3840)
jsViewportHeight | number | 1080 | Browser viewport height (240-2160)
jsMaxConcurrentPages | number | 5 | Max concurrent browser pages (1-20)

Export Settings (All tiers)

Setting | Type | Default | Description
exportFormat | string | "csv" | Default export format (csv, json, xml)
exportFields | array | ["url", "status_code", "title"] | Default fields to include in exports

Advanced Settings (Admin tier)

Setting | Type | Default | Description
concurrency | number | 10 | Max concurrent requests (1-100)
memoryLimit | number | 2048 | Memory limit in MB (128-16384)
logLevel | string | "INFO" | Logging level (DEBUG, INFO, WARNING, ERROR)
saveSession | boolean | true | Save crawl state for resumption
enableProxy | boolean | false | Use proxy for requests
proxyUrl | string | "" | Proxy URL (http://host:port or socks5://host:port)
customHeaders | string | "{}" | JSON string of custom HTTP headers
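
For example, routing requests through a local SOCKS proxy with verbose logging (values illustrative; requires an Admin-tier session):

curl -X POST http://localhost:5000/api/save_settings \
  -H "Content-Type: application/json" \
  -b cookies.txt \
  -d '{
    "enableProxy": true,
    "proxyUrl": "socks5://127.0.0.1:9050",
    "logLevel": "DEBUG"
  }'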

UI Customization (Extra tier+)

Setting | Type | Default | Description
customCSS | string | "" | Custom CSS for UI customization

Issue Management (All tiers)

Setting | Type | Default | Description
issueExclusionPatterns | string | "" | Wildcard patterns for URLs to exclude from issue detection (one per line)

Best Practices

1. Set Reasonable Limits

Configure maxDepth and maxUrls based on your target site size:

  • Small sites (<1K pages): maxDepth: 5, maxUrls: 5000
  • Medium sites (1K-50K pages): maxDepth: 10, maxUrls: 50000
  • Large sites (50K+ pages): maxDepth: 15, maxUrls: 500000+
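
For example, applying the medium-site profile:

curl -X POST http://localhost:5000/api/save_settings \
  -H "Content-Type: application/json" \
  -b cookies.txt \
  -d '{"maxDepth": 10, "maxUrls": 50000}'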

2. Be Respectful with Crawl Delay

Set appropriate crawlDelay to avoid overwhelming target servers:

  • Own sites: 0.1-0.5 seconds
  • Small sites: 1-2 seconds
  • Large/production sites: 2-5 seconds

3. JavaScript Rendering Performance

JavaScript rendering is resource-intensive. Optimize with:

  • jsMaxConcurrentPages: Start with 5 and increase it only if your system has spare resources
  • jsWaitTime: Use the minimum time needed (2-3 seconds is typical)
  • jsHeadless: Always use true in production for better performance
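
A conservative starting configuration along these lines:

curl -X POST http://localhost:5000/api/save_settings \
  -H "Content-Type: application/json" \
  -b cookies.txt \
  -d '{
    "enableJavaScript": true,
    "jsWaitTime": 2,
    "jsMaxConcurrentPages": 5,
    "jsHeadless": true
  }'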

4. Filter Efficiently

Use excludeExtensions to skip binary files and media:

"excludeExtensions": "jpg,jpeg,png,gif,svg,webp,pdf,zip,exe,dmg,mp4,mp3,avi,mov,css,js"

5. Custom Headers for Authentication

For crawling authenticated sections, use customHeaders:

"customHeaders": "{\"Authorization\": \"Bearer your-token-here\", \"X-Custom-Header\": \"value\"}"

Next Steps