This tutorial walks you through building a complete application that authenticates, starts a crawl, monitors progress in real time, and exports the results. Along the way you'll learn the essential patterns for working with the LibreCrawl API.

Prerequisites

  • LibreCrawl installed and running (default: http://localhost:5000)
  • Basic knowledge of HTTP/REST APIs
  • Familiarity with JSON
  • A programming language with an HTTP client library (JavaScript, Python, etc.)

What We'll Build

We'll create a simple crawler application that:

  1. Authenticates with the API
  2. Configures crawler settings
  3. Starts a website crawl
  4. Polls for real-time status updates
  5. Displays progress and statistics
  6. Exports results when complete

Tutorial

Step 1: Set Up Your Environment

First, make sure LibreCrawl is running. For this tutorial we'll use local mode, which makes testing easier:

# Start LibreCrawl in local mode (all users get admin tier)
python main.py --local

LibreCrawl should now be accessible at http://localhost:5000.

Step 2: Authentication

Create a session by logging in. In local mode, use guest login for quick access:

JavaScript Example

// The browser keeps the session cookie for us, as long as every
// request is sent with credentials: 'include'

async function login() {
  const response = await fetch('http://localhost:5000/api/guest-login', {
    method: 'POST',
    credentials: 'include' // Important: include cookies
  });
  
  const data = await response.json();
  
  if (data.success) {
    console.log('✓ Authenticated successfully');
    return true;
  } else {
    console.error('✗ Authentication failed:', data.error);
    return false;
  }
}

await login();

Python Example

import requests

BASE_URL = 'http://localhost:5000'
session = requests.Session()

def login():
    response = session.post(f'{BASE_URL}/api/guest-login')
    data = response.json()
    
    if data['success']:
        print('✓ Authenticated successfully')
        return True
    else:
        print(f'✗ Authentication failed: {data["error"]}')
        return False

login()

Alternative: For production applications, use /api/register and /api/login with username/password instead of guest login.
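
If you want to try that flow now, here's a minimal Python sketch reusing the session from above. It assumes /api/register and /api/login accept a JSON body with username and password fields and return the same success/error shape as the other endpoints; adjust if your instance differs.

def register_and_login(username, password):
    # Assumed payload shape; change the field names if your instance expects something else
    credentials = {'username': username, 'password': password}

    # Registration only needs to succeed once; an "already exists" error on reruns is harmless
    session.post(f'{BASE_URL}/api/register', json=credentials)

    response = session.post(f'{BASE_URL}/api/login', json=credentials)
    data = response.json()

    if data.get('success'):
        print(f'✓ Logged in as {username}')
        return True
    print(f'✗ Login failed: {data.get("error")}')
    return False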

Step 3: Configure Crawler Settings (Optional)

Before starting a crawl, you can customize settings. For this tutorial, we'll set basic limits:

JavaScript Example

async function configureSettings() {
  const settings = {
    maxDepth: 3,
    maxUrls: 100,
    crawlDelay: 0.5,
    respectRobotsTxt: true
  };
  
  const response = await fetch('http://localhost:5000/api/save_settings', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    credentials: 'include',
    body: JSON.stringify(settings)
  });
  
  const data = await response.json();
  
  if (data.success) {
    console.log('✓ Settings configured');
  } else {
    console.error('✗ Settings error:', data.error);
  }
}

await configureSettings();

Python Example

def configure_settings():
    settings = {
        'maxDepth': 3,
        'maxUrls': 100,
        'crawlDelay': 0.5,
        'respectRobotsTxt': True
    }
    
    response = session.post(
        f'{BASE_URL}/api/save_settings',
        json=settings
    )
    data = response.json()
    
    if data['success']:
        print('✓ Settings configured')
    else:
        print(f'✗ Settings error: {data["error"]}')

configure_settings()

Step 4: Start the Crawl

Now we'll start crawling a website. For this example, we'll use a small test site:

JavaScript Example

async function startCrawl(url) {
  const response = await fetch('http://localhost:5000/api/start_crawl', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    credentials: 'include',
    body: JSON.stringify({ url })
  });
  
  const data = await response.json();
  
  if (data.success) {
    console.log(`✓ Crawl started for ${url}`);
    return true;
  } else {
    console.error('✗ Crawl start failed:', data.error);
    return false;
  }
}

await startCrawl('https://example.com');

Python Example

def start_crawl(url):
    response = session.post(
        f'{BASE_URL}/api/start_crawl',
        json={'url': url}
    )
    data = response.json()
    
    if data['success']:
        print(f'✓ Crawl started for {url}')
        return True
    else:
        print(f'✗ Crawl start failed: {data["error"]}')
        return False

start_crawl('https://example.com')

Step 5: Monitor Progress in Real-Time

This is the core of the application: polling the crawl status endpoint. We'll poll once per second and display progress:

JavaScript Example

async function monitorCrawl() {
  let isRunning = true;
  
  while (isRunning) {
    const response = await fetch('http://localhost:5000/api/crawl_status', {
      credentials: 'include'
    });
    
    const data = await response.json();
    
    // Display progress
    console.clear();
    console.log('=== LibreCrawl Status ===');
    console.log(`Status: ${data.status}`);
    console.log(`Progress: ${data.progress.toFixed(1)}%`);
    console.log(`Discovered: ${data.stats.discovered} URLs`);
    console.log(`Crawled: ${data.stats.crawled} URLs`);
    console.log(`Depth: ${data.stats.depth}`);
    console.log(`Speed: ${data.stats.speed.toFixed(1)} URLs/sec`);
    console.log(`Issues: ${data.issues.length}`);
    
    // Check if crawl is complete
    if (data.status === 'completed') {
      console.log('\n✓ Crawl completed!');
      isRunning = false;
      return data;
    }
    
    // Wait 1 second before next poll
    await new Promise(resolve => setTimeout(resolve, 1000));
  }
}

const results = await monitorCrawl();

Python Example

import time
import os

def monitor_crawl():
    is_running = True
    
    while is_running:
        response = session.get(f'{BASE_URL}/api/crawl_status')
        data = response.json()
        
        # Display progress
        os.system('clear' if os.name == 'posix' else 'cls')
        print('=== LibreCrawl Status ===')
        print(f'Status: {data["status"]}')
        print(f'Progress: {data["progress"]:.1f}%')
        print(f'Discovered: {data["stats"]["discovered"]} URLs')
        print(f'Crawled: {data["stats"]["crawled"]} URLs')
        print(f'Depth: {data["stats"]["depth"]}')
        print(f'Speed: {data["stats"]["speed"]:.1f} URLs/sec')
        print(f'Issues: {len(data["issues"])}')
        
        # Check if crawl is complete
        if data['status'] == 'completed':
            print('\n✓ Crawl completed!')
            is_running = False
            return data
        
        # Wait 1 second before next poll
        time.sleep(1)

results = monitor_crawl()

Best Practice: Implement error handling and exponential backoff in case of network errors during polling.
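
For example, here's a minimal Python version of the polling loop with retries and exponential backoff, reusing the session and BASE_URL from earlier (the timeout and backoff constants are arbitrary starting points):

import time
import requests

def get_status_with_retry(max_retries=5, base_delay=1.0):
    # Retry transient network failures, doubling the wait each time
    for attempt in range(max_retries):
        try:
            response = session.get(f'{BASE_URL}/api/crawl_status', timeout=10)
            response.raise_for_status()
            return response.json()
        except requests.RequestException as err:
            wait = base_delay * (2 ** attempt)  # 1s, 2s, 4s, 8s, 16s
            print(f'Network error ({err}), retrying in {wait:.0f}s...')
            time.sleep(wait)
    raise RuntimeError('Could not reach LibreCrawl after repeated retries')

def monitor_crawl_with_backoff(poll_interval=1.0):
    while True:
        data = get_status_with_retry()
        print(f'{data["status"]}: {data["progress"]:.1f}% '
              f'({data["stats"]["crawled"]} crawled, {len(data["issues"])} issues)')
        if data['status'] == 'completed':
            return data
        time.sleep(poll_interval)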

Step 6: Export Results

Once the crawl is complete, export the data in your preferred format:

JavaScript Example

async function exportResults(format = 'json') {
  const exportConfig = {
    format: format,
    fields: [
      'url', 'status_code', 'title', 'meta_description',
      'h1', 'word_count', 'response_time'
    ]
  };
  
  const response = await fetch('http://localhost:5000/api/export_data', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    credentials: 'include',
    body: JSON.stringify(exportConfig)
  });
  
  const data = await response.json();
  
  if (data.success) {
    // Decode base64 content
    const content = atob(data.content);
    
    // Save to file (browser example)
    const blob = new Blob([content], { type: data.mimetype });
    const url = URL.createObjectURL(blob);
    const a = document.createElement('a');
    a.href = url;
    a.download = data.filename;
    a.click();
    
    console.log(`✓ Exported to ${data.filename}`);
  } else {
    console.error('✗ Export failed:', data.error);
  }
}

await exportResults('json');

Python Example

import base64

def export_results(format='json'):
    export_config = {
        'format': format,
        'fields': [
            'url', 'status_code', 'title', 'meta_description',
            'h1', 'word_count', 'response_time'
        ]
    }
    
    response = session.post(
        f'{BASE_URL}/api/export_data',
        json=export_config
    )
    data = response.json()
    
    if data['success']:
        # Decode base64 content
        content = base64.b64decode(data['content'])
        
        # Save to file
        with open(data['filename'], 'wb') as f:
            f.write(content)
        
        print(f'✓ Exported to {data["filename"]}')
    else:
        print(f'✗ Export failed: {data["error"]}')

export_results('json')

Complete Example Application

Here's a complete working example that ties everything together:

JavaScript (Node.js)

// node-fetch v2 (v3 is ESM-only and can't be loaded with require())
const fetch = require('node-fetch');
const fs = require('fs');

const BASE_URL = 'http://localhost:5000';

class LibreCrawlClient {
  constructor(baseUrl = BASE_URL) {
    this.baseUrl = baseUrl;
    this.cookie = ''; // session cookie returned by the server
  }

  async request(endpoint, options = {}) {
    // node-fetch does not manage cookies like a browser, so attach the stored
    // session cookie ourselves and capture any new one from the response
    const response = await fetch(`${this.baseUrl}${endpoint}`, {
      ...options,
      headers: { ...(options.headers || {}), Cookie: this.cookie }
    });

    const setCookie = response.headers.get('set-cookie');
    if (setCookie) this.cookie = setCookie.split(';')[0];

    return response.json();
  }

  async login() {
    const data = await this.request('/api/guest-login', { method: 'POST' });
    if (!data.success) throw new Error(data.error);
    console.log('✓ Authenticated');
  }

  async configure(settings) {
    const data = await this.request('/api/save_settings', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(settings)
    });
    if (!data.success) throw new Error(data.error);
    console.log('✓ Settings configured');
  }

  async startCrawl(url) {
    const data = await this.request('/api/start_crawl', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ url })
    });
    if (!data.success) throw new Error(data.error);
    console.log(`✓ Crawl started for ${url}`);
  }

  async getStatus() {
    return await this.request('/api/crawl_status');
  }

  async waitForCompletion() {
    while (true) {
      const status = await this.getStatus();
      console.log(`Progress: ${status.progress.toFixed(1)}% | ` +
                  `Crawled: ${status.stats.crawled} | ` +
                  `Issues: ${status.issues.length}`);
      
      if (status.status === 'completed') {
        console.log('✓ Crawl completed!');
        return status;
      }
      
      await new Promise(r => setTimeout(r, 1000));
    }
  }

  async export(format, fields) {
    const data = await this.request('/api/export_data', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ format, fields })
    });
    
    if (!data.success) throw new Error(data.error);
    
    const content = Buffer.from(data.content, 'base64');
    fs.writeFileSync(data.filename, content);
    console.log(`✓ Exported to ${data.filename}`);
  }
}

// Usage
(async () => {
  const client = new LibreCrawlClient();
  
  await client.login();
  await client.configure({ maxDepth: 3, maxUrls: 100 });
  await client.startCrawl('https://example.com');
  await client.waitForCompletion();
  await client.export('json', ['url', 'status_code', 'title']);
})().catch(err => console.error('✗', err.message));
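
Python

If you've been following along in Python, the functions from Steps 2–6 compose into an equivalent end-to-end script (nothing new here; it simply reuses the functions defined above):

# Reuses login(), configure_settings(), start_crawl(),
# monitor_crawl() and export_results() from Steps 2-6
if __name__ == '__main__':
    if not login():
        raise SystemExit('Authentication failed')

    configure_settings()

    if start_crawl('https://example.com'):
        monitor_crawl()
        export_results('json')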

🎉 Congratulations!

You've successfully built a complete LibreCrawl API application. You now know how to:

  • Authenticate with the API
  • Configure crawler settings
  • Start and monitor crawls
  • Export results in multiple formats

Next Steps

Explore Advanced Features

  • JavaScript Rendering: Enable enableJavaScript: true for React/Vue/Angular sites
  • Custom Filters: Use includePatterns and excludePatterns for precise crawling
  • Proxy Configuration: Set up proxyUrl for crawling from different IPs
  • PageSpeed Integration: Enable enablePageSpeed with your Google API key
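
These all look like ordinary settings keys, so a reasonable way to enable them is through the same /api/save_settings call used in Step 3. Here's a Python sketch of a combined payload; the key names come from the list above, but the value formats (glob-style patterns, proxy URL string, API-key field name) are assumptions to verify against your LibreCrawl version.

advanced_settings = {
    'enableJavaScript': True,                 # render JS-heavy (React/Vue/Angular) pages
    'includePatterns': ['https://example.com/blog/*'],      # assumed glob-style patterns
    'excludePatterns': ['https://example.com/admin/*'],
    'proxyUrl': 'http://user:pass@proxy.example.com:8080',  # placeholder proxy
    'enablePageSpeed': True,
    'pageSpeedApiKey': 'YOUR_GOOGLE_API_KEY'  # assumed name for the Google API key field
}

response = session.post(f'{BASE_URL}/api/save_settings', json=advanced_settings)
print(response.json())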

Production Checklist

  • Use username/password authentication instead of guest login
  • Implement proper error handling and retry logic
  • Set appropriate crawlDelay to respect target servers
  • Configure respectRobotsTxt: true for ethical crawling
  • Monitor memory usage for large crawls
  • Set up HTTPS and secure cookies for production deployment

Learn More

Example Projects

  • SEO Audit Dashboard: Build a web dashboard that displays crawl results, issues, and visualizations
  • Automated Monitor: Schedule daily crawls and email reports when issues are detected
  • Content Inventory: Export all page titles, descriptions, and word counts for content audits
  • Link Checker: Find all broken links across a website
  • Site Structure Analyzer: Visualize site architecture using the visualization API
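
As a starting point for the Link Checker idea, here's a small Python sketch. It assumes the JSON export from Step 6 is a flat list of page records containing the url and status_code fields requested there; the filename is a hypothetical example.

import json

def find_broken_links(export_path):
    # Assumes the exported JSON is a list of page records with
    # 'url' and 'status_code' keys (the fields requested in Step 6)
    with open(export_path, encoding='utf-8') as f:
        pages = json.load(f)

    broken = [page for page in pages if page.get('status_code', 0) >= 400]
    for page in broken:
        print(f'{page["status_code"]}  {page["url"]}')
    return broken

find_broken_links('librecrawl_export.json')  # hypothetical filename from Step 6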

Get Help