The Export API allows you to download crawl data in CSV, JSON, or XML formats with full control over which fields to include. The Filtering API helps you exclude false-positive issues based on URL patterns.

Endpoints

POST /api/export_data

Export crawl data in your choice of format with customizable field selection. Supports CSV, JSON, and XML formats with automatic handling of complex nested data.

Authentication

Requires a valid session cookie.

Request Body

Parameter   Type    Required  Description
format      string  Yes       Export format: "csv", "json", or "xml"
fields      array   Yes       Field names to include in the export
localData   object  No        Data to export (urls, links, and issues arrays); defaults to the current crawl data

Available Fields

  • URL Data: url, status_code, title, meta_description, h1, h2, h3, word_count, response_time, internal_links, external_links
  • Structured Data: analytics, og_tags, twitter_tags, json_ld, images
  • Special: issues_detected (list of issues for this URL), links_detailed (all links from this page)

Example Request (CSV)

curl -X POST http://localhost:5000/api/export_data \
  -H "Content-Type: application/json" \
  -b cookies.txt \
  -d '{
    "format": "csv",
    "fields": ["url", "status_code", "title", "meta_description", "word_count"]
  }' \
  -o response.json

# The response is a JSON envelope (see Success Response below);
# extract and decode the file content (requires jq)
jq -r '.content' response.json | base64 -d > export.csv

Example Request (JSON with all fields)

curl -X POST http://localhost:5000/api/export_data \
  -H "Content-Type: application/json" \
  -b cookies.txt \
  -d '{
    "format": "json",
    "fields": [
      "url", "status_code", "title", "meta_description",
      "h1", "h2", "h3", "word_count", "response_time",
      "analytics", "og_tags", "twitter_tags", "json_ld",
      "internal_links", "external_links", "images"
    ]
  }' \
  -o response.json

Example Request (Export with local data)

curl -X POST http://localhost:5000/api/export_data \
  -H "Content-Type: application/json" \
  -b cookies.txt \
  -d '{
    "format": "json",
    "fields": ["url", "status_code", "title"],
    "localData": {
      "urls": [
        {
          "url": "https://example.com",
          "status_code": 200,
          "title": "Example"
        }
      ],
      "links": [],
      "issues": []
    }
  }' \
  -o response.json

Success Response (200 OK) - Single File

{
  "success": true,
  "content": "base64-encoded-file-content",
  "mimetype": "text/csv",
  "filename": "librecrawl_export_20250118_143022.csv"
}

Success Response (200 OK) - Multiple Files

When exporting with "issues_detected" or "links_detailed" fields, multiple files may be returned:

{
  "success": true,
  "multiple_files": true,
  "files": [
    {
      "content": "base64-encoded-urls-file",
      "mimetype": "text/csv",
      "filename": "urls_20250118_143022.csv"
    },
    {
      "content": "base64-encoded-links-file",
      "mimetype": "text/csv",
      "filename": "links_20250118_143022.csv"
    },
    {
      "content": "base64-encoded-issues-file",
      "mimetype": "text/csv",
      "filename": "issues_20250118_143022.csv"
    }
  ]
}

Error Responses

# Missing required fields (400 Bad Request)
{
  "success": false,
  "error": "Missing required fields: format, fields"
}

# Invalid format (400 Bad Request)
{
  "success": false,
  "error": "Invalid format. Must be csv, json, or xml"
}

# No data to export (400 Bad Request)
{
  "success": false,
  "error": "No crawl data available to export"
}
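
All three errors share the same success/error envelope, so a thin wrapper can surface the API's own message before you attempt any decoding. A minimal sketch in Python, assuming a requests.Session that already holds the session cookie (post_json is an illustrative helper, not part of the API):

import requests

def post_json(session, url, payload):
    """POST JSON and raise with the API's own message on failure."""
    response = session.post(url, json=payload)
    data = response.json()
    if not data.get('success'):
        # 400 responses carry a human-readable reason in "error"
        raise RuntimeError(data.get('error', f'HTTP {response.status_code}'))
    return data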

Decoding Base64 Content

The response includes base64-encoded file content. Decode it before saving:

// JavaScript (browser) example; saveFile stands in for your own download helper
const response = await fetch('/api/export_data', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ format: 'csv', fields: ['url', 'title'] })
});

const data = await response.json();

// Decode base64 into raw bytes so non-ASCII content survives the round trip
const toBlob = (b64, type) =>
  new Blob([Uint8Array.from(atob(b64), c => c.charCodeAt(0))], { type });

if (data.multiple_files) {
  // Multiple files
  data.files.forEach(file => {
    saveFile(toBlob(file.content, file.mimetype), file.filename);
  });
} else {
  // Single file
  saveFile(toBlob(data.content, data.mimetype), data.filename);
}

# Python example
import requests
import base64

response = requests.post(
    'http://localhost:5000/api/export_data',
    json={'format': 'csv', 'fields': ['url', 'title']},
    cookies=cookies
)

data = response.json()

if data.get('multiple_files'):
    for file in data['files']:
        content = base64.b64decode(file['content'])
        with open(file['filename'], 'wb') as f:
            f.write(content)
else:
    content = base64.b64decode(data['content'])
    with open(data['filename'], 'wb') as f:
        f.write(content)

POST /api/filter_issues

Filter a list of issues using the current user's exclusion patterns. Useful for removing false-positive issues from reports.

Authentication

Requires a valid session cookie.

Request Body

Parameter  Type   Required  Description
issues     array  Yes       Array of issue objects to filter

Example Request

curl -X POST http://localhost:5000/api/filter_issues \
  -H "Content-Type: application/json" \
  -b cookies.txt \
  -d '{
    "issues": [
      {
        "url": "https://example.com/admin/page",
        "type": "404",
        "category": "broken_link",
        "issue": "Page not found",
        "details": "Returns 404"
      },
      {
        "url": "https://example.com/products/123",
        "type": "thin_content",
        "category": "seo",
        "issue": "Low word count",
        "details": "Only 50 words"
      }
    ]
  }'

Success Response (200 OK)

Returns filtered issues array (issues matching exclusion patterns are removed):

{
  "success": true,
  "issues": [
    {
      "url": "https://example.com/products/123",
      "type": "thin_content",
      "category": "seo",
      "issue": "Low word count",
      "details": "Only 50 words"
    }
  ]
}

Note: In this example, if the user's exclusion patterns included "*/admin/*", the 404 issue would be filtered out.

Exclusion Pattern Syntax

Exclusion patterns use wildcard matching (Python's fnmatch); a short sanity-check sketch follows the example patterns below:

  • * - Matches any sequence of characters
  • ? - Matches any single character
  • [abc] - Matches any character in brackets

Example Patterns

  • */admin/* - Exclude all URLs containing /admin/
  • */test-* - Exclude URLs with path segments starting with "test-"
  • */products/*/reviews - Exclude product review pages
  • */search?* - Exclude search result pages
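
Because matching is plain fnmatch, you can verify a pattern against sample URLs before saving it. A quick sketch in Python (the URLs are illustrative):

from fnmatch import fnmatch

patterns = ['*/admin/*', '*/test-*', '*/search?*']

urls = [
    'https://example.com/admin/page',      # matches */admin/*
    'https://example.com/test-landing',    # matches */test-*
    'https://example.com/search?q=shoes',  # the ? wildcard happens to match the literal '?'
    'https://example.com/products/123',    # matches nothing, so it is kept
]

for url in urls:
    hits = [p for p in patterns if fnmatch(url, p)]
    print(url, '->', hits or 'kept')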

Export Format Details

CSV Format

CSV exports flatten complex nested data:

  • Arrays: Converted to comma-separated strings (e.g., "heading1, heading2, heading3")
  • Objects: Converted to JSON strings
  • Images/Links: Count only (e.g., "5 images")

url,status_code,title,meta_description,h2
https://example.com,200,"Homepage","Welcome to Example","About Us, Services, Contact"
https://example.com/about,200,"About Us","Learn more about us","Our Story, Our Team"
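
Since arrays arrive as comma-joined strings, consumers have to split them back apart (lossy if a heading itself contains a comma). A minimal sketch reading the sample above, assuming it was saved as export.csv:

import csv

with open('export.csv', newline='', encoding='utf-8') as f:
    for row in csv.DictReader(f):
        # "About Us, Services, Contact" -> ['About Us', 'Services', 'Contact']
        h2_list = [h.strip() for h in row['h2'].split(',')] if row['h2'] else []
        print(row['url'], row['status_code'], h2_list)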

JSON Format

JSON exports preserve full data structure:

{
  "urls": [
    {
      "url": "https://example.com",
      "status_code": 200,
      "title": "Homepage",
      "h2": ["About Us", "Services", "Contact"],
      "og_tags": {
        "og:title": "Homepage",
        "og:image": "https://example.com/img.jpg"
      }
    }
  ]
}

XML Format

XML exports use element-based hierarchy:

<?xml version="1.0" encoding="UTF-8"?>
<crawl_data>
  <urls>
    <url>
      <url>https://example.com</url>
      <status_code>200</status_code>
      <title>Homepage</title>
      <h2>
        <item>About Us</item>
        <item>Services</item>
      </h2>
    </url>
  </urls>
</crawl_data>
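
This element-based layout maps directly onto Python's xml.etree.ElementTree. A minimal sketch that walks the sample above (note the page address lives in a child element also named url), assuming the export was saved as export.xml:

import xml.etree.ElementTree as ET

root = ET.parse('export.xml').getroot()  # <crawl_data>

for page in root.findall('./urls/url'):
    address = page.findtext('url')       # child <url> element
    status = page.findtext('status_code')
    h2s = [item.text for item in page.findall('./h2/item')]
    print(address, status, h2s)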

Common Export Scenarios

Basic SEO Audit Export

{
  "format": "csv",
  "fields": [
    "url", "status_code", "title", "meta_description",
    "h1", "word_count", "internal_links", "external_links"
  ]
}

Full Technical Audit Export

{
  "format": "json",
  "fields": [
    "url", "status_code", "title", "meta_description",
    "h1", "h2", "h3", "word_count", "response_time",
    "analytics", "og_tags", "twitter_tags", "json_ld",
    "internal_links", "external_links", "images",
    "issues_detected", "links_detailed"
  ]
}

Content Inventory Export

{
  "format": "csv",
  "fields": [
    "url", "title", "meta_description",
    "h1", "h2", "word_count"
  ]
}

Link Analysis Export

{
  "format": "csv",
  "fields": [
    "url", "status_code", "internal_links",
    "external_links", "links_detailed"
  ]
}

Best Practices

1. Choose the Right Format

  • CSV: Best for spreadsheet analysis, simple data structures, compatibility
  • JSON: Best for programmatic processing, complex nested data, API integrations
  • XML: Best for enterprise systems, XSLT transformations, legacy tool compatibility

2. Optimize Field Selection

Only export fields you need to reduce file size and processing time:

  • For quick audits: ["url", "status_code", "title"]
  • For content review: Add "meta_description", "h1", "word_count"
  • For technical SEO: Add "og_tags", "analytics", "json_ld"
  • For link analysis: Add "internal_links", "external_links", "links_detailed"

3. Handle Large Exports

For crawls with 100K+ URLs:

  • Use JSON format for better compression
  • Export incrementally using the localData parameter (see the sketch below)
  • Consider filtering data before export
  • Stream large files instead of loading them entirely into memory
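
As an example of incremental export, you can slice a large URL list yourself and send each slice through localData. A sketch, assuming a requests.Session with the session cookie, a urls list already fetched from the crawl, and arbitrary batch size and filenames:

import base64
import requests

BATCH = 10_000  # arbitrary slice size; tune to your memory budget

def export_in_batches(session, urls, fields):
    """Export urls in slices via localData instead of one huge request."""
    for i in range(0, len(urls), BATCH):
        payload = {
            'format': 'json',
            'fields': fields,
            'localData': {'urls': urls[i:i + BATCH], 'links': [], 'issues': []},
        }
        data = session.post('http://localhost:5000/api/export_data',
                            json=payload).json()
        with open(f'export_part_{i // BATCH}.json', 'wb') as f:
            f.write(base64.b64decode(data['content']))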

4. Configure Issue Exclusions

Before exporting issues, configure exclusion patterns via the Settings API to remove known false positives:

// Configure exclusions
await fetch('/api/save_settings', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    issueExclusionPatterns: '*/admin/*\n*/test-*\n*/preview/*'
  })
});

// Then export filtered issues
const crawlData = await fetch('/api/crawl_status').then(r => r.json());
const filtered = await fetch('/api/filter_issues', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ issues: crawlData.issues })
}).then(r => r.json());

// Export only the filtered issues
await fetch('/api/export_data', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    format: 'csv',
    fields: ['url', 'type', 'issue', 'details'],
    localData: { issues: filtered.issues }
  })
});

Next Steps