The Export API allows you to download crawl data in CSV, JSON, or XML formats with full control over which fields to include. The Filtering API helps you exclude false-positive issues based on URL patterns.

Endpoints

POST /api/export_data

Export crawl data in your choice of format with customizable field selection. Supports CSV, JSON, and XML formats with automatic handling of complex nested data.

Authentication

Requires a valid session cookie.

Request Body

Parameter   Type    Required  Description
format      string  Yes       Export format: "csv", "json", or "xml"
fields      array   Yes       Field names to include in the export
localData   object  No        Data to export (urls, links, and issues arrays); defaults to the current crawl data

Available Fields

  • URL Data: url, status_code, title, meta_description, h1, h2, h3, word_count, response_time, internal_links, external_links
  • Structured Data: analytics, og_tags, twitter_tags, json_ld, images
  • Special: issues_detected (list of issues for this URL), links_detailed (all links from this page)

Example Request (CSV)

curl -X POST http://localhost:5000/api/export_data \
  -H "Content-Type: application/json" \
  -b cookies.txt \
  -d '{
    "format": "csv",
    "fields": ["url", "status_code", "title", "meta_description", "word_count"]
  }' \
  -o response.json

# The response is a JSON envelope (see Success Response below);
# extract and decode the file content (requires jq)
jq -r '.content' response.json | base64 -d > export.csv

Example Request (JSON with all fields)

curl -X POST http://localhost:5000/api/export_data \
  -H "Content-Type: application/json" \
  -b cookies.txt \
  -d '{
    "format": "json",
    "fields": [
      "url", "status_code", "title", "meta_description",
      "h1", "h2", "h3", "word_count", "response_time",
      "analytics", "og_tags", "twitter_tags", "json_ld",
      "internal_links", "external_links", "images"
    ]
  }' \
  -o response.json

Example Request (Export with local data)

curl -X POST http://localhost:5000/api/export_data \
  -H "Content-Type: application/json" \
  -b cookies.txt \
  -d '{
    "format": "json",
    "fields": ["url", "status_code", "title"],
    "localData": {
      "urls": [
        {
          "url": "https://example.com",
          "status_code": 200,
          "title": "Example"
        }
      ],
      "links": [],
      "issues": []
    }
  }' \
  -o response.json

Success Response (200 OK) - Single File

{
  "success": true,
  "content": "base64-encoded-file-content",
  "mimetype": "text/csv",
  "filename": "librecrawl_export_20250118_143022.csv"
}

Success Response (200 OK) - Multiple Files

When exporting with "issues_detected" or "links_detailed" fields, multiple files may be returned:

{
  "success": true,
  "multiple_files": true,
  "files": [
    {
      "content": "base64-encoded-urls-file",
      "mimetype": "text/csv",
      "filename": "urls_20250118_143022.csv"
    },
    {
      "content": "base64-encoded-links-file",
      "mimetype": "text/csv",
      "filename": "links_20250118_143022.csv"
    },
    {
      "content": "base64-encoded-issues-file",
      "mimetype": "text/csv",
      "filename": "issues_20250118_143022.csv"
    }
  ]
}

Error Responses

# Missing required fields (400 Bad Request)
{
  "success": false,
  "error": "Missing required fields: format, fields"
}

# Invalid format (400 Bad Request)
{
  "success": false,
  "error": "Invalid format. Must be csv, json, or xml"
}

# No data to export (400 Bad Request)
{
  "success": false,
  "error": "No crawl data available to export"
}
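
All three errors share the same success/error envelope, so a thin wrapper can surface the API's own message before you attempt any decoding. A minimal sketch in Python, assuming a requests.Session that already holds the session cookie (post_json is an illustrative helper, not part of the API):

import requests

def post_json(session, url, payload):
    """POST JSON and raise with the API's own message on failure."""
    response = session.post(url, json=payload)
    data = response.json()
    if not data.get('success'):
        # 400 responses carry a human-readable reason in "error"
        raise RuntimeError(data.get('error', f'HTTP {response.status_code}'))
    return data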

Decoding Base64 Content

The response includes base64-encoded file content. Decode it before saving:

// JavaScript (browser) example; saveFile stands in for your own download helper
const response = await fetch('/api/export_data', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ format: 'csv', fields: ['url', 'title'] })
});

const data = await response.json();

// Decode base64 into raw bytes so non-ASCII content survives the round trip
const toBlob = (b64, type) =>
  new Blob([Uint8Array.from(atob(b64), c => c.charCodeAt(0))], { type });

if (data.multiple_files) {
  // Multiple files
  data.files.forEach(file => {
    saveFile(toBlob(file.content, file.mimetype), file.filename);
  });
} else {
  // Single file
  saveFile(toBlob(data.content, data.mimetype), data.filename);
}

# Python example
import requests
import base64

response = requests.post(
    'http://localhost:5000/api/export_data',
    json={'format': 'csv', 'fields': ['url', 'title']},
    cookies=cookies
)

data = response.json()

if data.get('multiple_files'):
    for file in data['files']:
        content = base64.b64decode(file['content'])
        with open(file['filename'], 'wb') as f:
            f.write(content)
else:
    content = base64.b64decode(data['content'])
    with open(data['filename'], 'wb') as f:
        f.write(content)

POST /api/filter_issues

Filter a list of issues using the current user's exclusion patterns. Useful for removing false-positive issues from reports.

Authentication

Requires a valid session cookie.

Request Body

Parameter  Type   Required  Description
issues     array  Yes       Array of issue objects to filter

Example Request

curl -X POST http://localhost:5000/api/filter_issues \
  -H "Content-Type: application/json" \
  -b cookies.txt \
  -d '{
    "issues": [
      {
        "url": "https://example.com/admin/page",
        "type": "404",
        "category": "broken_link",
        "issue": "Page not found",
        "details": "Returns 404"
      },
      {
        "url": "https://example.com/products/123",
        "type": "thin_content",
        "category": "seo",
        "issue": "Low word count",
        "details": "Only 50 words"
      }
    ]
  }'

Success Response (200 OK)

Returns filtered issues array (issues matching exclusion patterns are removed):

{
  "success": true,
  "issues": [
    {
      "url": "https://example.com/products/123",
      "type": "thin_content",
      "category": "seo",
      "issue": "Low word count",
      "details": "Only 50 words"
    }
  ]
}

Note: In this example, if the user's exclusion patterns included "*/admin/*", the 404 issue would be filtered out.

Exclusion Pattern Syntax

Exclusion patterns use wildcard matching (Python's fnmatch); a short sanity-check sketch follows the example patterns below:

  • * - Matches any sequence of characters
  • ? - Matches any single character
  • [abc] - Matches any character in brackets

Example Patterns

  • */admin/* - Exclude all URLs containing /admin/
  • */test-* - Exclude URLs with path segments starting with "test-"
  • */products/*/reviews - Exclude product review pages
  • */search?* - Exclude search result pages
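
Because matching is plain fnmatch, you can verify a pattern against sample URLs before saving it. A quick sketch in Python (the URLs are illustrative):

from fnmatch import fnmatch

patterns = ['*/admin/*', '*/test-*', '*/search?*']

urls = [
    'https://example.com/admin/page',      # matches */admin/*
    'https://example.com/test-landing',    # matches */test-*
    'https://example.com/search?q=shoes',  # the ? wildcard happens to match the literal '?'
    'https://example.com/products/123',    # matches nothing, so it is kept
]

for url in urls:
    hits = [p for p in patterns if fnmatch(url, p)]
    print(url, '->', hits or 'kept')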

Export Format Details

CSV Format

CSV exports flatten complex nested data:

  • Arrays: Converted to comma-separated strings (e.g., "heading1, heading2, heading3")
  • Objects: Converted to JSON strings
  • Images/Links: Count only (e.g., "5 images")

url,status_code,title,meta_description,h2
https://example.com,200,"Homepage","Welcome to Example","About Us, Services, Contact"
https://example.com/about,200,"About Us","Learn more about us","Our Story, Our Team"
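
Since arrays arrive as comma-joined strings, consumers have to split them back apart (lossy if a heading itself contains a comma). A minimal sketch reading the sample above, assuming it was saved as export.csv:

import csv

with open('export.csv', newline='', encoding='utf-8') as f:
    for row in csv.DictReader(f):
        # "About Us, Services, Contact" -> ['About Us', 'Services', 'Contact']
        h2_list = [h.strip() for h in row['h2'].split(',')] if row['h2'] else []
        print(row['url'], row['status_code'], h2_list)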

JSON Format

JSON exports preserve full data structure:

{
  "urls": [
    {
      "url": "https://example.com",
      "status_code": 200,
      "title": "Homepage",
      "h2": ["About Us", "Services", "Contact"],
      "og_tags": {
        "og:title": "Homepage",
        "og:image": "https://example.com/img.jpg"
      }
    }
  ]
}

XML Format

XML exports use element-based hierarchy:

<?xml version="1.0" encoding="UTF-8"?>
<crawl_data>
  <urls>
    <url>
      <url>https://example.com</url>
      <status_code>200</status_code>
      <title>Homepage</title>
      <h2>
        <item>About Us</item>
        <item>Services</item>
      </h2>
    </url>
  </urls>
</crawl_data>
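
This element-based layout maps directly onto Python's xml.etree.ElementTree. A minimal sketch that walks the sample above (note the page address lives in a child element also named url), assuming the export was saved as export.xml:

import xml.etree.ElementTree as ET

root = ET.parse('export.xml').getroot()  # <crawl_data>

for page in root.findall('./urls/url'):
    address = page.findtext('url')       # child <url> element
    status = page.findtext('status_code')
    h2s = [item.text for item in page.findall('./h2/item')]
    print(address, status, h2s)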

Common Export Scenarios

Basic SEO Audit Export

{
  "format": "csv",
  "fields": [
    "url", "status_code", "title", "meta_description",
    "h1", "word_count", "internal_links", "external_links"
  ]
}

Full Technical Audit Export

{
  "format": "json",
  "fields": [
    "url", "status_code", "title", "meta_description",
    "h1", "h2", "h3", "word_count", "response_time",
    "analytics", "og_tags", "twitter_tags", "json_ld",
    "internal_links", "external_links", "images",
    "issues_detected", "links_detailed"
  ]
}

Content Inventory Export

{
  "format": "csv",
  "fields": [
    "url", "title", "meta_description",
    "h1", "h2", "word_count"
  ]
}

Link Analysis Export

{
  "format": "csv",
  "fields": [
    "url", "status_code", "internal_links",
    "external_links", "links_detailed"
  ]
}

Best Practices

1. Choose the Right Format

  • CSV: Best for spreadsheet analysis, simple data structures, compatibility
  • JSON: Best for programmatic processing, complex nested data, API integrations
  • XML: Best for enterprise systems, XSLT transformations, legacy tool compatibility

2. Optimize Field Selection

Only export fields you need to reduce file size and processing time:

  • For quick audits: ["url", "status_code", "title"]
  • For content review: Add "meta_description", "h1", "word_count"
  • For technical SEO: Add "og_tags", "analytics", "json_ld"
  • For link analysis: Add "internal_links", "external_links", "links_detailed"

3. Handle Large Exports

For crawls with 100K+ URLs:

  • Use JSON format for better compression
  • Export incrementally using the localData parameter (see the sketch below)
  • Consider filtering data before export
  • Stream large files instead of loading them entirely into memory
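
As an example of incremental export, you can slice a large URL list yourself and send each slice through localData. A sketch, assuming a requests.Session with the session cookie, a urls list already fetched from the crawl, and arbitrary batch size and filenames:

import base64
import requests

BATCH = 10_000  # arbitrary slice size; tune to your memory budget

def export_in_batches(session, urls, fields):
    """Export urls in slices via localData instead of one huge request."""
    for i in range(0, len(urls), BATCH):
        payload = {
            'format': 'json',
            'fields': fields,
            'localData': {'urls': urls[i:i + BATCH], 'links': [], 'issues': []},
        }
        data = session.post('http://localhost:5000/api/export_data',
                            json=payload).json()
        with open(f'export_part_{i // BATCH}.json', 'wb') as f:
            f.write(base64.b64decode(data['content']))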

4. Configure Issue Exclusions

Before exporting issues, configure exclusion patterns via the Settings API to remove known false positives:

// Configure exclusions
await fetch('/api/save_settings', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    issueExclusionPatterns: '*/admin/*\n*/test-*\n*/preview/*'
  })
});

// Then export filtered issues
const crawlData = await fetch('/api/crawl_status').then(r => r.json());
const filtered = await fetch('/api/filter_issues', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ issues: crawlData.issues })
}).then(r => r.json());

// Export only the filtered issues
await fetch('/api/export_data', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    format: 'csv',
    fields: ['url', 'type', 'issue', 'details'],
    localData: { issues: filtered.issues }
  })
});

Next Steps