The Export API allows you to download crawl data in CSV, JSON, or XML formats with full control over which fields to include. The Filtering API helps you exclude false-positive issues based on URL patterns.
Endpoints
POST /api/export_data
Export crawl data in your choice of format with customizable field selection. Supports CSV, JSON, and XML, with automatic handling of complex nested data.
Authentication
Requires a valid session cookie.
Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
| format | string | Yes | Export format: "csv", "json", or "xml" |
| fields | array | Yes | Array of field names to include in export |
| localData | object | No | Data to export (urls, links, and issues arrays). If omitted, the current crawl data is used. |
Available Fields
- URL Data: url, status_code, title, meta_description, h1, h2, h3, word_count, response_time, internal_links, external_links
- Structured Data: analytics, og_tags, twitter_tags, json_ld, images
- Special: issues_detected (list of issues for this URL), links_detailed (all links from this page)
Example Request (CSV)
curl -X POST http://localhost:5000/api/export_data \
-H "Content-Type: application/json" \
-b cookies.txt \
-d '{
"format": "csv",
"fields": ["url", "status_code", "title", "meta_description", "word_count"]
}' \
-o export.csv
Example Request (JSON with all fields)
curl -X POST http://localhost:5000/api/export_data \
-H "Content-Type: application/json" \
-b cookies.txt \
-d '{
"format": "json",
"fields": [
"url", "status_code", "title", "meta_description",
"h1", "h2", "h3", "word_count", "response_time",
"analytics", "og_tags", "twitter_tags", "json_ld",
"internal_links", "external_links", "images"
]
}' \
-o export.json
Example Request (Export with local data)
curl -X POST http://localhost:5000/api/export_data \
-H "Content-Type: application/json" \
-b cookies.txt \
-d '{
"format": "json",
"fields": ["url", "status_code", "title"],
"localData": {
"urls": [
{
"url": "https://example.com",
"status_code": 200,
"title": "Example"
}
],
"links": [],
"issues": []
}
}' \
-o export.json
Success Response (200 OK) - Single File
{
"success": true,
"content": "base64-encoded-file-content",
"mimetype": "text/csv",
"filename": "librecrawl_export_20250118_143022.csv"
}
Success Response (200 OK) - Multiple Files
When exporting with "issues_detected" or "links_detailed" fields, multiple files may be returned:
{
"success": true,
"multiple_files": true,
"files": [
{
"content": "base64-encoded-urls-file",
"mimetype": "text/csv",
"filename": "urls_20250118_143022.csv"
},
{
"content": "base64-encoded-links-file",
"mimetype": "text/csv",
"filename": "links_20250118_143022.csv"
},
{
"content": "base64-encoded-issues-file",
"mimetype": "text/csv",
"filename": "issues_20250118_143022.csv"
}
]
}
Error Responses
# Missing required fields (400 Bad Request)
{
"success": false,
"error": "Missing required fields: format, fields"
}
# Invalid format (400 Bad Request)
{
"success": false,
"error": "Invalid format. Must be csv, json, or xml"
}
# No data to export (400 Bad Request)
{
"success": false,
"error": "No crawl data available to export"
}
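All of the errors above are returned with HTTP 400 and carry a human-readable error field alongside "success": false. A minimal Python sketch of client-side handling (the export_or_raise helper is illustrative, not part of the API, and cookies is assumed to hold your session cookie):
# Check the success flag before decoding anything
import requests

def export_or_raise(fields, fmt='csv'):
    response = requests.post(
        'http://localhost:5000/api/export_data',
        json={'format': fmt, 'fields': fields},
        cookies=cookies  # session cookie obtained at login
    )
    data = response.json()
    if not data.get('success'):
        # 400 responses carry a human-readable message in "error"
        raise RuntimeError('Export failed: ' + data.get('error', 'unknown error'))
    return data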
Decoding Base64 Content
The response includes base64-encoded file content. Decode it before saving:
// JavaScript/Node.js example
const response = await fetch('/api/export_data', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ format: 'csv', fields: ['url', 'title'] })
});
const data = await response.json();

if (data.multiple_files) {
  // Multiple files (returned when issues_detected or links_detailed is requested)
  data.files.forEach(file => {
    const content = atob(file.content); // Decode base64
    const blob = new Blob([content], { type: file.mimetype });
    saveFile(blob, file.filename); // saveFile: your own download helper (e.g. object URL + temporary <a> click)
  });
} else {
  // Single file
  const content = atob(data.content); // Decode base64
  const blob = new Blob([content], { type: data.mimetype });
  saveFile(blob, data.filename);
}
# Python example
import requests
import base64

response = requests.post(
    'http://localhost:5000/api/export_data',
    json={'format': 'csv', 'fields': ['url', 'title']},
    cookies=cookies  # session cookie obtained at login
)
data = response.json()

if data.get('multiple_files'):
    for file in data['files']:
        content = base64.b64decode(file['content'])
        with open(file['filename'], 'wb') as f:
            f.write(content)
else:
    content = base64.b64decode(data['content'])
    with open(data['filename'], 'wb') as f:
        f.write(content)
POST /api/filter_issues
Filter a list of issues using the current user's exclusion patterns. Useful for removing false-positive issues from reports.
Authentication
Requires a valid session cookie.
Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
| issues | array | Yes | Array of issue objects to filter |
Example Request
curl -X POST http://localhost:5000/api/filter_issues \
-H "Content-Type: application/json" \
-b cookies.txt \
-d '{
"issues": [
{
"url": "https://example.com/admin/page",
"type": "404",
"category": "broken_link",
"issue": "Page not found",
"details": "Returns 404"
},
{
"url": "https://example.com/products/123",
"type": "thin_content",
"category": "seo",
"issue": "Low word count",
"details": "Only 50 words"
}
]
}'
Success Response (200 OK)
Returns filtered issues array (issues matching exclusion patterns are removed):
{
"success": true,
"issues": [
{
"url": "https://example.com/products/123",
"type": "thin_content",
"category": "seo",
"issue": "Low word count",
"details": "Only 50 words"
}
]
}
Note: In this example, if the user's exclusion patterns included "*/admin/*", the 404 issue would be filtered out.
Exclusion Pattern Syntax
Exclusion patterns use wildcard matching (fnmatch):
- * - Matches any sequence of characters
- ? - Matches any single character
- [abc] - Matches any character in brackets
Example Patterns
- */admin/* - Exclude all URLs containing /admin/
- */test-* - Exclude URLs with path segments starting with "test-"
- */products/*/reviews - Exclude product review pages
- */search?* - Exclude search result pages
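Because the patterns follow fnmatch semantics, you can preview which URLs a pattern would exclude before saving it. A minimal Python sketch (the URLs are illustrative):
# Preview fnmatch-style exclusion patterns locally
from fnmatch import fnmatch

pattern = '*/admin/*'
urls = [
    'https://example.com/admin/page',
    'https://example.com/products/123',
]
for url in urls:
    print(url, '-> excluded' if fnmatch(url, pattern) else '-> kept')
# https://example.com/admin/page -> excluded
# https://example.com/products/123 -> kept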
Export Format Details
CSV Format
CSV exports flatten complex nested data:
- Arrays: Converted to comma-separated strings (e.g., "heading1, heading2, heading3")
- Objects: Converted to JSON strings
- Images/Links: Count only (e.g., "5 images")
url,status_code,title,meta_description,h2
https://example.com,200,"Homepage","Welcome to Example","About Us, Services, Contact"
https://example.com/about,200,"About Us","Learn more about us","Our Story, Our Team"
JSON Format
JSON exports preserve the full data structure:
{
"urls": [
{
"url": "https://example.com",
"status_code": 200,
"title": "Homepage",
"h2": ["About Us", "Services", "Contact"],
"og_tags": {
"og:title": "Homepage",
"og:image": "https://example.com/img.jpg"
}
}
]
}
XML Format
XML exports use an element-based hierarchy:
<?xml version="1.0" encoding="UTF-8"?>
<crawl_data>
<urls>
<url>
<url>https://example.com</url>
<status_code>200</status_code>
<title>Homepage</title>
<h2>
<item>About Us</item>
<item>Services</item>
</h2>
</url>
</urls>
</crawl_data>
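The element names mirror the field names you requested, so the export can be read back with any XML parser. A minimal Python sketch parsing the structure above (assumes the export was saved to export.xml):
# Parse the XML export structure shown above
import xml.etree.ElementTree as ET

tree = ET.parse('export.xml')
for record in tree.getroot().find('urls').findall('url'):
    address = record.findtext('url')
    status = record.findtext('status_code')
    h2 = record.find('h2')
    headings = [item.text for item in h2.findall('item')] if h2 is not None else []
    print(address, status, headings)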
Common Export Scenarios
Basic SEO Audit Export
{
"format": "csv",
"fields": [
"url", "status_code", "title", "meta_description",
"h1", "word_count", "internal_links", "external_links"
]
}
Full Technical Audit Export
{
"format": "json",
"fields": [
"url", "status_code", "title", "meta_description",
"h1", "h2", "h3", "word_count", "response_time",
"analytics", "og_tags", "twitter_tags", "json_ld",
"internal_links", "external_links", "images",
"issues_detected", "links_detailed"
]
}
Content Inventory Export
{
"format": "csv",
"fields": [
"url", "title", "meta_description",
"h1", "h2", "word_count"
]
}
Link Analysis Export
{
"format": "csv",
"fields": [
"url", "status_code", "internal_links",
"external_links", "links_detailed"
]
}
Best Practices
1. Choose the Right Format
- CSV: Best for spreadsheet analysis, simple data structures, compatibility
- JSON: Best for programmatic processing, complex nested data, API integrations
- XML: Best for enterprise systems, XSLT transformations, legacy tool compatibility
2. Optimize Field Selection
Export only the fields you need; this reduces file size and processing time:
- For quick audits: ["url", "status_code", "title"]
- For content review: Add "meta_description", "h1", "word_count"
- For technical SEO: Add "og_tags", "analytics", "json_ld"
- For link analysis: Add "internal_links", "external_links", "links_detailed"
3. Handle Large Exports
For crawls with 100K+ URLs:
- Use JSON format for better compression
- Export incrementally using the localData parameter (see the sketch after this list)
- Consider filtering data before export
- Stream large files instead of loading entirely in memory
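One way to export incrementally is to slice the URL list into chunks and send each chunk through localData. A minimal Python sketch (it assumes /api/crawl_status returns a urls array, that cookies holds your session cookie, and that the chunk size is arbitrary):
# Incremental export sketch: one file per chunk of URLs
import base64
import requests

CHUNK_SIZE = 10000  # arbitrary; tune to your environment
status = requests.get('http://localhost:5000/api/crawl_status', cookies=cookies).json()
urls = status.get('urls', [])

for i in range(0, len(urls), CHUNK_SIZE):
    chunk = urls[i:i + CHUNK_SIZE]
    resp = requests.post(
        'http://localhost:5000/api/export_data',
        json={
            'format': 'json',
            'fields': ['url', 'status_code', 'title'],
            'localData': {'urls': chunk, 'links': [], 'issues': []},
        },
        cookies=cookies,
    )
    data = resp.json()
    with open(f'export_part_{i // CHUNK_SIZE}.json', 'wb') as f:
        f.write(base64.b64decode(data['content']))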
4. Configure Issue Exclusions
Before exporting issues, configure exclusion patterns via the Settings API to remove known false positives:
// Configure exclusions
await fetch('/api/save_settings', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    issueExclusionPatterns: '*/admin/*\n*/test-*\n*/preview/*'
  })
});

// Then filter the issues from the current crawl
const crawlData = await fetch('/api/crawl_status').then(r => r.json());
const filtered = await fetch('/api/filter_issues', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ issues: crawlData.issues })
}).then(r => r.json());

// Export only the filtered issues
await fetch('/api/export_data', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    format: 'csv',
    fields: ['url', 'type', 'issue', 'details'],
    localData: { issues: filtered.issues }
  })
});
Next Steps
- Getting Started Guide - Build a complete crawl and export workflow
- Status API - Get data to export
- Settings API - Configure export defaults