Crawler API

Overview

The Crawler API performs direct crawling of specified URLs with JavaScript rendering support. Ideal for extracting content from single pages or multiple URLs in a single request.

Endpoint

GET https://api.crawleo.dev/crawl

Parameters

Required Parameters

urls
string
required
URL(s) to crawl. Can be a single URL or a comma-separated list.
Example: https://example.com or https://example.com,https://example.org
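Because urls is a single comma-separated query parameter (not a repeated key), it is easiest to build it programmatically. A minimal sketch using Python's standard library, with the parameter names taken from this page:

```python
from urllib.parse import urlencode

def build_crawl_query(urls, output_format="markdown"):
    """Build the query string for GET /crawl from a list of URLs.

    The API expects one comma-separated "urls" parameter rather than
    repeated keys, so the list is joined before URL-encoding.
    """
    return urlencode({"urls": ",".join(urls), "output_format": output_format})
```

Note that urlencode percent-encodes the separating commas (as %2C), which is valid in a query string.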

Output Format Parameters

output_format
string
default:"raw_html"
Output format for crawled content. Options:
  • raw_html - Original HTML source
  • enhanced_html - Clean HTML with ads, scripts, and tracking removed
  • markdown - Structured Markdown (recommended for RAG/LLM)

Localization Parameters

country
string
default:"us"
2-letter country code for geo-targeted crawling (e.g., us, gb, de).

Advanced Options

screenshot
boolean
default:"false"
Capture a screenshot of the page.
use_proxies
boolean
default:"false"
Use datacenter proxies for the request. Costs 2 credits per page (vs 1 credit without proxy).
use_premium_proxies
boolean
default:"false"
Use residential proxies for the request (higher success rate for protected sites). Costs 10 credits per page.
use_proxies and use_premium_proxies are mutually exclusive. Only one can be set to true at a time.
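The per-page credit cost follows directly from these flags. A small sketch (assuming the per-page costs listed above) that enforces the mutual-exclusivity rule and returns the expected cost:

```python
def credits_per_page(use_proxies=False, use_premium_proxies=False):
    """Return the per-page credit cost implied by the proxy flags.

    Mirrors the documented pricing: 1 credit without a proxy,
    2 with datacenter proxies, 10 with residential (premium) proxies.
    """
    if use_proxies and use_premium_proxies:
        # The API rejects requests that set both flags
        raise ValueError("use_proxies and use_premium_proxies are mutually exclusive")
    if use_premium_proxies:
        return 10
    if use_proxies:
        return 2
    return 1
```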

Example Requests

Basic Crawl with Markdown Output

curl -X GET "https://api.crawleo.dev/crawl?urls=https://example.com&output_format=markdown" \
  -H "Authorization: Bearer YOUR_API_KEY"

Crawl Multiple URLs

curl -X GET "https://api.crawleo.dev/crawl?urls=https://example.com,https://example.org&output_format=markdown" \
  -H "Authorization: Bearer YOUR_API_KEY"

Get Raw HTML Only

curl -X GET "https://api.crawleo.dev/crawl?urls=https://example.com&output_format=raw_html" \
  -H "Authorization: Bearer YOUR_API_KEY"

Crawl with Proxy Support

curl -X GET "https://api.crawleo.dev/crawl?urls=https://example.com&output_format=markdown&use_premium_proxies=true&country=gb" \
  -H "Authorization: Bearer YOUR_API_KEY"

Response

A successful response returns crawled content for each URL:
{
  "success": true,
  "results": [
    {
      "url": "https://example.com",
      "status": 200,
      "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples...",
      "raw_html": "<!DOCTYPE html><html>...</html>"
    }
  ],
  "credits_used": 1,
  "credits_remaining": 9999
}
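When crawling multiple URLs, individual pages can fail while the request as a whole succeeds, so it helps to check each result. A sketch that partitions results by per-URL outcome; it assumes each result carries the status field shown above and that failed results may include an error string (an assumption, not confirmed by this page):

```python
def partition_results(payload):
    """Split crawl results into successes and failures.

    Assumes each result object has a "status" (HTTP status code) and
    that failures may carry an "error" string.
    """
    ok, failed = [], []
    for result in payload.get("results", []):
        if result.get("error") or result.get("status", 0) >= 400:
            failed.append(result)
        else:
            ok.append(result)
    return ok, failed
```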
results
array
Array of crawl result objects.
credits_used
integer
Number of credits consumed by this request. Varies based on proxy settings:
  • No proxy: 1 credit per URL
  • Standard proxy: 2 credits per URL
  • Premium proxy: 10 credits per URL
credits_remaining
integer
Your remaining credit balance for the current billing period.
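Before a large batch, credits_remaining can serve as a pre-flight check. A minimal sketch, assuming the per-URL costs from the table above:

```python
def can_afford(num_urls, credits_remaining, credits_per_url=1):
    """Pre-flight check: will a batch of num_urls fit within the
    remaining credit balance?

    credits_per_url is 1, 2, or 10 depending on proxy settings,
    per the pricing above.
    """
    return num_urls * credits_per_url <= credits_remaining
```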

Use Cases

Crawl documentation pages or knowledge bases and convert to Markdown for vector database ingestion.
# Example: Crawl docs for RAG
import requests

response = requests.get(
    "https://api.crawleo.dev/crawl",
    params={
        "urls": "https://docs.example.com/guide,https://docs.example.com/api",
        "output_format": "markdown"
    },
    headers={"Authorization": "Bearer YOUR_API_KEY"}
)

for result in response.json()["results"]:
    # Add each page's Markdown to a vector database
    # (vector_db is a placeholder for your vector store client)
    vector_db.add(result["markdown"], metadata={"url": result["url"]})
  • Extract clean content from web pages for analysis or processing.
  • Scrape multiple pages in a single API call with JavaScript rendering support.
  • Provide AI agents with the ability to read and understand web pages.

Tips

For LLM applications, always use output_format=markdown to get clean, structured content that minimizes token usage.
Ensure you have permission to crawl the target URLs. Respect robots.txt and website terms of service.
Last modified on January 27, 2026