Crawleo Documentation

Overview

Crawleo provides multiple output formats optimized for different use cases. Choose the format that best fits your application’s needs.

Available Formats

Raw HTML

Original page source with all elements intact.

AI-Enhanced HTML

Clean HTML with ads, scripts, and tracking removed.

Plain Text

Extracted text content without HTML markup.

Markdown

Structured Markdown optimized for LLMs.

Format Details

Raw HTML

Returns the complete, unmodified HTML source of the page.

Parameter: output_format=raw_html (Crawler API) or get_raw_html=true (Search API)

Best for:

Full page preservation
Custom parsing and extraction
When you need exact page structure

Example output:

<!DOCTYPE html>
<html>
<head>
  <title>Example Page</title>
  <script src="analytics.js"></script>
</head>
<body>
  <nav>...</nav>
  <main>
    <h1>Welcome</h1>
    <p>Content here...</p>
  </main>
  <footer>...</footer>
</body>
</html>
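As an illustration of the "custom parsing and extraction" use case, the following is a minimal sketch that pulls heading text out of a raw HTML response using only Python's standard library. The names HeadingExtractor and extract_headings are hypothetical helpers, not part of Crawleo, and the sample string is a shortened copy of the example output above.

```python
from html.parser import HTMLParser


class HeadingExtractor(HTMLParser):
    """Collects the text of <h1>-<h3> headings (hypothetical helper)."""

    HEADING_TAGS = {"h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.headings = []     # finished heading strings
        self._current = None   # fragments of the heading being read, or None

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADING_TAGS:
            self._current = []

    def handle_data(self, data):
        # Only keep text while we are inside a heading element
        if self._current is not None:
            self._current.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADING_TAGS and self._current is not None:
            self.headings.append("".join(self._current).strip())
            self._current = None


def extract_headings(raw_html):
    """Return the heading texts found in a raw HTML string."""
    parser = HeadingExtractor()
    parser.feed(raw_html)
    parser.close()
    return parser.headings


sample = """<!DOCTYPE html>
<html>
<head>
  <title>Example Page</title>
  <script src="analytics.js"></script>
</head>
<body>
  <main>
    <h1>Welcome</h1>
    <p>Content here...</p>
  </main>
</body>
</html>"""

print(extract_headings(sample))  # prints ['Welcome']
```

Because raw HTML keeps navigation, scripts, and footers, a parser like this has to decide for itself which elements matter; that is the trade-off for having the exact page structure.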

AI-Enhanced HTML

Returns cleaned HTML with ads, tracking scripts, and unnecessary elements removed.

Parameter: get_ai_enhanced_html=true (Search API)

Best for:

Cleaner content processing
Reduced noise in extraction
When you want HTML structure without clutter

Example output:

<main>
  <h1>Welcome</h1>
  <p>Content here...</p>
</main>

Plain Text

Returns extracted text content without any HTML markup.

Parameter: get_page_text=true (Search API)

Best for:

Simple text processing
Search and analysis tasks
When HTML structure isn’t needed

Example output:

Welcome

Content here...

Markdown

Returns content converted to structured Markdown format.

Parameter: output_format=markdown (Crawler API) or get_page_text_markdown=true (Search API)

Best for:

RAG pipelines
LLM consumption
Vector database ingestion
Minimal token usage

Example output:

# Welcome

Content here...

## Section Title

More content with [links](https://example.com) and **formatting**.
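One reason Markdown suits RAG pipelines is that headings give natural chunk boundaries. The sketch below splits a Markdown string into (heading, body) pairs that could then be embedded separately. The function split_markdown_sections is a hypothetical helper, not a Crawleo API, and it deliberately ignores edge cases such as "#" lines inside fenced code blocks.

```python
import re

# ATX-style headings: one to six '#' characters followed by whitespace
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*)$")


def split_markdown_sections(markdown):
    """Split Markdown into (heading, body) pairs, one per heading.

    Text before the first heading is returned with heading None.
    """
    sections = []
    heading, body = None, []

    def flush():
        # Emit the section collected so far, skipping an empty preamble
        if heading is not None or any(ln.strip() for ln in body):
            sections.append((heading, "\n".join(body).strip()))

    for line in markdown.splitlines():
        match = HEADING_RE.match(line)
        if match:
            flush()
            heading, body = match.group(2).strip(), []
        else:
            body.append(line)
    flush()
    return sections


sample = """# Welcome

Content here...

## Section Title

More content with [links](https://example.com) and **formatting**.
"""

for section_heading, section_body in split_markdown_sections(sample):
    print(f"{section_heading!r}: {section_body!r}")
```

Keeping the heading alongside each chunk lets a retrieval step show where a passage came from; plain text output would not carry that information.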

Format Comparison

Format	Token Usage	Structure	Noise Level	Best For
Raw HTML	High	Full	High	Custom parsing
AI-Enhanced HTML	Medium	Partial	Low	Clean extraction
Plain Text	Low	None	Low	Simple processing
Markdown	Low	Preserved	Minimal	LLM/RAG

Recommendations by Use Case

RAG Pipeline

Use Markdown format for optimal results:

Preserves document structure (headers, lists, links)
Minimal token usage
Clean content ready for embedding

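import requests

api_key = "YOUR_API_KEY"     # placeholder, as in the curl example
url = "https://example.com"  # placeholder: the page to crawl
response = requests.get(
    "https://api.crawleo.dev/crawl",
    params={"urls": url, "output_format": "markdown"},
    headers={"Authorization": f"Bearer {api_key}"},
)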

LLM Context

Use Markdown or Plain Text:

Markdown if structure matters
Plain text for maximum token efficiency

# For structured content
params = {"query": query, "get_page_text_markdown": True}

# For simple text
params = {"query": query, "get_page_text": True}

Web Scraping

Use Raw HTML or AI-Enhanced HTML:

Raw HTML for full control
AI-Enhanced HTML for a cleaner starting point

Content Analysis

Use Plain Text or Markdown:

Easy to process with NLP tools
No HTML parsing required
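To illustrate why no HTML parsing is required, here is a minimal sketch of a word-frequency count run directly on plain text output. The helper top_words is hypothetical, and the regular expression is a deliberately crude tokenizer that only handles ASCII letters and apostrophes; a real analysis would likely use a dedicated NLP library.

```python
import re
from collections import Counter


def top_words(text, n=3):
    """Return the n most frequent lowercase words in a plain text string."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)


# Text as it might be returned with get_page_text=true (illustrative only)
page_text = "the cat and the hat and the bat"

print(top_words(page_text, 2))  # prints [('the', 3), ('and', 2)]
```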

Multiple Formats

You can request multiple formats in a single request:

curl -X GET "https://api.crawleo.dev/search?query=example&get_raw_html=true&get_page_text_markdown=true" \
  -H "Authorization: Bearer YOUR_API_KEY"

The response will include both formats for each result.
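The same multi-format request can be assembled in Python. The sketch below only builds the URL and makes no network call. The flag names come from this page; the short format names used as dictionary keys and the function build_search_url are hypothetical conveniences.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://api.crawleo.dev/search"

# Search API flags documented on this page, keyed by hypothetical short names
FORMAT_FLAGS = {
    "raw_html": "get_raw_html",
    "ai_enhanced_html": "get_ai_enhanced_html",
    "text": "get_page_text",
    "markdown": "get_page_text_markdown",
}


def build_search_url(query, formats):
    """Build a Search API URL that requests every format in `formats`."""
    params = {"query": query}
    for name in formats:
        # Use the lowercase string "true" to match the curl example
        params[FORMAT_FLAGS[name]] = "true"
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


print(build_search_url("example", ["raw_html", "markdown"]))
# prints https://api.crawleo.dev/search?query=example&get_raw_html=true&get_page_text_markdown=true
```

The flags are set as the string "true" because urlencode would serialize a Python True as "True". Whether the API treats the value case-insensitively is not stated here, so the sketch mirrors the curl example exactly.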

Pro tip: Start with Markdown for most AI applications. It provides the best balance of structure and token efficiency.
