Overview
Crawleo provides multiple output formats optimized for different use cases. Choose the format that best fits your application’s needs.
Available Formats
Raw HTML
Original page source with all elements intact.
AI-Enhanced HTML
Clean HTML with ads, scripts, and tracking removed.
Plain Text
Extracted text content without HTML markup.
Markdown
Structured Markdown optimized for LLMs.
Format Details
Raw HTML
Returns the complete, unmodified HTML source of the page.
Parameter: raw_html=true (Crawler API) or get_raw_html=true (Search API)
Best for:
- Full page preservation
- Custom parsing and extraction
- When you need exact page structure
AI-Enhanced HTML
Returns cleaned HTML with ads, tracking scripts, and unnecessary elements removed.
Parameter: get_ai_enhanced_html=true (Search API)
Best for:
- Cleaner content processing
- Reduced noise in extraction
- When you want HTML structure without clutter
Plain Text
Returns extracted text content without any HTML markup.
Parameter: get_page_text=true (Search API)
Best for:
- Simple text processing
- Search and analysis tasks
- When HTML structure isn’t needed
Markdown
Returns content converted to structured Markdown format.
Parameter: markdown=true (Crawler API) or get_page_text_markdown=true (Search API)
Best for:
- RAG pipelines
- LLM consumption
- Vector database ingestion
- Low token usage
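The parameters listed for each format above can be collected into a small lookup so that application code asks for a format by name rather than by raw parameter. The sketch below is illustrative only: the parameter names come from this page, but the short format labels (`raw_html`, `plain_text`, and so on), the `format_query` helper, and the idea of sending the flags as a query string are assumptions, as is requesting more than one format in a single call.

```python
from urllib.parse import urlencode

# Parameter names are taken from the format descriptions on this page.
# The dictionary keys are hypothetical labels chosen for this example.
# AI-Enhanced HTML and Plain Text are only listed for the Search API.
FORMAT_PARAMS = {
    "crawler": {
        "raw_html": "raw_html",
        "markdown": "markdown",
    },
    "search": {
        "raw_html": "get_raw_html",
        "ai_enhanced_html": "get_ai_enhanced_html",
        "plain_text": "get_page_text",
        "markdown": "get_page_text_markdown",
    },
}


def format_query(api: str, formats: list[str]) -> str:
    """Build a query string that switches on the requested output formats."""
    supported = FORMAT_PARAMS[api]
    params = {}
    for name in formats:
        if name not in supported:
            raise ValueError(f"{name!r} is not documented for the {api} API")
        params[supported[name]] = "true"
    return urlencode(params)


print(format_query("search", ["markdown", "plain_text"]))
# get_page_text_markdown=true&get_page_text=true
```

Raising an error for an undocumented combination (for example, Plain Text on the Crawler API) surfaces the mismatch early instead of silently sending a flag the API may ignore.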
Format Comparison
| Format | Token Usage | Structure | Noise Level | Best For |
|---|---|---|---|---|
| Raw HTML | High | Full | High | Custom parsing |
| AI-Enhanced HTML | Medium | Partial | Low | Clean extraction |
| Plain Text | Low | None | Low | Simple processing |
| Markdown | Low | Preserved | Minimal | LLM/RAG |
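The comparison table can also be treated as data when the format needs to be chosen programmatically. The following is a minimal sketch that copies the table's ratings into a list and filters it; the `candidates` function and its two criteria are hypothetical and not part of Crawleo.

```python
# The comparison table above, encoded as data.
FORMATS = [
    {"format": "Raw HTML", "tokens": "High", "structure": "Full", "noise": "High"},
    {"format": "AI-Enhanced HTML", "tokens": "Medium", "structure": "Partial", "noise": "Low"},
    {"format": "Plain Text", "tokens": "Low", "structure": "None", "noise": "Low"},
    {"format": "Markdown", "tokens": "Low", "structure": "Preserved", "noise": "Minimal"},
]

# Order the token ratings so they can be compared.
TOKEN_RANK = {"Low": 0, "Medium": 1, "High": 2}


def candidates(max_tokens: str = "High", needs_structure: bool = False) -> list[str]:
    """Return the formats whose ratings satisfy both constraints."""
    result = []
    for row in FORMATS:
        if TOKEN_RANK[row["tokens"]] > TOKEN_RANK[max_tokens]:
            continue  # uses more tokens than allowed
        if needs_structure and row["structure"] == "None":
            continue  # caller needs headings, lists, or tags kept
        result.append(row["format"])
    return result


print(candidates(max_tokens="Low", needs_structure=True))
# ['Markdown']
```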
Recommendations by Use Case
RAG Pipeline
Use Markdown format for optimal results:
- Preserves document structure (headers, lists, links)
- Low token usage
- Clean content ready for embedding
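Because Markdown keeps headings, one common way to prepare it for embedding is to split it into one chunk per heading. The sketch below assumes the Markdown has already been retrieved as a string; `split_by_heading` is a hypothetical helper, and it deliberately ignores edge cases such as `#` lines inside fenced code blocks.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_heading(markdown: str) -> list[dict]:
    """Split Markdown into sections, one per ATX heading (#, ##, ...)."""
    sections = []
    heading, body = "", []

    def flush():
        # Keep a section only if it has a heading or some non-blank text.
        text = "\n".join(body).strip()
        if heading or text:
            sections.append({"heading": heading, "text": text})

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            heading, body = match.group(2).strip(), []
        else:
            body.append(line)
    flush()
    return sections


sample = "# Title\n\nIntro text.\n\n## Usage\n\n- step one\n- step two\n"
for section in split_by_heading(sample):
    print(section)
```

Each resulting chunk carries its heading, which can be stored as metadata alongside the embedding.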
LLM Context
Use Markdown or Plain Text:
- Markdown if structure matters
- Plain text for maximum token efficiency
Web Scraping
Use Raw HTML or AI-Enhanced HTML:
- Raw HTML for full control
- AI-Enhanced HTML for a cleaner starting point
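As an illustration of custom parsing on either HTML format, the sketch below collects link targets using only Python's standard library. It assumes the HTML has already been returned as a string; `LinkCollector` and `extract_links` are hypothetical names for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html: str) -> list[str]:
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


page = '<p><a href="/docs">Docs</a> <a>no target</a> <a href="https://example.com">Site</a></p>'
print(extract_links(page))
# ['/docs', 'https://example.com']
```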
Content Analysis
Use Plain Text or Markdown:
- Easy to process with NLP tools
- No HTML parsing required
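With no markup to strip, plain text can go straight into ordinary text-processing code. A minimal sketch, assuming the text has already been retrieved as a string; `top_terms` is a hypothetical helper and the tokenization is intentionally naive.

```python
import re
from collections import Counter


def top_terms(text: str, n: int = 3) -> list[tuple[str, int]]:
    """Return the n most frequent lowercase words in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words).most_common(n)


print(top_terms("Cats chase mice. Mice fear cats; cats nap.", n=2))
# [('cats', 3), ('mice', 2)]
```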
