Content extraction API

Any URL in. Clean Markdown out.

Distil fetches a page, renders the JavaScript, strips nav, ads and chrome, and hands back the main content as tidy Markdown your model can actually read. One POST — structured, deterministic, ready for retrieval.

No credit card · 1,000 free pages · p50 1.4s per fetch

request.sh
$ curl https://api.distil.dev/v1/extract \
  -H "Authorization: Bearer $DISTIL_KEY" \
  -H "Content-Type: application/json" \
  -X POST -d '{
    "url": "https://example.com/blog/rag-tips",
    "format": "markdown",
    "main_only": true
  }'
response.json 200 OK
{
  "title": "Seven RAG tips",
  "word_count": 2184,
  "markdown": "# Seven RAG tips..."
}
Install $ npm i @distil/sdk
Powering retrieval at
  • Meridian AI
  • Halcyon Labs
  • Northwind
  • Corvus
  • Lumen Stack

Built for pipelines

The boilerplate problem, solved at the API.

You don’t want raw HTML in your context window. Distil returns the part that matters — and tells you exactly what it kept.

Typed SDK

Five lines to a clean document.

Point the client at a URL and stream structured Markdown straight into your splitter. Retries, render-wait and rate limits are handled for you.

ingest.ts
import { Distil } from "@distil/sdk";

const distil = new Distil(process.env.DISTIL_KEY);

const doc = await distil.extract({
  url: "https://example.com/docs",
  format: "markdown",
  render: "auto",     // wait for JS only if needed
});

index.add(doc.markdown);  // straight into your vector store

Main content, not the menu.

A readability model isolates the article body and drops nav, cookie banners, footers and related-post rails — so your embeddings learn signal, not chrome.

  • Headings, lists & tables preserved
  • Links and image alt-text kept inline
  • Scripts, styles & tracking removed

Deterministic & observable.

Same URL, same Markdown — every call returns a content hash, token count and the rules that fired, so a re-index is a diff, not a guessing game.

  • content_hash on every response
  • Token count for your budget
  • Webhooks for batch crawls

Quickstart

Live in three lines.

Install the SDK, point extract at a URL, and read clean Markdown back. No headless browser to babysit, no HTML to scrub — the happy path is the whole path.

Full SDK reference

quickstart.ts ready
import { extract } from "@distil/sdk";

const doc = await extract("https://example.com/blog/rag-tips");

console.log(doc.markdown);  // → "# Seven RAG tips\n\n## 1. Chunk on…"

Pricing

Pay for pages, not seats.

Metered per successful extraction. Failed fetches are never billed.

Free

$0

1,000 pages / month

  • Markdown & JSON output
  • Auto JS rendering
  • Community support
Start building
Most teams

Scale

$0.0008

per page · volume tiers

  • Everything in Free
  • Batch crawl & webhooks
  • 99.9% uptime SLA
  • Priority render queue
Get API key

Enterprise

Custom

dedicated & on-prem

  • Private rendering region
  • SSO, audit log, DPA
  • Solutions engineer
Talk to us