# arxiv2md.org

> Clean, LLM-friendly Markdown versions of arXiv papers. Parses arXiv's structured HTML (not PDFs) for reliable sections, math (MathML → LaTeX), and tables.

For programmatic / agent use, call the REST API below. No auth, no API key, no SDK: just a GET request.

## Quickstart

```bash
# Raw markdown
curl "https://arxiv2md.org/api/markdown?url=1706.03762"

# JSON with metadata (title, arxiv_id, source_url, content)
curl "https://arxiv2md.org/api/json?url=1706.03762"
```

`url` accepts a bare arXiv ID (`1706.03762`, `2501.11120v1`) or a full arXiv URL (`https://arxiv.org/abs/1706.03762`).

## Endpoints

- `GET /api/markdown?url=`: returns raw Markdown as `text/plain`.
- `GET /api/json?url=`: returns JSON: `{ "arxiv_id", "title", "source_url", "content" }`.
- `GET /api`: returns the OpenAPI schema (JSON).
- `GET /health`: returns `{ "status": "healthy" }`.

### Query parameters

| Param | Default | Applies to | Description |
|-------|---------|------------|-------------|
| `url` | required | both | arXiv ID or URL |
| `remove_refs` | `true` | both | Drop bibliography/references section |
| `remove_toc` | `true` | both | Drop table of contents |
| `remove_citations` | `true` | both | Strip inline citations (e.g. "(Smith et al., 2023)") |
| `frontmatter` | `false` | `/api/markdown` only | Prepend YAML frontmatter with paper metadata |

## Examples

```bash
# Keep references and citations
curl "https://arxiv2md.org/api/markdown?url=2312.00752&remove_refs=false&remove_citations=false"

# Markdown with YAML frontmatter, piped into an LLM
curl -s "https://arxiv2md.org/api/markdown?url=2501.11120&frontmatter=true" | your-llm
```

## Notes

- **The URL-swap trick returns HTML, not Markdown.** Visiting `https://arxiv2md.org/abs/1706.03762` (i.e. replacing `arxiv.org` with `arxiv2md.org`) loads the human web app with the URL pre-filled. Agents should use `/api/markdown` or `/api/json` instead.
- Works for arXiv papers that have a structured HTML version (most newer papers).
- Rate limit: 30 requests/minute per IP.
- Results are cached server-side for 24 hours, so repeated requests for the same paper are fast.
- Errors return HTTP 400 (invalid URL / processing error) or 500, with an `error` message.

## CLI & Python library

For local use, install the package (PyPI name `arxiv2markdown`, import name `arxiv2md`):

```bash
pip install arxiv2markdown

# CLI: write markdown to stdout
arxiv2md 2501.11120v1 --remove-refs --remove-toc -o -

# Only specific sections
arxiv2md 2501.11120v1 --section-filter-mode include --sections "Abstract,Introduction" -o -
```

```python
from arxiv2md import ingest_paper_sync  # or: ingest_paper (async)

# kwargs: remove_refs, remove_toc, remove_inline_citations,
#         section_filter_mode, sections, include_frontmatter
result = ingest_paper_sync("2501.11120v1")
print(result.content)
```

## Links

- Web app: https://arxiv2md.org
- Source: https://github.com/timf34/arxiv2md
- PyPI: https://pypi.org/project/arxiv2markdown/
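
For agents that prefer Python over `curl`, the REST API described above can be called with the standard library alone. The following is a minimal sketch, not part of the `arxiv2md` package: the helper names (`build_url`, `fetch`, `fetch_many`) and the `MIN_INTERVAL` constant are hypothetical, and spacing calls two seconds apart is only one assumed way to stay under the documented 30 requests/minute limit.

```python
import json
import time
import urllib.parse
import urllib.request

API_BASE = "https://arxiv2md.org/api"
MIN_INTERVAL = 60 / 30  # assumed spacing (seconds) to stay under 30 requests/minute


def build_url(paper, fmt="markdown", **params):
    """Build an API URL. `paper` is an arXiv ID or URL; `fmt` is "markdown" or "json"."""
    if fmt not in ("markdown", "json"):
        raise ValueError("fmt must be 'markdown' or 'json'")
    query = {"url": paper}
    # Booleans are sent as lowercase "true"/"false", matching the curl examples.
    for key, value in params.items():
        query[key] = str(value).lower() if isinstance(value, bool) else str(value)
    return f"{API_BASE}/{fmt}?{urllib.parse.urlencode(query)}"


def fetch(paper, fmt="markdown", **params):
    """Fetch one paper: returns a str for markdown, a dict for json.

    urlopen raises urllib.error.HTTPError on the 400/500 responses.
    """
    with urllib.request.urlopen(build_url(paper, fmt, **params), timeout=60) as resp:
        body = resp.read().decode("utf-8")
    return json.loads(body) if fmt == "json" else body


def fetch_many(papers, **params):
    """Yield (paper, markdown) pairs, pausing between calls to respect the rate limit."""
    for i, paper in enumerate(papers):
        if i:
            time.sleep(MIN_INTERVAL)
        yield paper, fetch(paper, **params)
```

For example, `fetch("1706.03762", fmt="json")["title"]` would read the title from the JSON response, and `fetch("2501.11120", frontmatter=True)` would request Markdown with YAML frontmatter. Because results are cached server-side, repeated calls for the same paper should return quickly, but they still count against the per-IP limit in this sketch.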