# HYFL RAG

Free retrieval API over a hosted Wikipedia-derived medical and science corpus. Built for agents and RAG pipelines that need source snippets before generation.

- Site: https://hyfl.uk/
- Human docs: https://hyfl.uk/docs
- API status: https://hyfl.uk/status
- Created by Bilawal Riaz: https://bilawal.net
- Base URL: https://hyfl.uk/rag/v1
- OpenAPI spec: https://hyfl.uk/rag/v1/openapi.json
- Swagger UI: https://hyfl.uk/rag/v1/docs
- MCP endpoint: https://hyfl.uk/rag/mcp
- Full agent docs: https://hyfl.uk/llms-full.txt

## Get a key

Generate a free API key with one POST. No auth required.

```http
POST https://hyfl.uk/rag/v1/keys/anonymous
Content-Type: application/json

{ "name": "my-agent" }
```

Response:

```json
{
  "prefix": "sk_live_AbCdEfGhIj",
  "secret": "sk_live_AbCdEfGhIjKlMnOpQrStUvWxYz123456",
  "scopes": ["retrieve"],
  "expires_at": "2026-09-20T12:00:00+00:00",
  "budget": {
    "rpm_limit": 60,
    "hourly_limit": 1000,
    "daily_limit": 24000,
    "max_top_k": 25,
    "max_context_chars": 50000,
    "max_request_bytes": 1000000
  }
}
```

Save the `secret` immediately. It is shown only once.

Free key creation is lightly limited by source network: 5 keys per 10 minutes and 20 keys per day.

## Retrieve context

```http
POST https://hyfl.uk/rag/v1/retrieve
Authorization: Bearer sk_live_...
Content-Type: application/json

{
  "query": "What is hypertension?",
  "top_k": 5,
  "max_context_chars": 6000
}
```

Optional fields: `messages`, `conversation`, `corpus`, `include_trace`.
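
The key and retrieve requests above can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the helper names `build_key_request`, `build_retrieve_request`, and `send` are hypothetical, and the client-side limit checks simply mirror the `budget` values shown in the key response.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "https://hyfl.uk/rag/v1"

# Limits copied from the documented free-key budget.
MAX_TOP_K = 25
MAX_CONTEXT_CHARS = 50_000


def build_key_request(name: str) -> urllib.request.Request:
    """Build the unauthenticated POST that mints a free key."""
    body = json.dumps({"name": name}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/keys/anonymous",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_retrieve_request(
    secret: str,
    query: str,
    top_k: int = 5,
    max_context_chars: int = 6000,
) -> urllib.request.Request:
    """Build an authenticated POST to /retrieve, rejecting over-budget values early."""
    if top_k > MAX_TOP_K:
        raise ValueError(f"top_k must be at most {MAX_TOP_K}")
    if max_context_chars > MAX_CONTEXT_CHARS:
        raise ValueError(f"max_context_chars must be at most {MAX_CONTEXT_CHARS}")
    body = json.dumps(
        {"query": query, "top_k": top_k, "max_context_chars": max_context_chars}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/retrieve",
        data=body,
        headers={
            "Authorization": f"Bearer {secret}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send a request and decode the JSON body; surface the API's `detail` on errors."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # Error responses are documented as JSON of the form {"detail": "..."}.
        detail = json.loads(err.read().decode("utf-8")).get("detail", err.reason)
        raise RuntimeError(f"request failed ({err.code}): {detail}") from err
```

A typical flow would call `send(build_key_request("my-agent"))` once, store the returned `secret`, and then pass it to `build_retrieve_request` for each query. Keeping request construction separate from sending makes the payloads easy to inspect without touching the network.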

Response shape:

```json
{
  "query": "What is hypertension?",
  "contextualized_query": "hypertension",
  "results": [
    {
      "id": "medical_core:chunk:12345",
      "title": "Hypertension",
      "heading": "Signs and symptoms",
      "url": "https://en.wikipedia.org/wiki/Hypertension",
      "score": 0.9182,
      "text": "Hypertension is a long-term medical condition...",
      "trace": {
        "corpus": "medical_core",
        "retriever": "vector",
        "chunk_index": 4,
        "start_char": 1200,
        "end_char": 1850
      }
    }
  ],
  "latency_ms": 128,
  "conversation": {"mode": "direct"}
}
```

## Public endpoints

- `GET /rag/v1/health` - JSON liveness, no auth
- `GET /rag/v1/ready` - JSON readiness, no auth
- `GET /rag/v1/corpora` - available hosted corpora, no auth
- `GET /status` - human-readable API status page
- `POST /rag/v1/keys/anonymous` - mint a free key, no auth
- `POST /rag/v1/retrieve` - retrieve source snippets, bearer key required
- `POST /rag/v1/retrieve/stream` - Server-Sent Events retrieval, bearer key required
- `GET /rag/v1/usage` - key usage, bearer key required
- `POST /rag/mcp` - Model Context Protocol endpoint, bearer key required

## Rate limits

- 60 requests/minute
- 1,000 requests/hour
- 24,000 requests/day
- `top_k` up to 25
- `max_context_chars` up to 50,000
- 1 MB request body
- 90-day free key expiry

## Agent notes

- The hosted public corpus is `medical_core`.
- The hosted corpus is Wikipedia-derived. Treat returned text as background context, not clinical guidance.
- Do not use this service to diagnose, treat, or make decisions about real patients.
- The service returns source snippets only. The client LLM writes the final answer from returned snippets.
- Use `title`, `heading`, `url`, and `text` as grounding context.
- Error responses are JSON: `{"detail": "..."}`.
- CORS preflight is supported for browser clients.
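
Since the service returns snippets only and the client LLM writes the answer, the grounding step from the notes above can be sketched as follows. This is one possible approach, not a prescribed format: the function name `format_grounding_context`, the numbered `[n]` citation layout, and the `max_chars` cutoff are all assumptions made for illustration. Only the `results`, `title`, `heading`, `url`, and `text` fields come from the documented response shape.

```python
from typing import Any


def format_grounding_context(response: dict[str, Any], max_chars: int = 6000) -> str:
    """Render retrieval results as numbered snippets with their source URLs.

    Whole snippets are added in ranked order until adding the next one
    would exceed max_chars (a hypothetical client-side cutoff).
    """
    blocks: list[str] = []
    used = 0
    for index, hit in enumerate(response.get("results", []), start=1):
        block = (
            f"[{index}] {hit.get('title', '')} / {hit.get('heading', '')}\n"
            f"Source: {hit.get('url', '')}\n"
            f"{hit.get('text', '')}"
        )
        if used + len(block) > max_chars:
            break
        blocks.append(block)
        used += len(block)
    return "\n\n".join(blocks)


# Example using the fields from the documented response shape.
sample_response = {
    "results": [
        {
            "title": "Hypertension",
            "heading": "Signs and symptoms",
            "url": "https://en.wikipedia.org/wiki/Hypertension",
            "text": "Hypertension is a long-term medical condition...",
        }
    ]
}
context = format_grounding_context(sample_response)
```

The resulting string can be placed in the client LLM's prompt as background context, with the `[n]` markers giving the model something concrete to cite.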