HYFL RAG API Docs

HYFL RAG is a hosted retrieval API for agents and apps that need cited medical and science context. It returns evidence snippets, source titles, URLs, headings, scores, and trace metadata. It does not write the final answer for you.

Base URL: https://hyfl.uk/rag/v1

OpenAPI schema: https://hyfl.uk/rag/v1/openapi.json

Swagger UI: https://hyfl.uk/rag/v1/docs

Start in 2 Minutes

1. Get a free key

Use the button on the home page, or call the key endpoint directly:


curl -X POST https://hyfl.uk/rag/v1/keys/anonymous \
  -H "Content-Type: application/json" \
  -d '{"name": "my-agent"}'

The response includes a secret. Save it immediately: the full key is shown only once.


{
  "prefix": "sk_live_AbCdEfGhIj",
  "secret": "sk_live_AbCdEfGhIjKlMnOpQrStUvWxYz123456",
  "name": "my-agent",
  "scopes": ["retrieve"],
  "expires_at": "2026-09-20T12:00:00+00:00",
  "budget": {
    "rpm_limit": 60,
    "hourly_limit": 1000,
    "daily_limit": 24000,
    "max_top_k": 25,
    "max_context_chars": 50000,
    "max_request_bytes": 1000000
  }
}
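Because keys expire, a client may want to check how long a stored key has left before rotating it. The following is a minimal sketch, assuming you have kept the key creation response as a dict; the helper name days_until_expiry is hypothetical and not part of the API.

```python
from datetime import datetime, timezone
from typing import Optional


def days_until_expiry(key_response: dict, now: Optional[datetime] = None) -> float:
    """Days remaining before the key described by `key_response` expires."""
    # expires_at is an ISO 8601 timestamp with a UTC offset, as in the sample response.
    expires_at = datetime.fromisoformat(key_response["expires_at"])
    current = now if now is not None else datetime.now(timezone.utc)
    return (expires_at - current).total_seconds() / 86400


sample = {"name": "my-agent", "expires_at": "2026-09-20T12:00:00+00:00"}
reference = datetime(2026, 9, 10, 12, 0, tzinfo=timezone.utc)
print(days_until_expiry(sample, now=reference))  # 10.0
```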

2. Retrieve evidence


KEY="sk_live_..."

curl -X POST https://hyfl.uk/rag/v1/retrieve \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is hypertension?",
    "top_k": 3
  }'
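The same call can be made from Python with only the standard library. This is a sketch rather than an official client: build_retrieve_request is a hypothetical helper name, and the block only builds the request so it can be inspected before anything is sent.

```python
import json
import urllib.request

BASE_URL = "https://hyfl.uk/rag/v1"


def build_retrieve_request(key: str, query: str, top_k: int = 3) -> urllib.request.Request:
    """Build (but do not send) a POST /retrieve request."""
    body = json.dumps({"query": query, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/retrieve",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
    )


req = build_retrieve_request("sk_live_...", "What is hypertension?")
# To send it:
#   with urllib.request.urlopen(req, timeout=30) as resp:
#       payload = json.load(resp)
```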

For progress-aware clients, call POST /rag/v1/retrieve/stream with the same JSON body and Accept: text/event-stream. The stream emits status, one result event per returned snippet, results, and done; the results event contains the same retrieval payload as /retrieve.
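A client reading the stream needs to split it into events. The sketch below parses standard Server-Sent Events framing (event: and data: fields, blank line as separator). It assumes each data field carries JSON; the sample payloads, including the "stage" field, are made up for illustration and are not the service's actual output.

```python
import json
from typing import Iterable, Iterator, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, data) pairs from an iterable of SSE text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment / keep-alive line
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


# Hypothetical stream content, shaped after the event names listed above.
sample = [
    "event: status\n", 'data: {"stage": "searching"}\n', "\n",
    "event: result\n", 'data: {"title": "Hypertension"}\n', "\n",
    "event: results\n", 'data: {"results": [{"title": "Hypertension"}]}\n', "\n",
    "event: done\n", "data: {}\n", "\n",
]
events = list(parse_sse(sample))
final = next(json.loads(d) for name, d in events if name == "results")
```

In a real client, the lines would come from the HTTP response body decoded as UTF-8 instead of a list.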

3. Give the results to your model

Use each result's title, url, heading, and text as grounding context. Ask your model to answer from those snippets and cite the returned URLs.

Content Requests

Visitors can request articles, topics, or sources for manual review from the home page. The public endpoint stores the submission in the service SQLite database and returns a request ID; it does not ingest anything automatically.


curl -X POST https://hyfl.uk/rag/v1/content-requests \
  -H "Content-Type: application/json" \
  -d '{
    "title": "NICE guideline for atrial fibrillation",
    "url": "https://www.nice.org.uk/",
    "details": "Please review this as a source for treatment guidance.",
    "email": "user@example.com"
  }'

Admin keys can view and triage submissions:


ADMIN_KEY="sk_live_..."

curl https://hyfl.uk/rag/v1/content-requests \
  -H "Authorization: Bearer $ADMIN_KEY"

curl -X PATCH https://hyfl.uk/rag/v1/content-requests/1 \
  -H "Authorization: Bearer $ADMIN_KEY" \
  -H "Content-Type: application/json" \
  -d '{"status": "reviewed"}'

Set HYFL_CONTENT_REQUEST_WEBHOOK_URL and HYFL_CONTENT_REQUEST_WEBHOOK_SECRET to receive webhook alerts. HYFL signs each webhook: the X-HYFL-Webhook-Signature header carries a v1= signature computed over X-HYFL-Webhook-Timestamp + "." + raw_body. Receivers should reject stale timestamps and invalid signatures. Requests are still stored if alert delivery is not configured or temporarily fails.
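A receiver-side check might look like the sketch below. Important assumptions, none of which are confirmed by this page: the signature is taken to be a hex-encoded HMAC-SHA256 keyed with the webhook secret, the timestamp is taken to be Unix seconds, and the 300-second tolerance is an arbitrary choice. Verify the actual scheme before relying on this.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_webhook(secret: str, timestamp: str, raw_body: bytes, signature_header: str,
                   tolerance_s: int = 300, now: Optional[float] = None) -> bool:
    """Return True only for a fresh timestamp and a matching signature."""
    current = time.time() if now is None else now
    try:
        sent_at = float(timestamp)  # ASSUMPTION: timestamp is Unix seconds
    except ValueError:
        return False
    if abs(current - sent_at) > tolerance_s:
        return False  # stale or far-future timestamp
    signed = timestamp.encode() + b"." + raw_body
    # ASSUMPTION: hex-encoded HMAC-SHA256; the page does not name the algorithm.
    expected = "v1=" + hmac.new(secret.encode(), signed, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), signature_header.encode())
```

Pass the request body exactly as received, before any JSON parsing, since re-serialising it can change the bytes and break the comparison.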

What to Send

Minimum request:


{
  "query": "What are common symptoms of iron deficiency anemia?"
}

Better request for an agent:


{
  "query": "What are first-line treatments?",
  "messages": [
    {
      "role": "user",
      "content": "I am researching iron deficiency anemia."
    },
    {
      "role": "assistant",
      "content": "I can retrieve medical context about anemia."
    }
  ],
  "conversation": {
    "current_topic": "iron deficiency anemia",
    "salient_terms": ["anemia", "iron deficiency", "treatment"]
  },
  "corpus": ["medical_core"],
  "top_k": 8,
  "max_context_chars": 6000
}

What You Get Back


{
  "query": "What is hypertension?",
  "contextualized_query": "hypertension",
  "results": [
    {
      "id": "medical_core:chunk:12345",
      "title": "Hypertension",
      "heading": "Signs and symptoms",
      "url": "https://en.wikipedia.org/wiki/Hypertension",
      "score": 0.9182,
      "text": "Hypertension is a long-term medical condition...",
      "trace": {
        "corpus": "medical_core",
        "retriever": "fts",
        "source_query": "hypertension",
        "chunk_index": 4,
        "start_char": 1200,
        "end_char": 1850
      }
    }
  ],
  "latency_ms": 128,
  "conversation": {
    "mode": "direct",
    "used_recent_user_messages": 0,
    "used_summary": false,
    "used_terms": []
  }
}

Important fields:

Field	Use it for
`results[].text`	The grounding context to pass to your LLM
`results[].title`	Human-readable source name
`results[].url`	Citation link
`results[].heading`	Local section context inside the source
`results[].score`	Relative match confidence from 0 to 1
`results[].trace`	Debugging which corpus/retriever/chunk matched

When max_context_chars is exhausted, the API stops before returning tiny unusable tail snippets. A result trace includes truncated: true when a useful snippet was shortened to fit the remaining budget.

Very low-confidence retrievals are suppressed. If the service cannot find evidence above the relevance threshold, results is an empty array, so downstream LLMs can decline to answer instead of fabricating from weak neighbors.
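Client code can branch on these two signals before calling a model. The following is a small sketch; summarize_evidence is a hypothetical helper, and it tolerates a missing trace object in case include_trace is false.

```python
def summarize_evidence(response: dict) -> dict:
    """Report whether any evidence came back and which snippets were shortened."""
    results = response.get("results") or []
    truncated_ids = [
        r["id"] for r in results
        if (r.get("trace") or {}).get("truncated") is True
    ]
    return {
        "has_evidence": bool(results),
        "count": len(results),
        "truncated_ids": truncated_ids,
    }


response = {
    "results": [
        {"id": "medical_core:chunk:12345", "trace": {"corpus": "medical_core"}},
        {"id": "medical_core:chunk:12346", "trace": {"truncated": True}},
    ]
}
print(summarize_evidence(response))
# {'has_evidence': True, 'count': 2, 'truncated_ids': ['medical_core:chunk:12346']}
```

When has_evidence is false, the simplest behaviour is to skip the model call and tell the user that no supporting evidence was found.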

Agent Prompt Pattern

When you call your model, keep the instruction simple:


Answer the user using only the evidence snippets below.
If the evidence is insufficient, say what is missing.
Cite sources using the provided URLs.

Then pass the snippets in a compact format:


[1] Hypertension - Signs and symptoms
URL: https://en.wikipedia.org/wiki/Hypertension
TEXT: ...

[2] Blood pressure - Measurement
URL: https://en.wikipedia.org/wiki/Blood_pressure
TEXT: ...
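The compact format above can be produced directly from the results array. This is a sketch: format_snippets and build_prompt are hypothetical helper names, and the trailing "Question:" line is an assumption about how you might append the user's question, not something the API prescribes.

```python
INSTRUCTION = (
    "Answer the user using only the evidence snippets below.\n"
    "If the evidence is insufficient, say what is missing.\n"
    "Cite sources using the provided URLs."
)


def format_snippets(results: list) -> str:
    """Render results as numbered blocks: [n] title - heading / URL / TEXT."""
    blocks = []
    for number, result in enumerate(results, start=1):
        label = result["title"]
        if result.get("heading"):
            label += f" - {result['heading']}"
        blocks.append(f"[{number}] {label}\nURL: {result['url']}\nTEXT: {result['text']}")
    return "\n\n".join(blocks)


def build_prompt(question: str, results: list) -> str:
    """Combine the instruction, the snippets, and the user's question."""
    return f"{INSTRUCTION}\n\n{format_snippets(results)}\n\nQuestion: {question}"


results = [
    {"title": "Hypertension", "heading": "Signs and symptoms",
     "url": "https://en.wikipedia.org/wiki/Hypertension", "text": "..."},
    {"title": "Blood pressure", "heading": "Measurement",
     "url": "https://en.wikipedia.org/wiki/Blood_pressure", "text": "..."},
]
print(build_prompt("What is hypertension?", results))
```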

MCP

HYFL RAG also exposes an MCP endpoint for agents that support streamable HTTP MCP.

Endpoint: https://hyfl.uk/rag/mcp

Use the same bearer key:


Authorization: Bearer sk_live_...

The MCP server exposes one tool:

Tool	Purpose
`retrieve_evidence`	Retrieves cited evidence snippets. The caller writes the final answer.

Tool input mirrors /retrieve: query, optional messages, optional conversation, corpus, top_k, and max_context_chars.
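Most MCP client libraries build tool calls for you. For reference, a tool invocation in MCP is a JSON-RPC 2.0 request with the method tools/call. The sketch below only constructs that message; it omits the initialize handshake and transport headers that a streamable HTTP session needs, and tool_call_payload is a hypothetical helper name.

```python
import json
from itertools import count
from typing import List, Optional

_request_ids = count(1)  # JSON-RPC requests need a unique id per session


def tool_call_payload(query: str, top_k: int = 5,
                      corpus: Optional[List[str]] = None) -> dict:
    """Build a JSON-RPC 2.0 tools/call message for retrieve_evidence."""
    arguments = {"query": query, "top_k": top_k}
    if corpus:
        arguments["corpus"] = corpus
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": "retrieve_evidence", "arguments": arguments},
    }


payload = tool_call_payload("What is hypertension?", top_k=3, corpus=["medical_core"])
print(json.dumps(payload, indent=2))
```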

Endpoints

`POST /retrieve`

Retrieves cited evidence. Requires an Authorization: Bearer sk_live_... header.

Request body:

Parameter	Type	Default	Notes
`query`	string	required	1 to 4000 characters
`messages`	array	`[]`	Last few user/assistant turns; capped at 12
`conversation`	object	`null`	Optional topic summary and salient terms
`corpus`	array	`["medical_core"]`	Up to 4 corpus IDs
`top_k`	integer	`8`	Capped by your key, max 25 for free keys
`max_context_chars`	integer	`6000`	Capped by your key, max 50000 for free keys
`include_trace`	boolean	`true`	Include retrieval debug metadata
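Checking a request body against the table before sending it can save a round trip that would end in a 400. The sketch below mirrors the documented bounds with free-key caps as defaults. The lower bound of 1 for top_k and max_context_chars is an assumption, and check_retrieve_body is a hypothetical helper, not part of the service.

```python
def check_retrieve_body(body: dict, max_top_k: int = 25,
                        max_context_chars: int = 50000) -> list:
    """Return a list of problems; an empty list means the body looks sendable."""
    problems = []
    query = body.get("query")
    if not isinstance(query, str) or not 1 <= len(query) <= 4000:
        problems.append("query must be a string of 1 to 4000 characters")
    if len(body.get("messages", [])) > 12:
        problems.append("messages is capped at 12 entries")
    if len(body.get("corpus", ["medical_core"])) > 4:
        problems.append("corpus accepts up to 4 IDs")
    # ASSUMPTION: values below 1 are invalid; the table only states the upper caps.
    if not 1 <= body.get("top_k", 8) <= max_top_k:
        problems.append(f"top_k must be between 1 and {max_top_k}")
    if not 1 <= body.get("max_context_chars", 6000) <= max_context_chars:
        problems.append(f"max_context_chars must be between 1 and {max_context_chars}")
    return problems


print(check_retrieve_body({"query": "What is hypertension?"}))  # []
print(check_retrieve_body({"query": "", "top_k": 50}))
# ['query must be a string of 1 to 4000 characters', 'top_k must be between 1 and 25']
```

Pass the max_top_k and max_context_chars values from your own key's budget if they differ from the free-key caps.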

`GET /corpora`

Lists available corpora. No key required.


curl https://hyfl.uk/rag/v1/corpora

`GET /usage`

Returns your current key limits and recent daily usage. Requires an Authorization: Bearer sk_live_... header.


curl https://hyfl.uk/rag/v1/usage \
  -H "Authorization: Bearer $KEY"

`GET /health`

Basic service status. No key required.


curl https://hyfl.uk/rag/v1/health

`GET /ready`

Readiness status for the hosted retrieval backend. No key required.


curl https://hyfl.uk/rag/v1/ready

`GET /openapi.json`

Returns the machine-readable OpenAPI schema.


curl https://hyfl.uk/rag/v1/openapi.json

`GET /docs`

Serves Swagger UI for the /rag/v1 API.


https://hyfl.uk/rag/v1/docs

Limits

Free keys are intentionally generous but still bounded so the service stays available.

Limit	Value
Requests per minute	60 per key
Requests per hour	1000 per key
Requests per day	24000 per key
`top_k`	Up to 25
Context per request	Up to 50000 characters
Request body	Up to 1 MB
Key lifespan	90 days

The hourly and daily limits are consistent: 1000 requests per hour sustained for 24 hours equals 24000 per day.

Key creation has lightweight network-level abuse protection. Retrieval is primarily counted per key, with a high network-level guard to stop automated key cycling from one source.

Browser Use

The hosted API supports CORS preflight requests for GET, POST, PATCH, DELETE, and OPTIONS. Browser clients can call /retrieve directly with a bearer key in the Authorization header, or /retrieve/stream when they want Server-Sent Events for progress and final results.

Errors

Status	Meaning	What to do
`400`	Bad request, unknown corpus, or request exceeds key limits	Check `query`, `corpus`, `top_k`, and `max_context_chars`
`401`	Missing or invalid bearer key	Send `Authorization: Bearer sk_live_...` with a valid key
`403`	Key lacks the required scope	Use a retrieve-capable key
`413`	Request body too large	Send fewer messages or less context
`429`	Rate limit exceeded	Back off and retry later
`504`	Retrieval timed out	Retry with smaller `top_k` or fewer corpora

Example 429 response:


{
  "detail": "hourly limit of 1000/hr exceeded"
}
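One way to "back off and retry later" is exponential backoff with jitter, applied to the two statuses the table marks as retryable (429 and 504). This is a generic sketch, not behaviour the service mandates: call_with_backoff and its send callback are hypothetical, and the delays are arbitrary starting points.

```python
import random
import time
from typing import Callable, Tuple

RETRYABLE = {429, 504}


def call_with_backoff(send: Callable[[], Tuple[int, dict]], max_attempts: int = 5,
                      base_delay: float = 1.0, max_delay: float = 60.0,
                      sleep: Callable[[float], None] = time.sleep) -> Tuple[int, dict]:
    """Call send() until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, payload = send()
        if status not in RETRYABLE or attempt == max_attempts - 1:
            return status, payload
        # Full jitter: wait a random time up to the capped exponential delay.
        delay = min(max_delay, base_delay * 2 ** attempt)
        sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")


# Simulated responses standing in for real HTTP calls.
responses = iter([(429, {"detail": "rate limited"}), (504, {}), (200, {"results": []})])
waits = []
status, payload = call_with_backoff(lambda: next(responses), sleep=waits.append)
```

Short retries help with the per-minute limit and timeouts. If the detail message reports the hourly or daily limit, waiting seconds will not help, so surface the error instead of retrying.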

Practical Defaults

For most agents:

Setting	Recommended value
`top_k`	`5` to `8`
`max_context_chars`	`4000` to `8000`
`messages`	Last 2 to 6 turns only
`include_trace`	`false` in production, `true` while debugging
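These defaults can be applied in one place when building the request body. The sketch below assumes your chat history is a list of {"role", "content"} dicts and treats each message as one turn; build_agent_body is a hypothetical helper name.

```python
def build_agent_body(query: str, history: list, debug: bool = False,
                     max_turns: int = 6) -> dict:
    """Build a /retrieve body using the recommended defaults."""
    # Keep only the most recent turns; older context adds bytes without much signal.
    recent = history[-max_turns:] if max_turns > 0 else []
    return {
        "query": query,
        "messages": [{"role": m["role"], "content": m["content"]} for m in recent],
        "top_k": 8,
        "max_context_chars": 6000,
        "include_trace": debug,  # trace metadata only while debugging
    }


history = [{"role": "user", "content": f"turn {i}"} for i in range(10)]
body = build_agent_body("What are first-line treatments?", history)
print(len(body["messages"]), body["include_trace"])  # 6 False
```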

Safety Notes

HYFL RAG retrieves source text. It is not a doctor, medical device, diagnosis system, or substitute for professional judgement. For medical end-user experiences, your app should clearly explain that retrieved evidence is informational and should be checked against appropriate clinical guidance.