HYFL RAG API Docs

HYFL RAG is a hosted retrieval API for agents and apps that need cited medical and science context. It returns evidence snippets, source titles, URLs, headings, scores, and trace metadata. It does not write the final answer for you.

Base URL: https://hyfl.uk/rag/v1

OpenAPI schema: https://hyfl.uk/rag/v1/openapi.json

Swagger UI: https://hyfl.uk/rag/v1/docs

Start in 2 Minutes

1. Get a free key

Use the button on the home page, or call the key endpoint directly:


curl -X POST https://hyfl.uk/rag/v1/keys/anonymous \
  -H "Content-Type: application/json" \
  -d '{"name": "my-agent"}'

The response includes a secret. Save it when you see it. The full key is shown once.


{
  "prefix": "sk_live_AbCdEfGhIj",
  "secret": "sk_live_AbCdEfGhIjKlMnOpQrStUvWxYz123456",
  "name": "my-agent",
  "scopes": ["retrieve"],
  "expires_at": "2026-09-20T12:00:00+00:00",
  "budget": {
    "rpm_limit": 60,
    "hourly_limit": 1000,
    "daily_limit": 24000,
    "max_top_k": 25,
    "max_context_chars": 50000,
    "max_request_bytes": 1000000
  }
}

2. Retrieve evidence


KEY="sk_live_..."

curl -X POST https://hyfl.uk/rag/v1/retrieve \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is hypertension?",
    "top_k": 3
  }'
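
The same call can be made from Python. The sketch below uses only the standard library and assumes nothing beyond the base URL, headers, and body shown above; the function names are illustrative, not part of any HYFL client.

```python
import json
import urllib.request

BASE_URL = "https://hyfl.uk/rag/v1"


def build_retrieve_request(key: str, query: str, top_k: int = 3) -> urllib.request.Request:
    """Build (but do not send) a POST /retrieve request."""
    body = json.dumps({"query": query, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/retrieve",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
    )


def retrieve(key: str, query: str, top_k: int = 3, timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON payload."""
    request = build_retrieve_request(key, query, top_k)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the headers and body easy to inspect or unit-test without touching the network.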

For progress-aware clients, call POST /rag/v1/retrieve/stream with the same JSON body and Accept: text/event-stream. The stream emits status, one result event per returned snippet, results, and done; the results event contains the same retrieval payload as /retrieve.
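
A client consuming the stream needs to split it into events. The sketch below parses standard Server-Sent Events framing (event: and data: lines, blank line as separator) and picks out the results event. It assumes each event's data field is JSON, which the docs above imply but do not state; the function names are illustrative.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, data) pairs from decoded SSE lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # SSE comment / keep-alive line
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # Per the SSE format, one leading space after the colon is dropped.
            data.append(value[1:] if value.startswith(" ") else value)


def final_results(lines: Iterable[str]) -> Optional[dict]:
    """Return the payload of the first `results` event, or None if absent."""
    for event, data in iter_sse_events(lines):
        if event == "results":
            # Assumption: the data field carries the /retrieve payload as JSON.
            return json.loads(data)
    return None
```

In practice `lines` would be the decoded lines of the HTTP response body; status and result events can be handled in the same loop to drive a progress indicator.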

3. Give the results to your model

Use each result's title, url, heading, and text as grounding context. Ask your model to answer from those snippets and cite the returned URLs.

Content Requests

Visitors can request articles, topics, or sources for manual review from the home page. The public endpoint stores the submission in the service SQLite database and returns a request ID; it does not ingest anything automatically.


curl -X POST https://hyfl.uk/rag/v1/content-requests \
  -H "Content-Type: application/json" \
  -d '{
    "title": "NICE guideline for atrial fibrillation",
    "url": "https://www.nice.org.uk/",
    "details": "Please review this as a source for treatment guidance.",
    "email": "user@example.com"
  }'

Admin keys can view and triage submissions:


ADMIN_KEY="sk_live_..."

curl https://hyfl.uk/rag/v1/content-requests \
  -H "Authorization: Bearer $ADMIN_KEY"

curl -X PATCH https://hyfl.uk/rag/v1/content-requests/1 \
  -H "Authorization: Bearer $ADMIN_KEY" \
  -H "Content-Type: application/json" \
  -d '{"status": "reviewed"}'

Set HYFL_CONTENT_REQUEST_WEBHOOK_URL and HYFL_CONTENT_REQUEST_WEBHOOK_SECRET to receive webhook alerts. HYFL signs each webhook with the header X-HYFL-Webhook-Signature: v1=<signature>, where the signature is computed over X-HYFL-Webhook-Timestamp + "." + raw_body. Receivers should reject stale timestamps and invalid signatures. Requests are still stored if alert delivery is not configured or temporarily fails.
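
A receiver-side check might look like the sketch below. The signed string (timestamp + "." + raw_body) and the v1= prefix come from the description above; everything else is an assumption to confirm against the service: HMAC-SHA256 with a hex digest, a Unix-seconds timestamp, and a 300-second tolerance.

```python
import hashlib
import hmac
import time
from typing import Optional

# Assumption: the docs do not specify a tolerance; 300 seconds is a common choice.
MAX_SKEW_SECONDS = 300


def verify_webhook(secret: str, timestamp: str, signature: str, raw_body: bytes,
                   now: Optional[float] = None) -> bool:
    """Return True only for a fresh timestamp and a matching signature."""
    now = time.time() if now is None else now
    try:
        sent_at = int(timestamp)  # assumption: Unix seconds
    except ValueError:
        return False
    if abs(now - sent_at) > MAX_SKEW_SECONDS:
        return False  # stale or far-future timestamp
    signed = timestamp.encode("utf-8") + b"." + raw_body
    # Assumption: HMAC-SHA256, hex-encoded, prefixed with "v1=".
    digest = hmac.new(secret.encode("utf-8"), signed, hashlib.sha256).hexdigest()
    expected = "v1=" + digest
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))
```

Pass the raw request bytes exactly as received; re-serializing parsed JSON can change whitespace and break the signature.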

What to Send

Minimum request:


{
  "query": "What are common symptoms of iron deficiency anemia?"
}

Better request for an agent:


{
  "query": "What are first-line treatments?",
  "messages": [
    {
      "role": "user",
      "content": "I am researching iron deficiency anemia."
    },
    {
      "role": "assistant",
      "content": "I can retrieve medical context about anemia."
    }
  ],
  "conversation": {
    "current_topic": "iron deficiency anemia",
    "salient_terms": ["anemia", "iron deficiency", "treatment"]
  },
  "corpus": ["medical_core"],
  "top_k": 8,
  "max_context_chars": 6000
}

What You Get Back


{
  "query": "What is hypertension?",
  "contextualized_query": "hypertension",
  "results": [
    {
      "id": "medical_core:chunk:12345",
      "title": "Hypertension",
      "heading": "Signs and symptoms",
      "url": "https://en.wikipedia.org/wiki/Hypertension",
      "score": 0.9182,
      "text": "Hypertension is a long-term medical condition...",
      "trace": {
        "corpus": "medical_core",
        "retriever": "fts",
        "source_query": "hypertension",
        "chunk_index": 4,
        "start_char": 1200,
        "end_char": 1850
      }
    }
  ],
  "latency_ms": 128,
  "conversation": {
    "mode": "direct",
    "used_recent_user_messages": 0,
    "used_summary": false,
    "used_terms": []
  }
}

Important fields:

Field              Use it for
results[].text     The grounding context to pass to your LLM
results[].title    Human-readable source name
results[].url      Citation link
results[].heading  Local section context inside the source
results[].score    Relative match confidence from 0 to 1
results[].trace    Debugging which corpus/retriever/chunk matched

When max_context_chars is exhausted, the API stops before returning tiny unusable tail snippets. A result trace includes truncated: true when a useful snippet was shortened to fit the remaining budget.

Very low-confidence no-match retrievals are suppressed. If the service cannot find evidence above the relevance threshold, results is an empty array so downstream LLMs can decline instead of fabricating from weak neighbors.
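
Client code can act on both signals before calling a model. The sketch below treats an empty results array as a cue to decline and collects the IDs of shortened snippets. The helper names are illustrative, and trace is treated as optional because include_trace can be turned off.

```python
from typing import List, Tuple


def should_decline(payload: dict) -> bool:
    """True when the service returned no evidence above its relevance threshold."""
    return not payload.get("results")


def split_results(payload: dict) -> Tuple[List[dict], List[str]]:
    """Return (results, ids of results whose trace is marked truncated)."""
    results = payload.get("results") or []
    truncated = [
        result["id"]
        for result in results
        # `trace` may be missing or null when include_trace is false.
        if (result.get("trace") or {}).get("truncated")
    ]
    return results, truncated
```

When should_decline returns True, the calling app can tell the user that no supporting evidence was found instead of prompting the model with nothing.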

Agent Prompt Pattern

When you call your model, keep the instruction simple:


Answer the user using only the evidence snippets below.
If the evidence is insufficient, say what is missing.
Cite sources using the provided URLs.

Then pass the snippets in a compact format:


[1] Hypertension - Signs and symptoms
URL: https://en.wikipedia.org/wiki/Hypertension
TEXT: ...

[2] Blood pressure - Measurement
URL: https://en.wikipedia.org/wiki/Blood_pressure
TEXT: ...
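
The compact format above can be produced directly from a /retrieve payload. This is a minimal sketch: the instruction text and snippet layout are taken from this section, while the function names and the trailing "Question:" line are illustrative additions.

```python
from typing import List

INSTRUCTION = (
    "Answer the user using only the evidence snippets below.\n"
    "If the evidence is insufficient, say what is missing.\n"
    "Cite sources using the provided URLs."
)


def format_snippets(results: List[dict]) -> str:
    """Render results as numbered blocks: title - heading, URL, TEXT."""
    blocks = []
    for number, result in enumerate(results, start=1):
        blocks.append(
            f"[{number}] {result['title']} - {result['heading']}\n"
            f"URL: {result['url']}\n"
            f"TEXT: {result['text']}"
        )
    return "\n\n".join(blocks)


def build_prompt(question: str, results: List[dict]) -> str:
    """Combine instruction, snippets, and the user's question into one prompt."""
    # The "Question:" label is an illustrative choice, not part of the API.
    return f"{INSTRUCTION}\n\n{format_snippets(results)}\n\nQuestion: {question}"
```

Numbering the snippets gives the model a short handle for each source, which makes its citations easier to map back to URLs.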

MCP

HYFL RAG also exposes an MCP endpoint for agents that support streamable HTTP MCP.

Endpoint: https://hyfl.uk/rag/mcp

Use the same bearer key:


Authorization: Bearer sk_live_...

The MCP server exposes one tool:

Tool               Purpose
retrieve_evidence  Retrieves cited evidence snippets. The caller writes the final answer.

Tool input mirrors /retrieve: query, optional messages, optional conversation, corpus, top_k, and max_context_chars.
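
Most agents will reach this tool through an MCP client library, which handles session initialization and transport details. For reference, the sketch below builds the JSON-RPC tools/call message that such a client would send for retrieve_evidence. The message envelope follows the MCP specification; the helper name is illustrative.

```python
import json


def build_tool_call(query: str, request_id: int = 1, **options) -> str:
    """Serialize a JSON-RPC `tools/call` message for the retrieve_evidence tool."""
    # Optional inputs (messages, conversation, corpus, top_k, max_context_chars)
    # are passed through unchanged as keyword arguments.
    arguments = {"query": query, **options}
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "retrieve_evidence", "arguments": arguments},
    }
    return json.dumps(message)
```

This only shows the message shape; it does not perform the initialize handshake that an MCP session requires before tool calls.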

Endpoints

POST /retrieve

Retrieves cited evidence. Requires Authorization: Bearer <key>.

Request body:

Parameter          Type     Default           Notes
query              string   required          1 to 4000 characters
messages           array    []                Last few user/assistant turns; capped at 12
conversation       object   null              Optional topic summary and salient terms
corpus             array    ["medical_core"]  Up to 4 corpus IDs
top_k              integer  8                 Capped by your key; max 25 for free keys
max_context_chars  integer  6000              Capped by your key; max 50000 for free keys
include_trace      boolean  true              Include retrieval debug metadata
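
Checking a request against these limits before sending it avoids a round trip that ends in a 400. The sketch below applies the documented defaults and free-key caps. Two behaviors are client-side choices rather than documented API behavior: keeping only the most recent 12 messages, and clamping oversized values instead of raising an error.

```python
from typing import List, Optional

FREE_KEY_MAX_TOP_K = 25
FREE_KEY_MAX_CONTEXT_CHARS = 50000
MAX_MESSAGES = 12
MAX_CORPORA = 4


def build_body(query: str,
               messages: Optional[List[dict]] = None,
               corpus: Optional[List[str]] = None,
               top_k: int = 8,
               max_context_chars: int = 6000,
               include_trace: bool = True) -> dict:
    """Build a /retrieve body that respects the documented free-key limits."""
    if not 1 <= len(query) <= 4000:
        raise ValueError("query must be 1 to 4000 characters")
    corpus = list(corpus) if corpus else ["medical_core"]
    if len(corpus) > MAX_CORPORA:
        raise ValueError("at most 4 corpus IDs are allowed")
    return {
        "query": query,
        # Client-side choice: keep only the most recent turns.
        "messages": list(messages or [])[-MAX_MESSAGES:],
        "corpus": corpus,
        # Client-side choice: clamp to the free-key caps (lower bound of 1 is assumed).
        "top_k": max(1, min(top_k, FREE_KEY_MAX_TOP_K)),
        "max_context_chars": max(1, min(max_context_chars, FREE_KEY_MAX_CONTEXT_CHARS)),
        "include_trace": include_trace,
    }
```

Keys with different budgets would need different caps; the values returned when the key was created (max_top_k, max_context_chars) are the authoritative source.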

GET /corpora

Lists available corpora. No key required.


curl https://hyfl.uk/rag/v1/corpora

GET /usage

Returns your current key limits and recent daily usage. Requires Authorization: Bearer <key>.


curl https://hyfl.uk/rag/v1/usage \
  -H "Authorization: Bearer $KEY"

GET /health

Basic service status. No key required.


curl https://hyfl.uk/rag/v1/health

GET /ready

Readiness status for the hosted retrieval backend. No key required.


curl https://hyfl.uk/rag/v1/ready

GET /openapi.json

Returns the machine-readable OpenAPI schema.


curl https://hyfl.uk/rag/v1/openapi.json

GET /docs

Serves Swagger UI for the /rag/v1 API.


https://hyfl.uk/rag/v1/docs

Limits

Free keys are intentionally generous but still bounded so the service stays available.

Limit                Value
Requests per minute  60 per key
Requests per hour    1000 per key
Requests per day     24000 per key
top_k                Up to 25
Context per request  Up to 50000 characters
Request body         Up to 1 MB
Key lifespan         90 days
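
The three request limits imply different pacing for bursts and for long-running jobs. The worked arithmetic below shows that the per-minute limit allows one request per second in a burst, while the hourly and daily limits both work out to one request every 3.6 seconds sustained. The helper is illustrative, not part of the API.

```python
# (allowed requests, window length in seconds) for each documented limit
LIMITS = {
    "minute": (60, 60),
    "hour": (1000, 3600),
    "day": (24000, 86400),
}


def min_sustained_interval(limits: dict = LIMITS) -> float:
    """Smallest gap between requests (seconds) that never exceeds any limit."""
    # Each limit allows one request per (window / count) seconds on average;
    # the slowest of those rates is the one that binds over long runs.
    return max(window / count for count, window in limits.values())


def per_limit_intervals(limits: dict = LIMITS) -> dict:
    """Average seconds per request allowed by each limit on its own."""
    return {name: window / count for name, (count, window) in limits.items()}
```

A batch job that spaces requests at least min_sustained_interval() seconds apart should stay under all three limits without needing to handle 429 responses in the normal case.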

The hourly and daily limits are consistent: 1000 requests per hour sustained for 24 hours equals 24000 per day.

Key creation has lightweight network-level abuse protection. Retrieval is primarily counted per key, with a high network-level guard to stop automated key cycling from one source.

Browser Use

The hosted API supports CORS preflight requests for GET, POST, PATCH, DELETE, and OPTIONS. Browser clients can call /retrieve directly with a bearer key in the Authorization header, or /retrieve/stream when they want Server-Sent Events for progress and final results.

Errors

Status  Meaning                                                     What to do
400     Bad request, unknown corpus, or request exceeds key limits  Check query, corpus, top_k, and max_context_chars
401     Missing or invalid bearer key                               Send Authorization: Bearer <key>
403     Key lacks the required scope                                Use a retrieve-capable key
413     Request body too large                                      Send fewer messages or less context
429     Rate limit exceeded                                         Back off and retry later
504     Retrieval timed out                                         Retry with smaller top_k or fewer corpora

Example 429 response:


{
  "detail": "hourly limit of 1000/hr exceeded"
}
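
One way to "back off and retry later" is exponential backoff with jitter. The sketch below wraps any callable that performs the request. RateLimited is a hypothetical exception that the caller's HTTP layer would raise on a 429, and the delay values are illustrative since the docs do not prescribe a schedule.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical: raised by the caller's HTTP layer when the API returns 429."""


def call_with_backoff(send: Callable[[], T],
                      max_attempts: int = 5,
                      base_delay: float = 1.0,
                      max_delay: float = 60.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call send(), retrying on RateLimited with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return send()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads out retries from clients that were limited together.
            sleep(delay + random.uniform(0, delay / 2))
    raise ValueError("max_attempts must be at least 1")
```

Because the example 429 above refers to the hourly limit, short retries may not help once that budget is spent; a production client might stop retrying and surface the error instead.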

Practical Defaults

For most agents:

Setting            Recommended value
top_k              5 to 8
max_context_chars  4000 to 8000
messages           Last 2 to 6 turns only
include_trace      false in production, true while debugging

Safety Notes

HYFL RAG retrieves source text. It is not a doctor, medical device, diagnosis system, or substitute for professional judgement. For medical end-user experiences, your app should clearly explain that retrieved evidence is informational and should be checked against appropriate clinical guidance.