HYFL RAG API Docs
HYFL RAG is a hosted retrieval API for agents and apps that need cited medical and science context. It returns evidence snippets, source titles, URLs, headings, scores, and trace metadata. It does not write the final answer for you.
Base URL: https://hyfl.uk/rag/v1
OpenAPI schema: https://hyfl.uk/rag/v1/openapi.json
Swagger UI: https://hyfl.uk/rag/v1/docs
Start in 2 Minutes
1. Get a free key
Use the button on the home page, or call the key endpoint directly:
curl -X POST https://hyfl.uk/rag/v1/keys/anonymous \
  -H "Content-Type: application/json" \
  -d '{"name": "my-agent"}'
The response includes a secret. Save it immediately: the full key is shown only once.
{
  "prefix": "sk_live_AbCdEfGhIj",
  "secret": "sk_live_AbCdEfGhIjKlMnOpQrStUvWxYz123456",
  "name": "my-agent",
  "scopes": ["retrieve"],
  "expires_at": "2026-09-20T12:00:00+00:00",
  "budget": {
    "rpm_limit": 60,
    "hourly_limit": 1000,
    "daily_limit": 24000,
    "max_top_k": 25,
    "max_context_chars": 50000,
    "max_request_bytes": 1000000
  }
}
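Because the key response carries an expires_at timestamp, a client can decide when to request a fresh key. The following is a minimal sketch, not part of the HYFL API: the helper names days_until_expiry and should_rotate and the 7-day margin are hypothetical choices made for illustration.

```python
from datetime import datetime, timedelta, timezone


def days_until_expiry(key_response: dict, now: datetime) -> float:
    """Return how many days remain before the key in `key_response` expires."""
    # expires_at is an ISO 8601 timestamp with a UTC offset, as in the sample above.
    expires_at = datetime.fromisoformat(key_response["expires_at"])
    return (expires_at - now) / timedelta(days=1)


def should_rotate(key_response: dict, now: datetime, margin_days: float = 7.0) -> bool:
    """True when the key is within `margin_days` of expiry (margin is an assumption)."""
    return days_until_expiry(key_response, now) <= margin_days


if __name__ == "__main__":
    sample = {"expires_at": "2026-09-20T12:00:00+00:00"}
    print(should_rotate(sample, datetime.now(timezone.utc)))
```

Pass a timezone-aware `now`; mixing naive and aware datetimes raises a TypeError in Python.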
2. Retrieve evidence
KEY="sk_live_..."
curl -X POST https://hyfl.uk/rag/v1/retrieve \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is hypertension?",
    "top_k": 3
  }'
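The same call can be made from Python with only the standard library. This is a sketch under stated assumptions: the function names build_retrieve_request and retrieve and the 30-second timeout are illustrative, not part of the API.

```python
import json
import urllib.request

BASE_URL = "https://hyfl.uk/rag/v1"


def build_retrieve_request(key: str, query: str, top_k: int = 3) -> urllib.request.Request:
    """Build the POST /retrieve request without sending it."""
    body = json.dumps({"query": query, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/retrieve",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
    )


def retrieve(key: str, query: str, top_k: int = 3, timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_retrieve_request(key, query, top_k)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Splitting request construction from sending keeps the first half easy to inspect and test without network access.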
For progress-aware clients, call POST /rag/v1/retrieve/stream with the same JSON body and Accept: text/event-stream. The stream emits status, one result event per returned snippet, results, and done; the results event contains the same retrieval payload as /retrieve.
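A client that reads the stream needs to split it into events. The sketch below parses standard Server-Sent Events framing (event: and data: lines, blank line as separator) and keeps the payload of the results event. It assumes each data field holds JSON and that the done event carries a data line; neither detail is confirmed above, and the function names are hypothetical.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, data) pairs from an iterable of SSE text lines."""
    event_name, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event; events without data are skipped.
            if data_lines:
                yield event_name, "\n".join(data_lines)
            event_name, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        elif line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # SSE framing drops one optional space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)


def collect_stream(lines: Iterable[str]) -> Optional[dict]:
    """Return the payload of the `results` event, stopping at `done`."""
    final = None
    for name, data in iter_sse_events(lines):
        if name == "results":
            final = json.loads(data)  # ASSUMPTION: data is a JSON document
        elif name == "done":
            break
    return final
```

The `lines` argument can be any iterable of decoded text lines, for example the lines of a streaming HTTP response.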
3. Give the results to your model
Use each result's title, url, heading, and text as grounding context. Ask your model to answer from those snippets and cite the returned URLs.
Content Requests
Visitors can request articles, topics, or sources for manual review from the home page. The public endpoint stores the submission in the service SQLite database and returns a request ID; it does not ingest anything automatically.
curl -X POST https://hyfl.uk/rag/v1/content-requests \
  -H "Content-Type: application/json" \
  -d '{
    "title": "NICE guideline for atrial fibrillation",
    "url": "https://www.nice.org.uk/",
    "details": "Please review this as a source for treatment guidance.",
    "email": "you@example.com"
  }'
Admin keys can view and triage submissions:
ADMIN_KEY="sk_live_..."
curl https://hyfl.uk/rag/v1/content-requests \
  -H "Authorization: Bearer $ADMIN_KEY"
curl -X PATCH https://hyfl.uk/rag/v1/content-requests/1 \
  -H "Authorization: Bearer $ADMIN_KEY" \
  -H "Content-Type: application/json" \
  -d '{"status": "reviewed"}'
Set HYFL_CONTENT_REQUEST_WEBHOOK_URL and HYFL_CONTENT_REQUEST_WEBHOOK_SECRET to receive webhook alerts. HYFL signs each webhook with an X-HYFL-Webhook-Signature header whose value begins with v1=; the signature is computed over X-HYFL-Webhook-Timestamp + "." + raw_body. Receivers should reject stale timestamps and invalid signatures. Requests are still stored if alert delivery is not configured or temporarily fails.
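A receiver-side check might look like the sketch below. The text above does not name the signing algorithm, so this example ASSUMES HMAC-SHA256 with a hex digest, a Unix-seconds timestamp, and a 300-second freshness window; confirm all three against the service before relying on it. The function names are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional


def expected_signature(secret: str, timestamp: str, raw_body: bytes) -> str:
    """Compute the signature over timestamp + "." + raw_body.

    ASSUMPTION: HMAC-SHA256, hex-encoded. The algorithm is not stated in the docs.
    """
    signed = timestamp.encode("utf-8") + b"." + raw_body
    digest = hmac.new(secret.encode("utf-8"), signed, hashlib.sha256).hexdigest()
    return "v1=" + digest


def verify_webhook(secret: str, timestamp: str, raw_body: bytes, signature_header: str,
                   now: Optional[float] = None, max_age_seconds: int = 300) -> bool:
    """Reject stale timestamps first, then compare signatures in constant time."""
    now = time.time() if now is None else now
    try:
        sent_at = float(timestamp)  # ASSUMPTION: Unix seconds
    except ValueError:
        return False
    if abs(now - sent_at) > max_age_seconds:
        return False
    expected = expected_signature(secret, timestamp, raw_body)
    return hmac.compare_digest(expected.encode("utf-8"), signature_header.encode("utf-8"))
```

Verify against the raw request bytes, before any JSON parsing, because re-serializing the body can change whitespace and invalidate the signature.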
What to Send
Minimum request:
{
  "query": "What are common symptoms of iron deficiency anemia?"
}
Better request for an agent:
{
  "query": "What are first-line treatments?",
  "messages": [
    {
      "role": "user",
      "content": "I am researching iron deficiency anemia."
    },
    {
      "role": "assistant",
      "content": "I can retrieve medical context about anemia."
    }
  ],
  "conversation": {
    "current_topic": "iron deficiency anemia",
    "salient_terms": ["anemia", "iron deficiency", "treatment"]
  },
  "corpus": ["medical_core"],
  "top_k": 8,
  "max_context_chars": 6000
}
What You Get Back
{
  "query": "What is hypertension?",
  "contextualized_query": "hypertension",
  "results": [
    {
      "id": "medical_core:chunk:12345",
      "title": "Hypertension",
      "heading": "Signs and symptoms",
      "url": "https://en.wikipedia.org/wiki/Hypertension",
      "score": 0.9182,
      "text": "Hypertension is a long-term medical condition...",
      "trace": {
        "corpus": "medical_core",
        "retriever": "fts",
        "source_query": "hypertension",
        "chunk_index": 4,
        "start_char": 1200,
        "end_char": 1850
      }
    }
  ],
  "latency_ms": 128,
  "conversation": {
    "mode": "direct",
    "used_recent_user_messages": 0,
    "used_summary": false,
    "used_terms": []
  }
}
Important fields:
| Field | Use it for |
|---|---|
| results[].text | The grounding context to pass to your LLM |
| results[].title | Human-readable source name |
| results[].url | Citation link |
| results[].heading | Local section context inside the source |
| results[].score | Relative match confidence from 0 to 1 |
| trace | Debugging which corpus/retriever/chunk matched |
When max_context_chars is exhausted, the API stops before returning tiny unusable tail snippets. A result trace includes truncated: true when a useful snippet was shortened to fit the remaining budget.
Very low-confidence no-match retrievals are suppressed. If the service cannot find evidence above the relevance threshold, results is an empty array so downstream LLMs can decline instead of fabricating from weak neighbors.
Agent Prompt Pattern
When you call your model, keep the instruction simple:
Answer the user using only the evidence snippets below.
If the evidence is insufficient, say what is missing.
Cite sources using the provided URLs.
Then pass the snippets in a compact format:
[1] Hypertension - Signs and symptoms
URL: https://en.wikipedia.org/wiki/Hypertension
TEXT: ...
[2] Blood pressure - Measurement
URL: https://en.wikipedia.org/wiki/Blood_pressure
TEXT: ...
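The pattern above can be assembled in a few lines. This sketch assumes every result carries title, heading, url, and text, as in the sample response; the function names and the trailing "Question:" label are illustrative choices. It returns None when results is empty so the caller can decline instead of prompting the model without evidence.

```python
from typing import Optional

INSTRUCTION = (
    "Answer the user using only the evidence snippets below.\n"
    "If the evidence is insufficient, say what is missing.\n"
    "Cite sources using the provided URLs."
)


def format_snippets(results: list) -> str:
    """Render results as numbered [n] Title - Heading / URL / TEXT blocks."""
    blocks = []
    for number, result in enumerate(results, start=1):
        blocks.append(
            f"[{number}] {result['title']} - {result['heading']}\n"
            f"URL: {result['url']}\n"
            f"TEXT: {result['text']}"
        )
    return "\n".join(blocks)


def build_prompt(question: str, results: list) -> Optional[str]:
    """Return the full prompt, or None when there is no evidence to ground on."""
    if not results:
        return None  # empty results: let the caller decline rather than guess
    # The "Question:" label is an assumption, not part of the documented pattern.
    return f"{INSTRUCTION}\n\n{format_snippets(results)}\n\nQuestion: {question}"
```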
MCP
HYFL RAG also exposes an MCP endpoint for agents that support streamable HTTP MCP.
Endpoint: https://hyfl.uk/rag/mcp
Use the same bearer key:
Authorization: Bearer sk_live_...
The MCP server exposes one tool:
| Tool | Purpose |
|---|---|
| retrieve_evidence | Retrieves cited evidence snippets. The caller writes the final answer. |
Tool input mirrors /retrieve: query, optional messages, optional conversation, corpus, top_k, and max_context_chars.
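Most MCP clients build tool calls for you, but it can help to see the shape of the message. The sketch below constructs a JSON-RPC 2.0 tools/call request for retrieve_evidence, assuming the server follows the standard MCP message format. Session initialization and transport details are omitted, and build_tool_call is a hypothetical helper.

```python
def build_tool_call(query: str, request_id: int = 1, **options) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` message for the retrieve_evidence tool.

    `options` may carry the optional inputs listed above, such as top_k,
    corpus, messages, conversation, or max_context_chars.
    """
    arguments = {"query": query, **options}
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "retrieve_evidence", "arguments": arguments},
    }
```

For example, build_tool_call("What is hypertension?", top_k=3) yields a message whose params.arguments holds the query and top_k.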
Endpoints
POST /retrieve
Retrieves cited evidence. Requires an Authorization: Bearer header carrying your key.
Request body:
| Parameter | Type | Default | Notes |
|---|---|---|---|
| query | string | required | 1 to 4000 characters |
| messages | array | [] | Last few user/assistant turns; capped at 12 |
| conversation | object | null | Optional topic summary and salient terms |
| corpus | array | ["medical_core"] | Up to 4 corpus IDs |
| top_k | integer | 8 | Capped by your key, max 25 for free keys |
| max_context_chars | integer | 6000 | Capped by your key, max 50000 for free keys |
| include_trace | boolean | true | Include retrieval debug metadata |
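Checking a request body against these limits before sending it avoids a round trip that ends in a 400. The sketch below encodes the free-key limits from the table. The lower bounds of 1 for top_k and max_context_chars are assumptions, the server may trim rather than reject an over-long messages array, and validate_retrieve_body is a hypothetical helper.

```python
def validate_retrieve_body(body: dict, max_top_k: int = 25,
                           max_context_chars: int = 50000) -> list:
    """Return a list of human-readable problems; an empty list means the body looks valid."""
    problems = []

    query = body.get("query")
    if not isinstance(query, str) or not 1 <= len(query) <= 4000:
        problems.append("query must be a string of 1 to 4000 characters")

    if len(body.get("messages") or []) > 12:
        problems.append("messages is capped at 12 turns")

    if len(body.get("corpus") or ["medical_core"]) > 4:
        problems.append("corpus accepts up to 4 IDs")

    top_k = body.get("top_k", 8)
    # ASSUMPTION: the minimum accepted value is 1.
    if not isinstance(top_k, int) or not 1 <= top_k <= max_top_k:
        problems.append(f"top_k must be an integer from 1 to {max_top_k}")

    context = body.get("max_context_chars", 6000)
    # ASSUMPTION: the minimum accepted value is 1.
    if not isinstance(context, int) or not 1 <= context <= max_context_chars:
        problems.append(f"max_context_chars must be an integer from 1 to {max_context_chars}")

    return problems
```

Pass the max_top_k and max_context_chars values from your key's budget if they differ from the free-key defaults.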
GET /corpora
Lists available corpora. No key required.
curl https://hyfl.uk/rag/v1/corpora
GET /usage
Returns your current key limits and recent daily usage. Requires an Authorization: Bearer header carrying your key.
curl https://hyfl.uk/rag/v1/usage \
  -H "Authorization: Bearer $KEY"
GET /health
Basic service status. No key required.
curl https://hyfl.uk/rag/v1/health
GET /ready
Readiness status for the hosted retrieval backend. No key required.
curl https://hyfl.uk/rag/v1/ready
GET /openapi.json
Returns the machine-readable OpenAPI schema.
curl https://hyfl.uk/rag/v1/openapi.json
GET /docs
Serves Swagger UI for the /rag/v1 API.
https://hyfl.uk/rag/v1/docs
Limits
Free keys are intentionally generous but still bounded so the service stays available.
| Limit | Value |
|---|---|
| Requests per minute | 60 per key |
| Requests per hour | 1000 per key |
| Requests per day | 24000 per key |
| top_k | Up to 25 |
| Context per request | Up to 50000 characters |
| Request body | Up to 1 MB |
| Key lifespan | 90 days |
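A client can stay under the per-minute limit by throttling itself. The sketch below is a sliding-window limiter sized for 60 requests per 60 seconds. How the server counts its windows is not stated here, so treat this as a conservative client-side guard; the class name and injectable clock are illustrative.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls in any `window_seconds` span."""

    def __init__(self, limit: int = 60, window_seconds: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._sent = deque()  # timestamps of calls still inside the window

    def try_acquire(self) -> bool:
        """Record a call and return True if it fits the budget, else return False."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) >= self.limit:
            return False
        self._sent.append(now)
        return True
```

Call try_acquire() before each request and wait briefly when it returns False. A second instance with limit=1000 and window_seconds=3600 would cover the hourly limit in the same way.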
Browser Use
The hosted API supports CORS preflight requests for GET, POST, PATCH, DELETE, and OPTIONS. Browser clients can call /retrieve directly with a bearer key in the Authorization header, or /retrieve/stream when they want Server-Sent Events for progress and final results.
The hourly and daily limits are consistent: 1000 requests per hour sustained for 24 hours equals 24000 per day.
Key creation has lightweight network-level abuse protection. Retrieval is primarily counted per key, with a high network-level guard to stop automated key cycling from one source.
Errors
| Status | Meaning | What to do |
|---|---|---|
| 400 | Bad request, unknown corpus, or request exceeds key limits | Check query, corpus, top_k, and max_context_chars |
| 401 | Missing or invalid bearer key | Send Authorization: Bearer followed by your key |
| 403 | Key lacks the required scope | Use a retrieve-capable key |
| 413 | Request body too large | Send fewer messages or less context |
| 429 | Rate limit exceeded | Back off and retry later |
| 504 | Retrieval timed out | Retry with smaller top_k or fewer corpora |
Example 429 response:
{
  "detail": "hourly limit of 1000/hr exceeded"
}
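For 429 and 504 responses, a retry loop with exponential backoff is one reasonable approach. The sketch below is transport-agnostic: send is any callable you supply that returns a (status_code, payload) pair. The delays, jitter, and attempt count are assumptions, and whether the API returns a Retry-After header is not stated here.

```python
import random
import time

RETRYABLE_STATUSES = {429, 504}


def call_with_backoff(send, max_attempts: int = 5, base_delay: float = 1.0,
                      max_delay: float = 30.0, sleep=time.sleep):
    """Call `send()` until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, payload = send()
        is_last_attempt = attempt == max_attempts - 1
        if status not in RETRYABLE_STATUSES or is_last_attempt:
            return status, payload
        # Exponential backoff (1s, 2s, 4s, ...) capped at max_delay, plus up to 50% jitter.
        delay = min(max_delay, base_delay * 2 ** attempt)
        sleep(delay + random.uniform(0, delay / 2))
```

Short backoff helps with the per-minute limit and transient timeouts. When the detail message reports an hourly or daily limit, retrying within seconds is unlikely to succeed, so surface the error to the caller instead.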
Practical Defaults
For most agents:
| Setting | Recommended value |
|---|---|
| top_k | 5 to 8 |
| max_context_chars | 4000 to 8000 |
| messages | Last 2 to 6 turns only |
| include_trace | false in production, true while debugging |
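These defaults can be captured in one helper so every call site uses them consistently. This is a sketch: build_agent_body is a hypothetical name, and the chosen values (top_k of 8, 6000 characters, last 6 turns) are one point inside the recommended ranges.

```python
def build_agent_body(query: str, history: list, top_k: int = 8,
                     max_context_chars: int = 6000, max_turns: int = 6,
                     include_trace: bool = False) -> dict:
    """Build a /retrieve body using the recommended defaults for agents."""
    # Keep only the most recent turns; history[-0:] would return everything, so guard it.
    recent = list(history[-max_turns:]) if max_turns > 0 else []
    return {
        "query": query,
        "messages": recent,
        "top_k": top_k,
        "max_context_chars": max_context_chars,
        "include_trace": include_trace,  # set True while debugging
    }
```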
Safety Notes
HYFL RAG retrieves source text. It is not a doctor, medical device, diagnosis system, or substitute for professional judgement. For medical end-user experiences, your app should clearly explain that retrieved evidence is informational and should be checked against appropriate clinical guidance.