Rate limits
The HESO cloud API is rate-limited per organization, by plan tier. Local signing and verification are never rate-limited — they run with no network and no API key, so a throttled cloud mirror never stops your agent.
What is limited
Rate limits apply only to the cloud API at api.heso.ca — pushing receipts to the mirror, pulling policy, polling approvals. Everything local is exempt: gating an action, signing it, building the BLAKE3-chained ledger, and verifying a receipt all run on your machine with no API call.
The cloud receipt store is a non-authoritative mirror. Your local BLAKE3 chain is the source of truth, and it keeps writing whether or not the cloud accepts a push. A 429 or 503 delays mirroring; it never blocks the action, drops a receipt, or breaks the chain.
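The local-first behaviour described above can be pictured as an outbox: each receipt is appended to the local chain first, and mirroring is a best-effort step that leaves the receipt queued when the cloud does not accept it. This is a minimal sketch, not the SDK's implementation. `Receipt`, `Outbox`, and the `push` callback are hypothetical names, the local chain is stood in for by an in-memory array, and `push` is synchronous only to keep the example short.

```typescript
// Hypothetical shapes for illustration; not part of the HESO SDK.
type Receipt = { id: string }

class Outbox {
  readonly chain: Receipt[] = []   // stand-in for the local ledger
  readonly pending: Receipt[] = [] // receipts not yet mirrored

  // Record locally first; the action is never gated on the cloud.
  record(receipt: Receipt): void {
    this.chain.push(receipt)
    this.pending.push(receipt)
  }

  // Mirror pending receipts in order. `push` returns an HTTP status.
  // Anything other than a 2xx (for example 429 or 503) stops the pass
  // and leaves the receipt queued. Returns how many were mirrored.
  drain(push: (r: Receipt) => number): number {
    let sent = 0
    while (this.pending.length > 0) {
      const next = this.pending[0]
      if (next === undefined) break
      const status = push(next)
      if (status < 200 || status >= 300) break
      this.pending.shift()
      sent++
    }
    return sent
  }
}
```

A throttled pass changes only `pending`; `chain` is untouched, which is the property the mirror design relies on.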
How limits are keyed
The authoritative limit is the per-plan cap, and it is keyed on your organization, not on the API key. Issuing more keys does not raise your throughput: every key for the org draws on the same budget. Your plan tier (free, pro, team, or custom) sets the rate. The exact numbers live with your plan: the pricing page lists the rate for each tier, and your dashboard shows what is in effect for your organization.
Behind the plan cap sit two backstops that protect the platform rather than meter a tenant:
- A per-key and per-IP cap at the edge. These are set above every plan tier, so a normal tenant never hits them — they only catch a single credential or address flooding the API before authentication.
- A global in-flight cap. If too many requests are being served at once, new requests get a fast 503 instead of piling onto a busy backend. This is a momentary-capacity guard, not a per-tenant limit.
In practice the per-plan cap is the one you tune your client against; the backstops are there so one noisy caller cannot degrade everyone.
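To illustrate why extra keys do not add throughput, here is a small token-bucket limiter keyed on organization. It demonstrates the keying idea only and is not HESO's server code: the class name, the token-bucket algorithm, and the capacity and refill values are all assumptions made for the example.

```typescript
// Illustrative only: a token bucket selected by organization ID.
class OrgRateLimiter {
  private buckets = new Map<string, { tokens: number; updatedMs: number }>()
  private capacity: number     // burst size (hypothetical value)
  private refillPerSec: number // sustained rate (hypothetical value)

  constructor(capacity: number, refillPerSec: number) {
    this.capacity = capacity
    this.refillPerSec = refillPerSec
  }

  // The API key is accepted but ignored: only the org selects the bucket.
  allow(orgId: string, _apiKey: string, nowMs: number): boolean {
    const b = this.buckets.get(orgId) ?? { tokens: this.capacity, updatedMs: nowMs }
    // Refill in proportion to elapsed time, never above capacity.
    const elapsedSec = (nowMs - b.updatedMs) / 1000
    b.tokens = Math.min(this.capacity, b.tokens + elapsedSec * this.refillPerSec)
    b.updatedMs = nowMs
    const ok = b.tokens >= 1
    if (ok) b.tokens -= 1
    this.buckets.set(orgId, b)
    return ok
  }
}
```

With a capacity of 2, a third request from the same org is refused even when it arrives on a third key, while a different org is unaffected.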
When you go over
Exceeding your plan rate returns 429 Too Many Requests. The response carries headers that tell you how long to wait and where you stand:
| Header | Meaning |
|---|---|
| Retry-After | Seconds to wait before retrying. Honour this first. |
| RateLimit-Limit | The bucket capacity: the most requests allowed in a burst. |
| RateLimit-Remaining | Requests left before you are throttled again. |
| RateLimit-Reset | Seconds until the bucket refills to full. |
| RateLimit-Scope | Set to tenant on a plan-cap 429, which tells it apart from a coarse edge backstop. |
You may also see 503 Service Unavailable with a short Retry-After when the platform is momentarily at capacity (the global in-flight cap). Treat it the same way as a 429: wait, then retry.
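The status codes and headers above can be folded into one small classifier so client code branches on a value rather than on raw strings. This is a sketch under assumptions: `HeaderReader`, `Throttle`, and `classifyThrottle` are hypothetical names, a 429 without `RateLimit-Scope: tenant` is assumed to come from an edge backstop, and every 503 is assumed to be the in-flight cap.

```typescript
// Minimal header reader, satisfied by fetch's Headers or a test double.
interface HeaderReader {
  get(name: string): string | null
}

type Throttle =
  | { kind: "none" }
  | {
      kind: "plan" | "edge" | "capacity"
      retryAfterSec: number | null
      remaining: number | null
    }

// Parse a numeric header; missing or non-numeric values become null.
function toNumber(value: string | null): number | null {
  if (value === null || value.trim() === "") return null
  const n = Number(value)
  return Number.isFinite(n) ? n : null
}

function classifyThrottle(status: number, headers: HeaderReader): Throttle {
  if (status !== 429 && status !== 503) return { kind: "none" }
  const kind =
    status === 503
      ? "capacity"
      : headers.get("RateLimit-Scope") === "tenant"
        ? "plan"
        : "edge"
  return {
    kind,
    retryAfterSec: toNumber(headers.get("Retry-After")),
    remaining: toNumber(headers.get("RateLimit-Remaining")),
  }
}
```

A `plan` result is the one worth tuning the client against; `edge` and `capacity` results only call for waiting and retrying.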
Backing off
On a 429 or 503, wait for the Retry-After value if present, then retry. If the throttling persists, or no Retry-After is set, back off exponentially. Because the local chain is already durable, a push can wait and be retried later without losing anything.
    async function pushWithBackoff(client, apiKey, receipt, maxRetries = 5) {
      for (let attempt = 0; attempt <= maxRetries; attempt++) {
        const res = await client.pushReceipt(apiKey, receipt)
        if (res.status !== 429 && res.status !== 503) return res
        if (attempt === maxRetries) break // out of retries; do not sleep again
        // Honour Retry-After when it is a valid number of seconds;
        // otherwise back off exponentially, capped at 30 s.
        const seconds = Number(res.headers.get("Retry-After"))
        const waitMs = Number.isFinite(seconds) && seconds > 0
          ? seconds * 1000
          : Math.min(30_000, 2 ** attempt * 500)
        await new Promise((r) => setTimeout(r, waitMs))
      }
      throw new Error("cloud mirror still throttled after retries")
    }

The TypeScript SDK surfaces the HTTP status and headers on a structured result, so you can branch on a throttle without parsing error messages. See the cloud API reference for the full endpoint and status-code list, and authentication for how keys and scopes work.
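When many agents are throttled at once, retrying on an identical exponential schedule can make them collide again. A common remedy, which this page does not itself specify, is to randomize the fallback delay ("full jitter"). The sketch below factors the wait calculation into a pure function; `backoffDelayMs` and its `random` parameter are hypothetical names, and the 500 ms base and 30 s cap are taken from the example above.

```typescript
// Pure delay calculation, so it can be tested without timers.
// `random` defaults to Math.random and is injectable for tests.
function backoffDelayMs(
  attempt: number,
  retryAfter: string | null,
  random: () => number = Math.random,
): number {
  // A numeric Retry-After (in seconds) always wins.
  const seconds = retryAfter === null ? NaN : Number(retryAfter)
  if (Number.isFinite(seconds) && seconds > 0) return seconds * 1000
  // Otherwise: exponential ceiling capped at 30 s, with full jitter,
  // i.e. a uniform delay in [0, ceiling).
  const ceiling = Math.min(30_000, 2 ** attempt * 500)
  return Math.floor(random() * ceiling)
}
```

A Retry-After that is not a plain number of seconds, such as an HTTP date, falls through to the jittered backoff rather than producing an invalid timer value.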