Rate limiting - KavachOS

Overview

KavachOS applies rate limits at two layers: auth endpoints get IP-based limits to protect against brute force and credential stuffing, and the permission engine enforces per-agent call limits via the maxCallsPerHour constraint. Both layers return a standard 429 response and set Retry-After when applicable.

Built-in auth endpoint limits

These limits apply automatically with no configuration required.

Endpoint	Limit	Window
`POST /sign-in`	10 requests	per IP per minute
`POST /sign-up`	5 requests	per IP per minute
`POST /magic-link`	3 requests	per IP per minute
`POST /email-otp`	5 requests	per IP per minute
`POST /totp/verify`	10 requests	per IP per minute
`POST /token` (OAuth)	20 requests	per IP per minute

Limits are tracked in-process by default. For multi-instance deployments, configure a Redis store so all instances share the same counters.
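To see why in-process tracking breaks down across instances, here is a minimal sketch of a fixed-window counter held in process memory. It is an illustration only, not KavachOS's actual implementation; the class name FixedWindowCounter and its hit method are hypothetical.

```typescript
// Hypothetical in-process fixed-window counter (illustration, not the library's code).
class FixedWindowCounter {
  private readonly limit: number;
  private readonly windowMs: number;
  // One entry per key (for example, a client IP).
  private readonly buckets = new Map<string, { windowStart: number; count: number }>();

  constructor(limit: number, windowSeconds: number) {
    this.limit = limit;
    this.windowMs = windowSeconds * 1000;
  }

  // Records one request for `key` and returns true if it is within the limit.
  hit(key: string, nowMs: number = Date.now()): boolean {
    const windowStart = Math.floor(nowMs / this.windowMs) * this.windowMs;
    let bucket = this.buckets.get(key);
    if (!bucket || bucket.windowStart !== windowStart) {
      // A new window has started for this key, so the count starts over.
      bucket = { windowStart, count: 0 };
      this.buckets.set(key, bucket);
    }
    bucket.count += 1;
    return bucket.count <= this.limit;
  }
}

// 10 requests per IP per minute, matching the POST /sign-in default above.
const signInCounter = new FixedWindowCounter(10, 60);
```

Because the Map lives inside one process, two instances behind a load balancer would each allow the full limit. A shared store avoids that by keeping a single counter per key.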

Configuring auth limits

Pass a rateLimit block to createKavach to override the defaults or enable Redis:

import { createKavach } from 'kavachos';

const kavach = await createKavach({
  database: { provider: 'postgres', url: process.env.DATABASE_URL },
  rateLimit: {
    store: 'memory',        // or 'redis'
    redisUrl: process.env.REDIS_URL,   // required when store is 'redis'
    endpoints: {
      signIn: { limit: 5, window: 60 },    // 5 per minute (stricter)
      signUp: { limit: 2, window: 60 },
      magicLink: { limit: 2, window: 60 },
    },
  },
});

store

"memory" | "redis"

default: "memory"

Where to persist counters. Use redis in production when running multiple instances.

redisUrl

string | undefined

Redis connection string. Required when store is redis.

endpoints.signIn

Override for the sign-in endpoint. window is in seconds.

endpoints.signUp

Override for the sign-up endpoint.
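One way to keep a single config for local development and production is to pick the store based on whether a Redis URL is present. The sketch below assumes that approach. The RateLimitOptions type and the rateLimitFromEnv helper are hypothetical; they only mirror the fields documented above and are not exported by kavachos as far as this page shows.

```typescript
// Hypothetical type mirroring the documented rateLimit fields.
type RateLimitOptions = {
  store: 'memory' | 'redis';
  redisUrl?: string;
  endpoints?: Record<string, { limit: number; window: number }>;
};

// Picks the Redis store when REDIS_URL is set, otherwise falls back to memory.
function rateLimitFromEnv(env: Record<string, string | undefined>): RateLimitOptions {
  const redisUrl = env.REDIS_URL;
  return redisUrl ? { store: 'redis', redisUrl } : { store: 'memory' };
}
```

The result could then be passed as the rateLimit block, for example rateLimit: rateLimitFromEnv(process.env).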

Per-agent limits with maxCallsPerHour

The permission engine supports a maxCallsPerHour constraint. When an agent exceeds its hourly call budget, the permission check returns allowed: false with reason: "Rate limit exceeded".

const agent = await kavach.agent.create({
  ownerId: 'user-123',
  name: 'data-sync-bot',
  type: 'autonomous',
  permissions: [
    {
      resource: 'db:reports:*',
      actions: ['read'],
      constraints: {
        maxCallsPerHour: 100,
      },
    },
    {
      resource: 'db:reports:*',
      actions: ['export'],
      constraints: {
        maxCallsPerHour: 10,  // stricter for expensive operations
      },
    },
  ],
});

The counter resets at the top of each clock hour. If you need sliding windows, use createRateLimiter instead (see below).
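A clock-hour reset means calls are grouped by the hour they fall in, not by the 60 minutes preceding each call. The sketch below illustrates that grouping. The hourBucket helper is hypothetical, and it assumes hour boundaries are evaluated in UTC, which this page does not specify.

```typescript
// Hypothetical helper: names the clock-hour bucket a call falls into (UTC assumed).
function hourBucket(date: Date): string {
  // toISOString() yields "YYYY-MM-DDTHH:mm:ss.sssZ"; the first 13 characters end at the hour.
  return date.toISOString().slice(0, 13);
}

const lastSecond = hourBucket(new Date('2024-05-01T10:59:59Z')); // "2024-05-01T10"
const nextSecond = hourBucket(new Date('2024-05-01T11:00:00Z')); // "2024-05-01T11"
```

The two calls are one second apart but count against different budgets, so an agent with maxCallsPerHour: 100 could make up to 200 calls in a short span around the boundary. A sliding window does not allow that burst.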

Checking the limit in your code

const result = await kavach.authorize(agent.id, {
  action: 'read',
  resource: 'db:reports:monthly',
});

if (!result.allowed) {
  if (result.reason === 'Rate limit exceeded') {
    // Return 429 to the calling agent
    return new Response('Too many requests', {
      status: 429,
      headers: { 'Retry-After': '3600' },
    });
  }
}
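The hard-coded Retry-After of 3600 in the check above is the worst case. Since the counter resets at the top of the clock hour, a tighter value can be computed from the current time. The helper below is a hypothetical sketch and assumes the reset boundary is aligned to UTC hours.

```typescript
const MS_PER_HOUR = 3_600_000;

// Hypothetical helper: whole seconds until the next top of the hour (UTC-aligned).
function secondsUntilNextHour(now: Date = new Date()): number {
  const msIntoHour = now.getTime() % MS_PER_HOUR;
  // Round up so the caller never retries slightly too early.
  return Math.ceil((MS_PER_HOUR - msIntoHour) / 1000);
}
```

The header could then be set with 'Retry-After': String(secondsUntilNextHour()).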

createRateLimiter

Use createRateLimiter for custom rate limiting on your own endpoints, such as an expensive AI inference route that should be capped per user.

import { createRateLimiter } from 'kavachos';

const inferenceLimit = createRateLimiter({
  limit: 20,
  window: 3600,           // 1 hour in seconds
  keyFn: (req) => req.headers.get('x-user-id') ?? req.ip ?? 'anonymous',
  store: 'redis',
  redisUrl: process.env.REDIS_URL,
});

limit

number

Maximum number of requests allowed within the window.

window

number

Time window in seconds.

keyFn

(req: Request) => string

Derives the rate limit key from the request. Defaults to the client IP.

store

"memory" | "redis"

default: "memory"

Counter storage backend.

redisUrl

string | undefined

Redis connection string. Required when store is redis.
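The choice of key decides who shares a budget. As a sketch of a slightly more defensive keyFn, the function below prefixes keys by kind and falls back to the first x-forwarded-for entry. The function name userOrIpKey and the HeaderSource type are hypothetical, and the use of x-forwarded-for is an assumption about the deployment, not something this page documents.

```typescript
// Minimal structural type so the sketch does not depend on a runtime Request global.
// A standard Request satisfies it.
type HeaderSource = { headers: { get(name: string): string | null } };

// Hypothetical keyFn: prefer the user id, then the forwarded client IP.
function userOrIpKey(req: HeaderSource): string {
  const userId = req.headers.get('x-user-id');
  if (userId) return `user:${userId}`;

  // x-forwarded-for may hold a comma-separated chain; the first entry is the original client.
  const forwarded = req.headers.get('x-forwarded-for');
  const clientIp = forwarded?.split(',')[0]?.trim();
  return clientIp ? `ip:${clientIp}` : 'anonymous';
}
```

Both headers can be set by the caller unless a trusted proxy or auth layer overwrites them, so a key derived this way is only as reliable as that layer.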

withRateLimit middleware

Wrap any handler with withRateLimit to apply a limiter without modifying the handler itself.

import { createRateLimiter, withRateLimit } from 'kavachos';

const inferenceLimit = createRateLimiter({
  limit: 20,
  window: 3600,
  keyFn: (req) => req.headers.get('x-user-id') ?? 'anonymous',
});

// Works with any framework adapter. The handler receives a standard Request.
export const POST = withRateLimit(inferenceLimit, async (req: Request) => {
  const result = await runInference(await req.json());
  return Response.json(result);
});

When the limit is exceeded, withRateLimit returns a 429 response automatically and does not call the wrapped handler.
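The short-circuit behaviour can be pictured as a higher-order function. The sketch below is a conceptual model only: guard, Decision, and the check and reject callbacks are hypothetical names, and the real limiter object returned by createRateLimiter is not documented on this page.

```typescript
// Hypothetical shape of a limiter decision, for illustration only.
type Decision = { allowed: boolean; retryAfter: number };

// Wraps `handler` so it only runs when `check` allows the request.
// Res can be a plain value or a Promise, so async handlers also fit.
function guard<Req, Res>(
  check: (req: Req) => Decision,
  reject: (retryAfter: number) => Res,
  handler: (req: Req) => Res,
): (req: Req) => Res {
  return (req) => {
    const decision = check(req);
    // Short-circuit: the wrapped handler never runs for a rejected request.
    if (!decision.allowed) return reject(decision.retryAfter);
    return handler(req);
  };
}
```

Keeping the rejection path outside the handler means expensive work, such as the inference call above, is never started for a request that is over budget.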

429 response format

All rate limit rejections, from auth endpoints, the permission engine, or withRateLimit, return the same shape:

{
  "error": {
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "Too many requests. Try again after 60 seconds.",
    "details": {
      "limit": 10,
      "window": 60,
      "retryAfter": 60
    }
  }
}

The Retry-After header is also set to the number of seconds remaining in the current window.

HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 42
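On the calling side, a client can use either the header or the body to decide how long to wait. The sketch below shows one way to do that. The retryDelayMs helper and its 1-second fallback are hypothetical; the RateLimitBody type follows the response shape shown above. Only the seconds form of Retry-After is handled, since that is what this page describes.

```typescript
// Shape taken from the 429 body documented above.
type RateLimitBody = {
  error: {
    code: string;
    message: string;
    details: { limit: number; window: number; retryAfter: number };
  };
};

// Hypothetical helper: how long to wait before retrying, in milliseconds.
function retryDelayMs(retryAfterHeader: string | null, body?: RateLimitBody): number {
  const headerText = retryAfterHeader?.trim() ?? '';
  const fromHeader = headerText === '' ? NaN : Number(headerText);
  // Prefer the header (seconds left in the window), then the body, then 1 second.
  const seconds =
    Number.isFinite(fromHeader) && fromHeader >= 0
      ? fromHeader
      : body?.error.details.retryAfter ?? 1;
  return seconds * 1000;
}
```

The header is preferred because it reflects the time remaining in the current window, while details.retryAfter in the example body matches the full window length.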

Next steps

Permissions

Add maxCallsPerHour and other constraints to individual permissions.

Approval flows

Pair rate limits with human-in-the-loop approval for sensitive actions.

Anomaly detection

Use anomaly scoring alongside rate limits for behavioural signals.
