Sammy
Getting Started

Introduction to Sammy

Sammy is an auto-configuring AI agent framework that scans your codebase, understands your APIs and business logic, and generates production-ready AI agents — all without requiring AI expertise.

Most AI agent frameworks require you to manually define agents, write tool schemas, wire up handlers, and decide on orchestration patterns. Sammy inverts this — it treats your codebase as the source of truth.

The pipeline: Scan your codebase to understand what it does, determine the optimal agent architecture, generate all tools and schemas, evaluate and refine in a closed loop, then deploy a production-ready system.

Quick Start

Go from an existing Next.js app to a working, tool-using AI assistant at /chat in about 10 minutes. The CLI scaffolds the wiring for you.

Prerequisites

  • A Next.js project (App Router) with some API routes or server functions
  • Node.js 18+
  • A Sammy API key (create one in your dashboard)

Step 1: Install the SDK

The package is sammy-sdk; it ships the sammy CLI.

bash
npm install sammy-sdk

Step 2: Scan your codebase

Add your API key to .env.local, then run discovery — it writes a reviewable sammy.config.json.

bash
# .env.local
SAMMY_API_KEY=sk_sammy_your_key_here

npx sammy init

Step 3: Generate tools & agents

bash
npx sammy generate   # writes .sammy/tools + .sammy/agents
npx sammy eval       # optional: self-evaluate and refine

Step 4: Scaffold the wiring

This writes the server helper, the chat and history API routes, and a /chat page, and patches next.config, middleware, and .env.local. See Scaffolding for details.

bash
npx sammy scaffold next        # add --dry-run to preview first

Step 5: Install the chat UI

The scaffolded /chat page renders SammyChatApp, so install the frontend packages:

bash
npm install sammy-sdk-ui sammy-sdk-react sammy-sdk-client

Step 6: Run it

For local demos without logging in, set SAMMY_DEV_USER_ID to the id of a real, seeded user so that write tools that persist foreign keys succeed (scaffold adds the placeholder):

bash
# .env.local
SAMMY_DEV_USER_ID=<a-real-user-id-from-your-db>

npm run dev   # then open http://localhost:3000/chat

That's it!

Your assistant is live at /chat with streaming, history, and a New chat button. Before shipping, swap the in-memory store for a real one and wire up real auth; see Conversation Storage and Deployment.

Recommended

Scaffolding

sammy scaffold next writes the Next.js App Router wiring so you don't have to copy it by hand. It is idempotent and never overwrites your files — anything it can't edit safely is printed as a manual step.

bash
npx sammy scaffold next
npx sammy scaffold next --dry-run   # preview without writing

What it creates

lib/sammy-server.ts

A single Sammy instance (via globalThis) plus resolveUserFromSession.

app/api/chat/route.ts

The chat endpoint (POST, JSON + SSE streaming).

app/api/chat/conversations/…

History routes: list, get one, delete.

app/chat/page.tsx

A ready-to-use page rendering SammyChatApp.

What it patches

next.config — adds serverExternalPackages: ["sammy-sdk", "tsx"] (and outputFileTracingRoot for nested monorepo apps).

middleware — if you have a PUBLIC_PATHS allowlist, it adds /chat and /api/chat so the UI isn't redirected to login in dev.

.env.local — adds a SAMMY_DEV_USER_ID= placeholder.

Auto-detection: scaffold detects your src/ layout, your @/ path alias, and your auth provider (NextAuth, Clerk, …) and adapts the generated code accordingly.
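
To make the "idempotent" claim concrete, here is a minimal sketch of how a middleware allowlist patch can be written so that running it twice changes nothing. The helper name addPublicPaths and the sample paths are hypothetical; this illustrates the idea and is not the scaffolder's actual code.

```typescript
// Hypothetical helper, not part of sammy-sdk: an idempotent allowlist patch.
function addPublicPaths(
  existing: readonly string[],
  additions: readonly string[],
): string[] {
  const result = [...existing];
  for (const path of additions) {
    // Append only entries that are missing, so a second run is a no-op.
    if (!result.includes(path)) result.push(path);
  }
  return result;
}

const firstRun = addPublicPaths(["/login", "/signup"], ["/chat", "/api/chat"]);
const secondRun = addPublicPaths(firstRun, ["/chat", "/api/chat"]);

console.log(firstRun); // [ '/login', '/signup', '/chat', '/api/chat' ]
console.log(secondRun.length === firstRun.length); // true
```

Because the patch only appends missing entries, existing allowlist values and their order are left untouched.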

CLI Reference

All Sammy CLI commands and their options.

sammy init

Scan your codebase and generate sammy.config.json

Options: --force

sammy generate

Generate tools and agents from sammy.config.json

Options: --dry-run, --domain <name>, --force

sammy scaffold next

Scaffold Next.js App Router wiring (server, routes, chat page)

Options: --dry-run

sammy eval

Run the evaluation loop to test and refine agents

Options: --no-refine, --max-iterations <n>, --ci, --dimensions <dims>, --agent <name>, --from-iteration <n>, --dry-run

sammy dev

Start the Sammy dev server with hot-reload

Options: --port <n>, --no-open

Discovery Engine

The discovery engine is Sammy's core differentiator. It performs static analysis combined with LLM-powered semantic understanding to build a complete capability map of your application.

What Gets Scanned

Route Handlers

HTTP method, path, params, response shape, auth requirements, side effects

Server Actions

Function signatures, parameters, return types, side effects

Database Models

Prisma, Drizzle, Mongoose schemas — fields, relations, types

External Services

Stripe, SendGrid, Twilio, AWS — detected via env vars and imports

Auth System

Provider (NextAuth, Clerk, etc.), roles, permissions, session shape

Business Logic

Service files, shared types, constants, utility functions

How It Works

1. Framework Detection — Sammy reads your package.json and project structure to identify Next.js, Express, Fastify, etc.

2. Static Scanning — Framework-specific scanners extract routes, models, and services using AST parsing and pattern matching. No LLM calls — this is fast, free, and local.

3. LLM Analysis — The structured capability map is sent to a powerful-tier model which clusters capabilities into business domains, names tools, and recommends an agent architecture.

4. Config Output — Results are written to sammy.config.json — a human-readable, editable file you review before proceeding.
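
As an illustration of step 1, framework detection can be approximated by looking for well-known dependency names in package.json. The sketch below is an assumption about how such a check might look, not Sammy's implementation; the names detectFramework, PackageJsonLike, and FRAMEWORK_MARKERS are hypothetical, and a real detector would also inspect the project structure.

```typescript
type Framework = "nextjs" | "express" | "fastify" | "unknown";

// Only the fields this sketch reads; a real package.json has many more.
interface PackageJsonLike {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

// Checked in order, so a Next.js app that also installs express is reported as "nextjs".
const FRAMEWORK_MARKERS: ReadonlyArray<[string, Framework]> = [
  ["next", "nextjs"],
  ["fastify", "fastify"],
  ["express", "express"],
];

function detectFramework(pkg: PackageJsonLike): Framework {
  const deps: Record<string, string> = {
    ...pkg.dependencies,
    ...pkg.devDependencies,
  };
  for (const [dependency, framework] of FRAMEWORK_MARKERS) {
    if (deps[dependency] !== undefined) return framework;
  }
  return "unknown";
}

console.log(detectFramework({ dependencies: { next: "15.2.0", react: "19.0.0" } })); // nextjs
```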

Tool Generation

The tool generator reads sammy.config.json and scaffolds executable tool files that agents can invoke at runtime.

What Gets Generated

Zod Schemas — Strict input validation for every tool parameter. Enums, date formats, email validation — all inferred from your existing types.

Handler Wrappers — Each tool wraps your existing function with error boundaries, permission checks, and structured output formatting.

Agent Definitions — Each specialist agent gets a definition file with assigned tools, a system prompt, and a model tier selection.

typescript
// .sammy/tools/billing/getInvoices.ts (auto-generated)
import { z } from "zod";

export const getInvoicesSchema = z.object({
  customerId: z.string().describe("The customer ID"),
  status: z.enum(["draft", "open", "paid", "void"]).optional(),
  from: z.string().date().optional(),
  to: z.string().date().optional(),
});

export const getInvoices = {
  name: "getInvoices",
  description: "List invoices for a customer with optional filters",
  schema: getInvoicesSchema,
  permission: "read",
  handler: async (params, context) => {
    const { listInvoices } = await import("@/services/billing/invoices");
    return listInvoices(params);
  },
};

Flagged tools: If Sammy can't auto-wire a handler (e.g., multipart uploads, complex aggregations), it generates a stub and flags it. You fill in ~10-20 lines of glue code.
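
The handler wrapper idea described above (permission check, error boundary, structured output) can be sketched in a few lines. Everything here is an assumption made for illustration: wrapHandler, ToolContext, and ToolResult are hypothetical names, and the generated wrappers may be shaped differently.

```typescript
type Permission = "read" | "write" | "admin";

// Structured output: callers always get a result object, never a thrown error.
type ToolResult<T> =
  | { ok: true; data: T }
  | { ok: false; error: string };

interface ToolContext {
  // Permissions granted to the current user (hypothetical shape).
  granted: ReadonlySet<Permission>;
}

function wrapHandler<P, T>(
  required: Permission,
  fn: (params: P) => Promise<T>,
): (params: P, context: ToolContext) => Promise<ToolResult<T>> {
  return async (params, context) => {
    // Permission check runs before any application code.
    if (!context.granted.has(required)) {
      return { ok: false, error: `missing "${required}" permission` };
    }
    try {
      return { ok: true, data: await fn(params) };
    } catch (err) {
      // Error boundary: report the failure as data instead of throwing.
      return { ok: false, error: err instanceof Error ? err.message : String(err) };
    }
  };
}

// Usage with a stand-in for an existing service function.
const getInvoicesTool = wrapHandler("read", async (params: { customerId: string }) => {
  if (!params.customerId) throw new Error("customerId is required");
  return [{ id: "inv_1", customerId: params.customerId }];
});

getInvoicesTool({ customerId: "cus_1" }, { granted: new Set<Permission>(["read"]) })
  .then((result) => console.log(result.ok)); // true
```

Returning a result object keeps failures visible to the agent as data, so it can explain the problem to the user rather than crash mid-conversation.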

Runtime

The runtime engine is what serves your AI assistant in production. It handles message routing, agent orchestration, tool execution, and streaming responses.

Request Flow

1. Message In — User sends a message via your /api/chat endpoint.

2. Router — A fast-tier model classifies intent and picks the right specialist agent (or creates a multi-agent plan).

3. Specialist Agent — The selected agent processes the request using a balanced-tier model, decides which tools to call, and extracts parameters.

4. Tool Execution — Tools run locally against your database and APIs. No LLM call needed — just your existing code.

5. Response — The agent formats the tool results into natural language and streams back via SSE.

Modes

Single-Agent

All tools on one agent. Best for 1-2 domains. Simple, fast, lower latency.

Multi-Agent

Router + specialist agents per domain. Best for 3+ domains. Better accuracy, cross-agent plans.

Evaluation & Refinement

Sammy auto-generates test scenarios, runs them against your agents, scores across 6 dimensions, and self-refines until all thresholds pass.

The 6 Scoring Dimensions

Tool Selection

≥ 90%

Did the agent pick the right tool?

Parameter Extraction

≥ 85%

Were parameters correctly extracted?

Router Accuracy

≥ 95%

Was the message routed to the right agent?

Response Quality

≥ 7.0/10

Was the response helpful and well-formatted?

Safety Compliance

100%

Were permission boundaries respected?

Conversation Coherence

≥ 80%

Did multi-turn context hold up?
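
The thresholds above translate directly into a pass/fail check. The following sketch assumes percentages are expressed on a 0 to 100 scale and response quality on a 0 to 10 scale; the key names and the failingDimensions helper are hypothetical and are not the format sammy eval emits.

```typescript
type Dimension =
  | "toolSelection"
  | "parameterExtraction"
  | "routerAccuracy"
  | "responseQuality"
  | "safetyCompliance"
  | "conversationCoherence";

// Minimum passing score per dimension, taken from the list above.
const THRESHOLDS: Record<Dimension, number> = {
  toolSelection: 90,
  parameterExtraction: 85,
  routerAccuracy: 95,
  responseQuality: 7.0, // out of 10
  safetyCompliance: 100,
  conversationCoherence: 80,
};

// Returns the dimensions whose score is below its threshold.
function failingDimensions(scores: Record<Dimension, number>): Dimension[] {
  return (Object.keys(THRESHOLDS) as Dimension[]).filter(
    (dimension) => scores[dimension] < THRESHOLDS[dimension],
  );
}

const exampleScores: Record<Dimension, number> = {
  toolSelection: 93,
  parameterExtraction: 82,
  routerAccuracy: 97,
  responseQuality: 7.4,
  safetyCompliance: 100,
  conversationCoherence: 88,
};

console.log(failingDimensions(exampleScores)); // [ 'parameterExtraction' ]
```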

The Refinement Loop

Generate — 60+ test scenarios based on your actual tools and domains.

Run — Execute every scenario against the agent system and score.

Diagnose — Categorize failures (TOOL_DESCRIPTION, SCHEMA_MISMATCH, PROMPT_CLARITY, CODE_ISSUE).

Refine — Auto-fix what it can (tighten schemas, rewrite descriptions, add prompt rules).

Re-run — Loop until all dimensions pass or a guard triggers (max 5 iterations, diminishing returns).
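
The loop and its two guards can be sketched as follows. This is an illustrative outline only: the refine function, the IterationResult shape, and in particular the 0.5-point cut-off used for "diminishing returns" are assumptions, since the exact criteria are not specified here.

```typescript
interface IterationResult {
  allPassed: boolean;
  // Aggregate score for the iteration; how it is computed is an assumption of this sketch.
  overall: number;
}

type StopReason = "passed" | "max-iterations" | "diminishing-returns";

function refine(
  runIteration: (iteration: number) => IterationResult,
  maxIterations = 5,
  minGain = 0.5, // assumed cut-off for "diminishing returns"
): { iterations: number; reason: StopReason } {
  let previous: number | undefined;
  for (let i = 1; i <= maxIterations; i++) {
    const result = runIteration(i);
    if (result.allPassed) return { iterations: i, reason: "passed" };
    // Guard: stop when the latest refinement barely moved the score.
    if (previous !== undefined && result.overall - previous < minGain) {
      return { iterations: i, reason: "diminishing-returns" };
    }
    previous = result.overall;
  }
  // Guard: iteration budget exhausted without passing.
  return { iterations: maxIterations, reason: "max-iterations" };
}

// Simulated run: scores improve, then pass on the third iteration.
const simulated = [70, 82, 91];
console.log(refine((i) => ({ allPassed: simulated[i - 1] >= 90, overall: simulated[i - 1] })));
// { iterations: 3, reason: 'passed' }
```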

sammy.config.json

The config file is the contract between discovery and everything downstream. It's auto-generated but designed to be human-readable and editable.

json
{
  "$schema": "https://unpkg.com/sammy-sdk/schema.json",
  "version": "1.0",
  "project": {
    "framework": "nextjs",
    "frameworkVersion": "15.2",
    "router": "app",
    "orm": "prisma"
  },
  "domains": [
    {
      "name": "billing",
      "description": "Manages subscriptions, invoices, and payments via Stripe",
      "tools": [
        {
          "name": "getInvoices",
          "type": "query",
          "permission": "read",
          "handler": "src/services/billing/invoices.ts:listInvoices"
        }
      ]
    }
  ],
  "architecture": {
    "type": "multi-agent",
    "router": { "model": "fast" },
    "agents": [
      { "name": "billing-agent", "domains": ["billing"], "model": "balanced" }
    ]
  }
}

You can edit anything in this file before running sammy generate: rename domains, remove tools, change model tiers, force a different architecture, or add custom system prompts.

Architecture Options

Sammy automatically recommends an architecture based on how many domains are discovered and their complexity.

Single-Agent

1–2 domains

All tools on one agent. Simple, fast, lowest latency. Best for small projects with few capabilities.

Multi-Agent + Router

3–6 domains

A fast router classifies intent, then delegates to specialist agents — one per domain. Each agent has its own tools and system prompt.

Hierarchical

7+ domains

Router delegates to sub-routers, which delegate to specialists. Scales to large orgs with many distinct business areas.

You can override the recommendation by setting architecture.type in your config to "single", "multi-agent", or "hierarchical".
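
The domain-count ranges above amount to a simple lookup. The sketch below uses only the domain count and ignores complexity, which the actual recommendation also takes into account; recommendArchitecture is a hypothetical name.

```typescript
type ArchitectureType = "single" | "multi-agent" | "hierarchical";

// Maps a discovered domain count to the architecture.type values listed above.
function recommendArchitecture(domainCount: number): ArchitectureType {
  if (!Number.isInteger(domainCount) || domainCount < 1) {
    throw new RangeError("domainCount must be a positive integer");
  }
  if (domainCount <= 2) return "single";
  if (domainCount <= 6) return "multi-agent";
  return "hierarchical";
}

console.log(recommendArchitecture(1)); // single
console.log(recommendArchitecture(4)); // multi-agent
console.log(recommendArchitecture(9)); // hierarchical
```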

Model Tiers

Sammy uses abstract model tiers instead of specific model names. This lets us upgrade models without changing your code.

Tier | Used For | Speed | Quality
fast | Router, simple queries, classification | Fastest | Good
balanced | Specialist agents, most tool calls | Fast | Very Good
powerful | Discovery analysis, eval judging, complex reasoning | Moderate | Best

Note: Sammy Cloud maps these tiers to specific models server-side. We can upgrade the underlying models without any changes to your code or configuration.

API Routes

sammy scaffold next generates everything below. This section documents what those files contain so you can customize them or wire them by hand.

One shared Sammy instance

Next.js can load each route in a separate module graph, so keep a single instance on globalThis — that way chat and history share one conversation store.

typescript
// lib/sammy-server.ts
import { Sammy } from "sammy-sdk";
import { auth } from "@/lib/auth";

const g = globalThis as typeof globalThis & { __sammy?: Sammy };

export function getSammy() {
  if (!g.__sammy) {
    g.__sammy = new Sammy({
      systemPrompt: "You are a helpful assistant. Use tools for current data; be concise.",
      // conversationStorage: new YourRedisStorage(...), // see Conversation Storage
    });
  }
  return g.__sammy;
}

// Never trust a client-sent user in production — resolve it server-side.
export async function resolveUserFromSession(_req: Request) {
  const session = await auth();
  if (session?.user?.id) {
    return { id: session.user.id, email: session.user.email ?? undefined };
  }
  // Dev only: FK-safe writes when demoing /chat without logging in.
  const devId = process.env.SAMMY_DEV_USER_ID;
  if (devId) return { id: devId };
  return undefined;
}

The chat endpoint

One handler — it handles JSON responses, SSE streaming, body validation, and errors.

typescript
// app/api/chat/route.ts
import { createSammyRouteHandler } from "sammy-sdk";
import { getSammy, resolveUserFromSession } from "@/lib/sammy-server";

export const runtime = "nodejs";
export const dynamic = "force-dynamic";

export const POST = createSammyRouteHandler(getSammy(), {
  resolveUser: resolveUserFromSession,
});

Required: next.config

The runtime loads your generated .sammy/*.ts tools via tsx at request time, so they must stay external to the bundle.

javascript
// next.config.mjs
const nextConfig = {
  serverExternalPackages: ["sammy-sdk", "tsx"],
};
export default nextConfig;

Using auth middleware?

If middleware protects every route, /chat and /api/chat must be reachable or the UI/API will redirect to login. In dev, allowlist them and let resolveUser fall back to SAMMY_DEV_USER_ID. In production, require login for /chat and resolve the user from the session only.
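
One way to enforce the dev/production split is to gate the fallback on NODE_ENV, so a leftover SAMMY_DEV_USER_ID can never be honored in production. This is a sketch of that policy as a pure function; resolveUserId and EnvLike are hypothetical names, and the check is an addition you would make yourself rather than something scaffold is known to generate.

```typescript
// The subset of environment variables this sketch reads.
interface EnvLike {
  NODE_ENV?: string;
  SAMMY_DEV_USER_ID?: string;
}

function resolveUserId(
  sessionUserId: string | undefined,
  env: EnvLike,
): string | undefined {
  // A real session always wins.
  if (sessionUserId) return sessionUserId;
  // Dev-only fallback: ignored whenever NODE_ENV is "production".
  if (env.NODE_ENV !== "production" && env.SAMMY_DEV_USER_ID) {
    return env.SAMMY_DEV_USER_ID;
  }
  return undefined;
}

console.log(resolveUserId(undefined, { NODE_ENV: "development", SAMMY_DEV_USER_ID: "user_1" })); // user_1
console.log(resolveUserId(undefined, { NODE_ENV: "production", SAMMY_DEV_USER_ID: "user_1" })); // undefined
```

Keeping the policy in a pure function makes it easy to unit-test separately from the auth provider.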

Conversation History

Three more handlers give you a persistent thread list, resumable conversations, and delete — all scoped to the resolved user, so users never see each other's threads.

typescript
// app/api/chat/conversations/route.ts
import { createSammyListConversationsHandler } from "sammy-sdk";
import { getSammy, resolveUserFromSession } from "@/lib/sammy-server";

export const runtime = "nodejs";
export const dynamic = "force-dynamic";

export const GET = createSammyListConversationsHandler(getSammy(), {
  resolveUser: resolveUserFromSession,
});

typescript
// app/api/chat/conversations/[id]/route.ts
import {
  createSammyGetConversationHandler,
  createSammyDeleteConversationHandler,
} from "sammy-sdk";
import { getSammy, resolveUserFromSession } from "@/lib/sammy-server";

export const runtime = "nodejs";
export const dynamic = "force-dynamic";

const opts = { resolveUser: resolveUserFromSession };
export const GET = createSammyGetConversationHandler(getSammy(), opts);
export const DELETE = createSammyDeleteConversationHandler(getSammy(), opts);

HTTP contract

Method | Path | Purpose
POST | /api/chat | Send a message (JSON or SSE stream)
GET | /api/chat/conversations | List threads (?limit, ?before)
GET | /api/chat/conversations/:id | Load one thread
DELETE | /api/chat/conversations/:id | Delete a thread

SSE events when streaming: text, tool_call, tool_result, agent_handoff, done, error.
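
If you consume the stream without the client packages, you need to split the response into events yourself. The sketch below parses a fully buffered payload using the standard Server-Sent Events framing ("event:" and "data:" fields, frames separated by a blank line). It assumes the event names listed above are sent in the "event:" field and makes no assumption about the payload shape; parseSseFrames and SAMMY_EVENTS are hypothetical names.

```typescript
// Event names documented above; anything else is treated as unknown.
const SAMMY_EVENTS = ["text", "tool_call", "tool_result", "agent_handoff", "done", "error"] as const;

interface SseEvent {
  event: string;
  data: string;
}

// Splits a buffered SSE payload into events. Frames without data are skipped.
function parseSseFrames(buffer: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const frame of buffer.split(/\r?\n\r?\n/)) {
    let event = "message"; // SSE default when no "event:" field is present
    const data: string[] = [];
    for (const line of frame.split(/\r?\n/)) {
      if (line.startsWith("event:")) {
        event = line.slice(6).trim();
      } else if (line.startsWith("data:")) {
        // Per the SSE format, one leading space after the colon is dropped.
        data.push(line.slice(5).replace(/^ /, ""));
      }
    }
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return events;
}

const sample =
  'event: text\ndata: Hello\n\nevent: tool_call\ndata: {"name":"getInvoices"}\n\nevent: done\ndata: {}\n\n';

for (const e of parseSseFrames(sample)) {
  const known = (SAMMY_EVENTS as readonly string[]).includes(e.event);
  console.log(e.event, known ? e.data : "(ignored)");
}
```

A live stream arrives in chunks, so a real consumer would also keep any trailing partial frame in a buffer until its terminating blank line arrives.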

Frontend Packages

Choose your level of abstraction — from a drop-in widget to a raw client for any framework.

sammy-sdk-ui — Full app (recommended)

SammyChatApp

A complete chat experience: history sidebar, New chat, resumable threads, and auto-scroll. This is what scaffolding drops into /chat.

tsx
import { SammyChatApp } from "sammy-sdk-ui";

export default function ChatPage() {
  return <SammyChatApp endpoint="/api/chat" title="My Assistant" height={560} />;
}

sammy-sdk-ui — Inline panel & floating widget

Embed a single conversation in an existing page, or float a support bubble in the corner.

tsx
import { SammyChatInline, SammyChat } from "sammy-sdk-ui";

// Inline panel inside a page
<SammyChatInline endpoint="/api/chat" title="Support" />

// Floating widget (corner bubble)
<SammyChat endpoint="/api/chat" position="bottom-right" />

sammy-sdk-react — Headless hooks

Bring your own UI. Hooks give you messages, streaming state, and the conversation list.

tsx
import { SammyProvider, useSammyChat, useSammyConversations } from "sammy-sdk-react";

function MyChat() {
  const { conversations, refresh } = useSammyConversations();
  const chat = useSammyChat({ onConversationId: () => refresh() });

  return (
    <div>
      <button onClick={() => chat.startNewConversation()}>New chat</button>
      {conversations.map((c) => (
        <button key={c.id} onClick={() => chat.selectConversation(c.id)}>{c.title}</button>
      ))}
      {/* render chat.messages; an input that calls chat.send(text) */}
    </div>
  );
}

export function App() {
  return (
    <SammyProvider endpoint="/api/chat">
      <MyChat />
    </SammyProvider>
  );
}

sammy-sdk-client — Any framework

A zero-dependency client for Vue, Svelte, vanilla JS, or the server.

typescript
import { SammyClient } from "sammy-sdk-client";

const client = new SammyClient({ endpoint: "/api/chat" });

const list = await client.listConversations();
const detail = await client.getConversation(list[0].id);
await client.deleteConversation(list[0].id);

const stream = client.stream("Show me unpaid invoices", { conversationId: detail.id });
stream.on("text", (chunk) => { /* update UI */ }).on("done", (final) => { /* settle */ });

Security & Auth

Sammy has security built in at every layer — from tool permissions to data handling.

Permission Levels

Level | Description | Example
read | Query data, no mutations | getInvoices, getUser
write | Create or update data | createInvoice, sendEmail
admin | Destructive or sensitive operations | refundPayment, updateUserRole

Data Handling

Sammy Cloud never stores prompt or response content. Only metadata (token counts, latency, costs) is logged for billing and analytics.

BYOK mode bypasses Sammy Cloud entirely — your LLM calls go directly to Anthropic/OpenAI from your server. Zero data passes through our infrastructure.

Cached responses are encrypted at rest and auto-expire (default: 1 hour TTL).

Production

Conversation Storage

By default, conversations are kept in an in-memory store. That is perfect for local development, but it is not durable — history is wiped on restart and is not shared across multiple instances. For production, plug in your own store.

Heads up: on serverless or any multi-instance deploy, the default memory store means each worker has its own history and threads disappear between cold starts. Use Redis, Postgres, or a custom store.

The interface

ConversationStorage is a small interface; implement it and pass it to new Sammy(...). Every chat and history route then uses it transparently.

typescript
import type { ConversationStorage } from "sammy-sdk";

export class RedisConversationStorage implements ConversationStorage {
  async getOrCreate(id, userId) { /* ... */ }
  async append(id, message)     { /* ... */ }
  async get(id, userId)         { /* ... */ }
  async list(userId, opts)      { /* ... */ }
  async delete(id, userId)      { /* ... */ }
  async touch(id, patch)        { /* optional */ }
}

new Sammy({ conversationStorage: new RedisConversationStorage(/* ... */) });
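
To show what the method bodies need to guarantee, here is a self-contained sketch of the same method set with a Map standing in for Redis or Postgres. The parameter and return types are assumptions (the real ConversationStorage signatures may differ), so the sketch declares its own local ConversationStorageSketch interface instead of importing from sammy-sdk. The part worth copying is the ownership check in get, which the other user-scoped methods reuse.

```typescript
interface Message { role: "user" | "assistant"; content: string }
interface Conversation { id: string; userId: string; messages: Message[]; updatedAt: number }

// Local stand-in for the SDK interface; real signatures may differ.
interface ConversationStorageSketch {
  getOrCreate(id: string, userId: string): Promise<Conversation>;
  append(id: string, message: Message): Promise<void>;
  get(id: string, userId: string): Promise<Conversation | undefined>;
  list(userId: string, opts?: { limit?: number }): Promise<Conversation[]>;
  delete(id: string, userId: string): Promise<boolean>;
}

class MapConversationStorage implements ConversationStorageSketch {
  // The Map stands in for your database or Redis client.
  private readonly byId = new Map<string, Conversation>();

  async getOrCreate(id: string, userId: string): Promise<Conversation> {
    const existing = await this.get(id, userId);
    if (existing) return existing;
    if (this.byId.has(id)) throw new Error("conversation belongs to another user");
    const created: Conversation = { id, userId, messages: [], updatedAt: Date.now() };
    this.byId.set(id, created);
    return created;
  }

  async append(id: string, message: Message): Promise<void> {
    const conversation = this.byId.get(id);
    if (!conversation) throw new Error(`unknown conversation: ${id}`);
    conversation.messages.push(message);
    conversation.updatedAt = Date.now();
  }

  async get(id: string, userId: string): Promise<Conversation | undefined> {
    const conversation = this.byId.get(id);
    // Ownership check: another user's thread is treated as not found.
    return conversation && conversation.userId === userId ? conversation : undefined;
  }

  async list(userId: string, opts: { limit?: number } = {}): Promise<Conversation[]> {
    const own = Array.from(this.byId.values()).filter((c) => c.userId === userId);
    own.sort((a, b) => b.updatedAt - a.updatedAt); // most recently updated first
    return own.slice(0, opts.limit ?? own.length);
  }

  async delete(id: string, userId: string): Promise<boolean> {
    if (!(await this.get(id, userId))) return false;
    return this.byId.delete(id);
  }
}
```

A durable implementation would keep the same ownership rule but push it into the query itself, for example by filtering on both the conversation id and the user id.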

What to use where

Environment | Recommendation
Local / sammy dev | Default in-memory store
Single server / demos | Memory OK (history resets on restart)
Production / serverless / >1 instance | Required: Redis, Postgres, or custom

Deployment

A Sammy-powered app deploys like any other Next.js app (Vercel, your own Node host, containers). Keep these production specifics in mind.

Use a real conversation store. Serverless platforms run many short-lived workers; the in-memory default won't share history. See Conversation Storage.

Ship your .sammy/ directory. The runtime loads generated tools at request time via tsx. Commit .sammy/ or run sammy generate in CI, and keep serverExternalPackages: ["sammy-sdk", "tsx"].

Protect chat in production. Require auth for /chat and /api/chat, resolve the user from the session, and do not rely on SAMMY_DEV_USER_ID.

Keep secrets in a vault. SAMMY_API_KEY belongs in your platform's secret manager, never committed.

Before you ship

  • resolveUser wired to your real auth (not SAMMY_DEV_USER_ID)
  • ConversationStorage backed by Redis or a database
  • serverExternalPackages: ["sammy-sdk", "tsx"] in next.config
  • Middleware protects /chat and /api/chat
  • SAMMY_API_KEY stored in a secrets manager
  • sammy generate runs in CI when sammy.config.json changes
  • App-level rate limits / cost controls on LLM usage
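
For the last checklist item, a per-user fixed-window limiter is one simple starting point. This sketch is not a Sammy feature: FixedWindowRateLimiter is a hypothetical class you would call in your own route code before handing the request to the chat handler, and the limit and window values are placeholders.

```typescript
class FixedWindowRateLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed; `now` is injectable for testing.
  allow(userId: string, now: number = Date.now()): boolean {
    const current = this.windows.get(userId);
    // First request, or the previous window has expired: start a new window.
    if (!current || now - current.start >= this.windowMs) {
      this.windows.set(userId, { start: now, count: 1 });
      return true;
    }
    if (current.count >= this.limit) return false;
    current.count += 1;
    return true;
  }
}

// Example: at most 2 messages per user per second.
const limiter = new FixedWindowRateLimiter(2, 1000);
console.log(limiter.allow("user_1", 0));   // true
console.log(limiter.allow("user_1", 10));  // true
console.log(limiter.allow("user_1", 20));  // false
console.log(limiter.allow("user_1", 1000)); // true (new window)
```

Like the default conversation store, this keeps state in process memory, so on serverless or multi-instance deploys the counters would need to live in a shared store such as Redis.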