AG2 Extraction Notes
This document preserves the Meta Museum-relevant AG2 patterns extracted from the
temporary `metagenauto/` reference repository before that folder was removed from
the app workspace.
Decision
Meta Museum keeps the current TypeScript `AgentTask` contract layer as the public
and testable boundary. AG2 remains a future execution engine behind that boundary,
not a replacement for the existing review-first API.
Current production rule:
- Agents draft, score, reconcile, and review.
- Agents do not publish, merge identities, clear rights, deploy, or install tools
without human approval.
- Every agent-facing result must preserve citations, rights warnings, refusal
paths, and review state.
Extracted AG2 Runtime Rules
- Target runtime from the reference repo was AG2 `0.13.2`, imported as `autogen`.
- Use a centralized `make_agent()` factory for AI-only agents.
- AI-only agents use `human_input_mode="NEVER"`.
- Human approver agents use `human_input_mode="ALWAYS"` and `llm_config=False`.
- Executor-only agents use `llm_config=False`.
- Every autonomous agent must have `max_consecutive_auto_reply`.
- Use at least two termination controls: turn/round caps plus a termination
predicate or explicit terminate target.
- Keep `code_execution_config=False` unless code execution is the explicit job.
- Avoid exposing secrets through tool parameters or client-visible schemas.
- Do not use cached LLM responses in eval gates or production decision paths.
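The "two termination controls" rule can be illustrated without AG2 itself. The sketch below is plain Python and does not import `autogen`; `MAX_ROUNDS`, `run_bounded`, and the `TERMINATE` sentinel are illustrative assumptions, not names taken from the reference repo. It pairs a round cap with a termination predicate so that either one alone is enough to stop the loop.

```python
from typing import Callable, List

MAX_ROUNDS = 6  # hypothetical round cap


def is_termination_msg(message: dict) -> bool:
    """Termination predicate: true when a reply ends with the sentinel."""
    content = message.get("content") or ""
    return content.rstrip().endswith("TERMINATE")


def run_bounded(reply_fn: Callable[[int], dict], max_rounds: int = MAX_ROUNDS) -> List[dict]:
    """Collect replies until the predicate fires or the round cap is reached."""
    transcript: List[dict] = []
    for round_index in range(max_rounds):
        message = reply_fn(round_index)
        transcript.append(message)
        if is_termination_msg(message):
            break  # control 2: predicate
    return transcript  # control 1: the range() bound caps the rounds
```

A reply function that never emits the sentinel still stops at the cap, which is the point of keeping both controls.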
Agent Factory Pattern
The useful reference design was:
- resolve a named model tier (`cheap`, `mid`, `strong`, `hub`);
- apply shared constructor defaults;
- inject structured output schemas only when needed;
- create either a `ConversableAgent` or `AssistantAgent`;
- create executor and human-proxy agents through separate helper constructors.
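A minimal sketch of that factory shape follows, under stated assumptions. It only builds constructor keyword arguments and does not import `autogen`; the model names, `make_agent_kwargs`, and `make_human_approver_kwargs` are hypothetical, and placing the schema under `llm_config["response_format"]` is an assumption about how the bridge would inject it.

```python
from typing import Any, Dict, Optional

# Hypothetical tier-to-model mapping; the model names are placeholders.
MODEL_TIERS: Dict[str, str] = {
    "cheap": "model-cheap",
    "mid": "model-mid",
    "strong": "model-strong",
    "hub": "model-hub",
}


def make_agent_kwargs(
    name: str,
    system_message: str,
    tier: str = "cheap",
    response_format: Optional[Any] = None,
    max_consecutive_auto_reply: int = 3,
) -> Dict[str, Any]:
    """Build bounded constructor kwargs for an AI-only agent."""
    if tier not in MODEL_TIERS:
        raise ValueError(f"unknown model tier: {tier}")
    if max_consecutive_auto_reply < 1:
        raise ValueError("max_consecutive_auto_reply must be at least 1")
    llm_config: Dict[str, Any] = {"config_list": [{"model": MODEL_TIERS[tier]}]}
    if response_format is not None:
        # Structured output schema is injected only when requested.
        llm_config["response_format"] = response_format
    return {
        "name": name,
        "system_message": system_message,
        "llm_config": llm_config,
        "human_input_mode": "NEVER",
        "max_consecutive_auto_reply": max_consecutive_auto_reply,
        "code_execution_config": False,
    }


def make_human_approver_kwargs(name: str) -> Dict[str, Any]:
    """Separate helper: human approvers never get an LLM config."""
    return {
        "name": name,
        "llm_config": False,
        "human_input_mode": "ALWAYS",
        "code_execution_config": False,
    }
```

Keeping the defaults in one function means a reviewer can audit the safety-relevant settings in a single place.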
Meta Museum mapping:
| AG2 idea | Meta Museum owner |
|---|---|
| `make_agent()` bounded constructor | future `services/agents/ag2-bridge` |
| AI-only default | current agent profiles remain review-only |
| executor-only proxy | future server-side tool executor |
| human proxy | existing `approvalRequired: true` and future review queue |
| structured response schema | existing Zod contracts and future Pydantic bridge |
Tool Registration Rules
AG2 distinguishes caller and executor agents:
- caller exposes a tool schema to the model;
- executor runs the actual function.
Rules to retain:
- Tool descriptions must be specific and auditable.
- Tool return values should be normalized JSON or compact text records.
- Secrets belong in server-side dependency contexts, not tool arguments.
- One credential boundary should have one explicit context object.
- Tools that mutate state must return a proposed diff or review packet before
any write.
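As an illustration of the last three rules, here is a small sketch of a mutating tool that returns a review packet instead of writing. `CatalogContext`, `propose_title_update`, and the packet fields are hypothetical names chosen for the example; they are not part of the museum tool list.

```python
import json
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class CatalogContext:
    """Server-side dependency context; the credential never becomes a tool argument."""
    api_token: str = field(repr=False)  # kept out of repr() to avoid leaking into logs
    records: Dict[str, Dict[str, str]] = field(default_factory=dict)


def propose_title_update(ctx: CatalogContext, record_id: str, new_title: str) -> str:
    """Return a normalized JSON review packet; nothing is written here."""
    current = ctx.records[record_id]
    packet = {
        "tool": "propose_title_update",
        "record_id": record_id,
        "diff": {"title": {"from": current["title"], "to": new_title}},
        "approval_required": True,
        "applied": False,
    }
    return json.dumps(packet, sort_keys=True)
```

Only `record_id` and `new_title` would appear in the model-visible schema; the context object is supplied by the server-side executor.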
Museum-specific first tools for an AG2 bridge:
- `search_records`
- `build_mapping_assist_plan`
- `find_reconciliation_candidates`
- `scan_rights_and_provenance`
- `draft_curatorial_text`
- `create_agent_task_review_packet`
Orchestration Patterns To Keep
Use AG2 only where the pattern earns its complexity.
Preferred:
- Star / hub-and-spoke for cross-agent review, with one hub enforcing policy.
- Pipeline for fixed ordered workflows such as mapping review then import review.
- Triage-with-tasks for decomposing mixed user requests into typed specialist
tasks.
- Redundant review for high-risk claims where two reviewers compare evidence.
- Escalation for cheap deterministic checks before expensive reasoning.
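The escalation pattern is the simplest of these to sketch. The example below assumes hypothetical record fields (`title`, `citations`) and a hypothetical `expensive_reviewer` callable standing in for a model-backed agent; it is not the reference repo's implementation.

```python
from typing import Callable, Optional


def cheap_check(record: dict) -> Optional[str]:
    """Deterministic checks; return a verdict, or None to escalate."""
    if not record.get("title"):
        return "reject: missing title"
    if not record.get("citations"):
        return "reject: missing citations"
    return None


def review(record: dict, expensive_reviewer: Callable[[dict], str]) -> str:
    """Run the cheap checks first and call the expensive reviewer only if they pass."""
    verdict = cheap_check(record)
    if verdict is not None:
        return verdict
    return expensive_reviewer(record)
```

Records that fail a deterministic check never reach the expensive reviewer, which keeps cost and latency proportional to the records that actually need reasoning.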
Avoid as default:
- organic group chats for standards-critical workflows;
- autonomous mutation/self-improvement against production code;
- public A2A endpoints before auth, rate limits, and approval gates exist.
Human Approval Gates
The reference repo had seven useful gate classes. Meta Museum should adapt them
to cultural-heritage workflows:
| Gate | Meta Museum equivalent |
|---|---|
| `phase_transition` | roadmap/status claim changes |
| `mutation_acceptance` | agent-authored code or config changes |
| `production_deploy` | public deploy or release |
| `external_tool_install` | new agent tool, MCP server, connector, or crawler |
| `safeguard_policy_change` | rights, citation, sensitivity, or refusal policy |
| `a2a_endpoint_exposure` | public or LAN-visible agent endpoint |
| `story_proposal` | agent-authored backlog item or roadmap scope |
Additional museum gates to add before AG2 execution:
- identity merge / `equivalent` write;
- rights clearance or reuse status upgrade;
- public Wiki/MediaWiki/Wikibase publish;
- provenance claim rewrite;
- sensitive-record disposition;
- bulk import activation from an AI-generated mapping.
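One way to make these gates enforceable is a typed gate list plus a check that refuses to proceed without a named approver. In the sketch below the first seven values come from the table above; `IDENTITY_MERGE`, `RIGHTS_CLEARANCE`, `ApprovalRequired`, and `require_approval` are hypothetical names added for illustration.

```python
from enum import Enum
from typing import Optional


class Gate(str, Enum):
    PHASE_TRANSITION = "phase_transition"
    MUTATION_ACCEPTANCE = "mutation_acceptance"
    PRODUCTION_DEPLOY = "production_deploy"
    EXTERNAL_TOOL_INSTALL = "external_tool_install"
    SAFEGUARD_POLICY_CHANGE = "safeguard_policy_change"
    A2A_ENDPOINT_EXPOSURE = "a2a_endpoint_exposure"
    STORY_PROPOSAL = "story_proposal"
    # Hypothetical identifiers for two of the museum-specific gates.
    IDENTITY_MERGE = "identity_merge"
    RIGHTS_CLEARANCE = "rights_clearance"


class ApprovalRequired(Exception):
    """Raised when a gated action is attempted without a human approver."""


def require_approval(gate: Gate, approved_by: Optional[str]) -> str:
    """Return an audit string, or raise if no human approved the gated action."""
    if not approved_by:
        raise ApprovalRequired(f"{gate.value} requires human approval")
    return f"{gate.value} approved by {approved_by}"
```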
Eval Gate Pattern
The valuable pattern is not the old test files themselves; it is the discipline:
- Write an eval case before adding a new agent capability.
- Keep structural gates runnable without live LLM calls.
- Add behavioral gates for grounded outputs, refusals, and policy violations.
- Require all prior gates to remain green.
- Treat eval regressions as blockers, not warnings.
Meta Museum already has the right local equivalents:
- `tests/api/agents-run.test.ts`
- AI eval gate scripts and `/api/ai-evals/summary`
- closeout guard and README/roadmap currency rule
- citation/refusal tests for AI chat/query/content flows
Acceptance criteria for the next AG2 bridge step:
- AG2 bridge can be disabled with no API contract change.
- Same request returns the same `AgentTask` envelope shape in local and AG2 modes.
- Every AG2 output validates against existing Zod contracts.
- Timeout, refusal, tool error, and partial-output cases are tested.
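The "same envelope shape" criterion can be checked structurally, with no live LLM call. The sketch below reduces an envelope to its keys and value types and compares the two modes. The envelope fields other than `approvalRequired` are assumptions, and `local_run` / `ag2_run` are hypothetical stubs rather than the real runners.

```python
from typing import Any


def envelope_shape(value: Any) -> Any:
    """Reduce a JSON-like value to its structure: nested keys plus type names."""
    if isinstance(value, dict):
        return {key: envelope_shape(value[key]) for key in sorted(value)}
    return type(value).__name__


def local_run(request: dict) -> dict:
    """Hypothetical stand-in for the local runner."""
    return {
        "agent": request["agent"],
        "status": "needs_review",
        "citations": [],
        "approvalRequired": True,
    }


def ag2_run(request: dict) -> dict:
    """Hypothetical stand-in for the AG2-backed runner."""
    return {
        "agent": request["agent"],
        "status": "needs_review",
        "citations": ["record:123"],
        "approvalRequired": True,
    }


def same_shape(request: dict) -> bool:
    """Structural gate: both modes must return the same envelope shape."""
    return envelope_shape(local_run(request)) == envelope_shape(ag2_run(request))
```

The contents may differ between modes; only a change in keys or value types fails this gate.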
Observability Pattern
Keep the fault-tolerant tracing shape:
- tracing setup must never crash request paths;
- no endpoint means no-op tracing;
- missing SDK means no-op tracing;
- trace capture of LLM messages must be explicitly enabled;
- do not export sensitive traces to third-party backends without approval.
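A minimal sketch of the no-op fallback, assuming the optional tracing dependency is the OpenTelemetry Python API. Exporter configuration is deliberately omitted, the endpoint is passed in as a parameter instead of being read from a specific setting, and `get_span_factory` and the tracer name are hypothetical.

```python
import contextlib
from typing import Callable, ContextManager, Iterator, Optional


@contextlib.contextmanager
def noop_span(name: str) -> Iterator[None]:
    """Stand-in span that records nothing."""
    yield None


def get_span_factory(endpoint: Optional[str]) -> Callable[[str], ContextManager]:
    """Return a span context-manager factory; setup errors never propagate."""
    if not endpoint:
        return noop_span  # no endpoint configured: no-op tracing
    try:
        from opentelemetry import trace  # optional dependency

        tracer = trace.get_tracer("metamuseum.agents")  # hypothetical tracer name
        return tracer.start_as_current_span
    except Exception:
        return noop_span  # missing SDK or broken setup: no-op tracing
```

Request code can then write `with span_factory("tool call"):` unconditionally, whether or not tracing is active.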
Useful span categories:
- agent run;
- tool call;
- approval request;
- LLM call;
- citation retrieval;
- rights/provenance scan;
- refusal;
- human approval outcome.
AG-UI / A2A Notes
Do not make AG-UI or A2A the first bridge. Start with an internal server-side AG2
adapter called by the existing Next route. Consider AG-UI or A2A only after:
- persistent AgentTask history exists;
- approval queue UI exists;
- auth/rate limits are tested;
- trace IDs connect the Next request and AG2 run;
- shutdown/timeout behavior is proven.
If A2A is later exposed:
- default to loopback only;
- publish minimal agent cards;
- require auth and rate limiting;
- treat endpoint exposure as a human approval gate.
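The loopback default and the exposure gate can be combined in one startup check. This is a sketch only; `is_loopback_host` and `check_a2a_bind` are hypothetical helpers, and how approval is recorded is left open.

```python
import ipaddress


def is_loopback_host(host: str) -> bool:
    """True for 'localhost' and for loopback IPv4/IPv6 addresses."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # not an IP literal; treat unknown hostnames as non-loopback


def check_a2a_bind(host: str = "127.0.0.1", exposure_approved: bool = False) -> str:
    """Allow loopback binds freely; anything else needs recorded human approval."""
    if is_loopback_host(host):
        return host
    if not exposure_approved:
        raise PermissionError(f"binding A2A to {host} requires human approval")
    return host
```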
What Was Not Ported
The temporary repo included useful but non-essential systems that should not be
copied into Meta Museum now:
- its Next.js command-hub UI;
- its ticket system;
- Slack/Discord/Telegram agents;
- mutation/self-improvement engine;
- Chroma-specific project memory;
- Google Drive toolkit;
- book quote library;
- its roadmap and council personas.
Those are reference ideas only. Meta Museum's domain agents are Clio, Mercator,
Janus, Themis, and Calliope, and their job is museum-data review with evidence
and human oversight.
Recommended Bridge Sequence
- Done: persist AgentTask run history in managed `agent-tasks.json` storage and
expose the editor-gated `/api/agents/tasks` review queue.
- Done: add an internal, disabled-by-default AG2 bridge boundary behind
`src/services/agents.ts` for Mercator and Janus, with timeout, contract
validation, and local fallback behavior.
- Done: implement the local Python AG2 worker endpoint in
`services/ag2-worker`, exposing `POST /ag2/agents/run` and `GET /health`.
- Done: validate AG2 output through the existing TypeScript bridge contract
plus Python service-level tests for Mercator, Janus, approval refusal, and
scope limits.
- Done: add route-to-bridge trace propagation plus timeout/refusal coverage
around the worker boundary.
- Done: add deployment/runtime docs for enabling
`METAMUSEUM_AG2_BRIDGE_ENABLED` in `docs/ops/ag2-worker.md` and
`docs/env.md`.
- Done: add live-worker eval artifacts and production sign-off policy through
`pnpm ag2:worker:eval` / `pnpm ag2:worker:eval:check`.
- Only then consider A2A/AG-UI or multi-agent group-chat execution.
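The disabled-by-default flag and local fallback described in this sequence can be sketched as follows. The real bridge sits behind `src/services/agents.ts`; this Python version only illustrates the control flow. The accepted flag values, the `status` check, and the `worker` / `local` callables are assumptions.

```python
import os
from typing import Callable, Mapping, Optional


def bridge_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Disabled unless the flag is explicitly set (accepted values are an assumption)."""
    env = os.environ if env is None else env
    return env.get("METAMUSEUM_AG2_BRIDGE_ENABLED", "").strip().lower() in {"1", "true"}


def run_task(
    task: dict,
    worker: Callable[[dict, float], dict],
    local: Callable[[dict], dict],
    env: Optional[Mapping[str, str]] = None,
    timeout_s: float = 10.0,
) -> dict:
    """Use the AG2 worker only when enabled; fall back to local on timeout or bad output."""
    if not bridge_enabled(env):
        return local(task)
    try:
        result = worker(task, timeout_s)  # worker is expected to raise TimeoutError itself
        if not isinstance(result, dict) or "status" not in result:
            raise ValueError("worker output failed contract validation")
        return result
    except (TimeoutError, ValueError):
        return local(task)
```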