{"id":"agents/ag2-extraction-notes","relativePath":"agents/ag2-extraction-notes.md","title":"AG2 Extraction Notes","markdown":"# AG2 Extraction Notes\n\nThis document preserves the Meta Museum-relevant AG2 patterns extracted from the\ntemporary `metagenauto/` reference repository before that folder was removed from\nthe app workspace.\n\n## Decision\n\nMeta Museum keeps the current TypeScript `AgentTask` contract layer as the public\nand testable boundary. AG2 remains a future execution engine behind that boundary,\nnot a replacement for the existing review-first API.\n\nCurrent production rule:\n\n- Agents draft, score, reconcile, and review.\n- Agents do not publish, merge identities, clear rights, deploy, or install tools\n  without human approval.\n- Every agent-facing result must preserve citations, rights warnings, refusal\n  paths, and review state.\n\n## Extracted AG2 Runtime Rules\n\n- Target runtime from the reference repo was AG2 `0.13.2`, imported as `autogen`.\n- Use a centralized `make_agent()` factory for AI-only agents.\n- AI-only agents use `human_input_mode=\"NEVER\"`.\n- Human approver agents use `human_input_mode=\"ALWAYS\"` and `llm_config=False`.\n- Executor-only agents use `llm_config=False`.\n- Every autonomous agent must have `max_consecutive_auto_reply`.\n- Use at least two termination controls: turn/round caps plus a termination\n  predicate or explicit terminate target.\n- Keep `code_execution_config=False` unless code execution is the explicit job.\n- Avoid exposing secrets through tool parameters or client-visible schemas.\n- Do not use cached LLM responses in eval gates or production decision paths.\n\n## Agent Factory Pattern\n\nThe useful reference design was:\n\n- resolve a named model tier (`cheap`, `mid`, `strong`, `hub`);\n- apply shared constructor defaults;\n- inject structured output schemas only when needed;\n- create either a `ConversableAgent` or `AssistantAgent`;\n- create executor and human-proxy agents through separate helper constructors.\n\nMeta Museum mapping:\n\n| AG2 idea | Meta Museum owner |\n|---|---|\n| `make_agent()` bounded constructor | future `services/agents/ag2-bridge` |\n| AI-only default | current agent profiles remain review-only |\n| executor-only proxy | future server-side tool executor |\n| human proxy | existing `approvalRequired: true` and future review queue |\n| structured response schema | existing Zod contracts and future Pydantic bridge |\n\n## Tool Registration Rules\n\nAG2 distinguishes caller and executor agents:\n\n- caller exposes a tool schema to the model;\n- executor runs the actual function.\n\nRules to retain:\n\n- Tool descriptions must be specific and auditable.\n- Tool return values should be normalized JSON or compact text records.\n- Secrets belong in server-side dependency contexts, not tool arguments.\n- One credential boundary should have one explicit context object.\n- Tools that mutate state must return a proposed diff or review packet before\n  any write.\n\nMuseum-specific first tools for an AG2 bridge:\n\n- `search_records`\n- `build_mapping_assist_plan`\n- `find_reconciliation_candidates`\n- `scan_rights_and_provenance`\n- `draft_curatorial_text`\n- `create_agent_task_review_packet`\n\n## Orchestration Patterns To Keep\n\nUse AG2 only where the pattern earns its complexity.\n\nPreferred:\n\n- Star / hub-and-spoke for cross-agent review, with one hub enforcing policy.\n- Pipeline for fixed ordered workflows such as mapping review then import review.\n- Triage-with-tasks for decomposing mixed user requests into typed specialist\n  tasks.\n- Redundant review for high-risk claims where two reviewers compare evidence.\n- Escalation for cheap deterministic checks before expensive reasoning.\n\nAvoid as default:\n\n- organic group chats for standards-critical workflows;\n- autonomous mutation/self-improvement against production code;\n- public A2A endpoints before auth, rate limits, and approval gates exist.\n\n## Human Approval Gates\n\nThe reference repo had seven useful gate classes. Meta Museum should adapt them\nto cultural-heritage workflows:\n\n| Gate | Meta Museum equivalent |\n|---|---|\n| `phase_transition` | roadmap/status claim changes |\n| `mutation_acceptance` | agent-authored code or config changes |\n| `production_deploy` | public deploy or release |\n| `external_tool_install` | new agent tool, MCP server, connector, or crawler |\n| `safeguard_policy_change` | rights, citation, sensitivity, or refusal policy |\n| `a2a_endpoint_exposure` | public or LAN-visible agent endpoint |\n| `story_proposal` | agent-authored backlog item or roadmap scope |\n\nAdditional museum gates to add before AG2 execution:\n\n- identity merge / `equivalent` write;\n- rights clearance or reuse status upgrade;\n- public Wiki/MediaWiki/Wikibase publish;\n- provenance claim rewrite;\n- sensitive-record disposition;\n- bulk import activation from an AI-generated mapping.\n\n## Eval Gate Pattern\n\nThe valuable pattern is not the old test files themselves; it is the discipline:\n\n1. Write an eval case before adding a new agent capability.\n2. Keep structural gates runnable without live LLM calls.\n3. Add behavioral gates for grounded outputs, refusals, and policy violations.\n4. Require all prior gates to remain green.\n5. Treat eval regressions as blockers, not warnings.\n\nMeta Museum already has the right local equivalent:\n\n- `tests/api/agents-run.test.ts`\n- AI eval gate scripts and `/api/ai-evals/summary`\n- closeout guard and README/roadmap currency rule\n- citation/refusal tests for AI chat/query/content flows\n\nNext AG2 bridge acceptance:\n\n- AG2 bridge can be disabled with no API contract change.\n- Same request returns the same `AgentTask` envelope shape in local and AG2 modes.\n- Every AG2 output validates against existing Zod contracts.\n- Timeout, refusal, tool error, and partial-output cases are tested.\n\n## Observability Pattern\n\nKeep the fault-tolerant tracing shape:\n\n- tracing setup must never crash request paths;\n- no endpoint means no-op tracing;\n- missing SDK means no-op tracing;\n- trace capture of LLM messages must be explicitly enabled;\n- do not export sensitive traces to third-party backends without approval.\n\nUseful span categories:\n\n- agent run;\n- tool call;\n- approval request;\n- LLM call;\n- citation retrieval;\n- rights/provenance scan;\n- refusal;\n- human approval outcome.\n\n## AG-UI / A2A Notes\n\nDo not make AG-UI or A2A the first bridge. Start with an internal server-side AG2\nadapter called by the existing Next route. Consider AG-UI or A2A only after:\n\n- persistent AgentTask history exists;\n- approval queue UI exists;\n- auth/rate limits are tested;\n- trace IDs connect the Next request and AG2 run;\n- shutdown/timeout behavior is proven.\n\nIf A2A is later exposed:\n\n- default to loopback only;\n- publish minimal agent cards;\n- require auth and rate limiting;\n- treat endpoint exposure as a human approval gate.\n\n## What Was Not Ported\n\nThe temporary repo included useful but non-essential systems that should not be\ncopied into Meta Museum now:\n\n- its Next.js command-hub UI;\n- its ticket system;\n- Slack/Discord/Telegram agents;\n- mutation/self-improvement engine;\n- Chroma-specific project memory;\n- Google Drive toolkit;\n- book quote library;\n- its roadmap and council personas.\n\nThose are reference ideas only. Meta Museum's domain agents are Clio, Mercator,\nJanus, Themis, and Calliope, and their job is museum-data review with evidence\nand human oversight.\n\n## Recommended Bridge Sequence\n\n1. Done: persist AgentTask run history in managed `agent-tasks.json` storage and\n   expose the editor-gated `/api/agents/tasks` review queue.\n2. Done: add an internal, disabled-by-default AG2 bridge boundary behind\n   `src/services/agents.ts` for Mercator and Janus, with timeout, contract\n   validation, and local fallback behavior.\n3. Done: implement the local Python AG2 worker endpoint in\n   `services/ag2-worker`, exposing `POST /ag2/agents/run` and `GET /health`.\n4. Done: validate AG2 output through the existing TypeScript bridge contract\n   plus Python service-level tests for Mercator, Janus, approval refusal, and\n   scope limits.\n5. Done: add route-to-bridge trace propagation plus timeout/refusal coverage\n   around the worker boundary.\n6. Done: add deployment/runtime docs for enabling\n   `METAMUSEUM_AG2_BRIDGE_ENABLED` in `docs/ops/ag2-worker.md` and\n   `docs/env.md`.\n7. Done: add live-worker eval artifacts and production sign-off policy through\n   `pnpm ag2:worker:eval` / `pnpm ag2:worker:eval:check`.\n8. Only then consider A2A/AG-UI or multi-agent group-chat execution.\n","sections":[{"level":2,"heading":"Decision","anchor":"decision"},{"level":2,"heading":"Extracted AG2 Runtime Rules","anchor":"extracted-ag2-runtime-rules"},{"level":2,"heading":"Agent Factory Pattern","anchor":"agent-factory-pattern"},{"level":2,"heading":"Tool Registration Rules","anchor":"tool-registration-rules"},{"level":2,"heading":"Orchestration Patterns To Keep","anchor":"orchestration-patterns-to-keep"},{"level":2,"heading":"Human Approval Gates","anchor":"human-approval-gates"},{"level":2,"heading":"Eval Gate Pattern","anchor":"eval-gate-pattern"},{"level":2,"heading":"Observability Pattern","anchor":"observability-pattern"},{"level":2,"heading":"AG-UI / A2A Notes","anchor":"ag-ui-a2a-notes"},{"level":2,"heading":"What Was Not Ported","anchor":"what-was-not-ported"},{"level":2,"heading":"Recommended Bridge Sequence","anchor":"recommended-bridge-sequence"}],"html":"<h1 id=\"ag2-extraction-notes\">AG2 Extraction Notes</h1>\n<p>This document preserves the Meta Museum-relevant AG2 patterns extracted from the</p>\n<p>temporary `metagenauto/` reference repository before that folder was removed from</p>\n<p>the app workspace.</p>\n<h2 id=\"decision\">Decision</h2>\n<p>Meta Museum keeps the current TypeScript `AgentTask` contract layer as the public</p>\n<p>and testable boundary. AG2 remains a future execution engine behind that boundary,</p>\n<p>not a replacement for the existing review-first API.</p>\n<p>Current production rule:</p>\n<p>  without human approval.</p>\n<p>  paths, and review state.</p>\n<ul><li>Agents draft, score, reconcile, and review.</li><li>Agents do not publish, merge identities, clear rights, deploy, or install tools</li><li>Every agent-facing result must preserve citations, rights warnings, refusal</li></ul>\n<h2 id=\"extracted-ag2-runtime-rules\">Extracted AG2 Runtime Rules</h2>\n<p>  predicate or explicit terminate target.</p>\n<ul><li>Target runtime from the reference repo was AG2 `0.13.2`, imported as `autogen`.</li><li>Use a centralized `make_agent()` factory for AI-only agents.</li><li>AI-only agents use `human_input_mode=&quot;NEVER&quot;`.</li><li>Human approver agents use `human_input_mode=&quot;ALWAYS&quot;` and `llm_config=False`.</li><li>Executor-only agents use `llm_config=False`.</li><li>Every autonomous agent must have `max_consecutive_auto_reply`.</li><li>Use at least two termination controls: turn/round caps plus a termination</li><li>Keep `code_execution_config=False` unless code execution is the explicit job.</li><li>Avoid exposing secrets through tool parameters or client-visible schemas.</li><li>Do not use cached LLM responses in eval gates or production decision paths.</li></ul>\n<h2 id=\"agent-factory-pattern\">Agent Factory Pattern</h2>\n<p>The useful reference design was:</p>\n<ul><li>resolve a named model tier (`cheap`, `mid`, `strong`, `hub`);</li><li>apply shared constructor defaults;</li><li>inject structured output schemas only when needed;</li><li>create either a `ConversableAgent` or `AssistantAgent`;</li><li>create executor and human-proxy agents through separate helper constructors.</li></ul>\n<p>Meta Museum mapping:</p>\n<p>| AG2 idea | Meta Museum owner |</p>\n<p>|---|---|</p>\n<p>| `make_agent()` bounded constructor | future `services/agents/ag2-bridge` |</p>\n<p>| AI-only default | current agent profiles remain review-only |</p>\n<p>| executor-only proxy | future server-side tool executor |</p>\n<p>| human proxy | existing `approvalRequired: true` and future review queue |</p>\n<p>| structured response schema | existing Zod contracts and future Pydantic bridge |</p>\n<h2 id=\"tool-registration-rules\">Tool Registration Rules</h2>\n<p>AG2 distinguishes caller and executor agents:</p>\n<ul><li>caller exposes a tool schema to the model;</li><li>executor runs the actual function.</li></ul>\n<p>Rules to retain:</p>\n<p>  any write.</p>\n<ul><li>Tool descriptions must be specific and auditable.</li><li>Tool return values should be normalized JSON or compact text records.</li><li>Secrets belong in server-side dependency contexts, not tool arguments.</li><li>One credential boundary should have one explicit context object.</li><li>Tools that mutate state must return a proposed diff or review packet before</li></ul>\n<p>Museum-specific first tools for an AG2 bridge:</p>\n<ul><li>`search_records`</li><li>`build_mapping_assist_plan`</li><li>`find_reconciliation_candidates`</li><li>`scan_rights_and_provenance`</li><li>`draft_curatorial_text`</li><li>`create_agent_task_review_packet`</li></ul>\n<h2 id=\"orchestration-patterns-to-keep\">Orchestration Patterns To Keep</h2>\n<p>Use AG2 only where the pattern earns its complexity.</p>\n<p>Preferred:</p>\n<p>  tasks.</p>\n<ul><li>Star / hub-and-spoke for cross-agent review, with one hub enforcing policy.</li><li>Pipeline for fixed ordered workflows such as mapping review then import review.</li><li>Triage-with-tasks for decomposing mixed user requests into typed specialist</li><li>Redundant review for high-risk claims where two reviewers compare evidence.</li><li>Escalation for cheap deterministic checks before expensive reasoning.</li></ul>\n<p>Avoid as default:</p>\n<ul><li>organic group chats for standards-critical workflows;</li><li>autonomous mutation/self-improvement against production code;</li><li>public A2A endpoints before auth, rate limits, and approval gates exist.</li></ul>\n<h2 id=\"human-approval-gates\">Human Approval Gates</h2>\n<p>The reference repo had seven useful gate classes. Meta Museum should adapt them</p>\n<p>to cultural-heritage workflows:</p>\n<p>| Gate | Meta Museum equivalent |</p>\n<p>|---|---|</p>\n<p>| `phase_transition` | roadmap/status claim changes |</p>\n<p>| `mutation_acceptance` | agent-authored code or config changes |</p>\n<p>| `production_deploy` | public deploy or release |</p>\n<p>| `external_tool_install` | new agent tool, MCP server, connector, or crawler |</p>\n<p>| `safeguard_policy_change` | rights, citation, sensitivity, or refusal policy |</p>\n<p>| `a2a_endpoint_exposure` | public or LAN-visible agent endpoint |</p>\n<p>| `story_proposal` | agent-authored backlog item or roadmap scope |</p>\n<p>Additional museum gates to add before AG2 execution:</p>\n<ul><li>identity merge / `equivalent` write;</li><li>rights clearance or reuse status upgrade;</li><li>public Wiki/MediaWiki/Wikibase publish;</li><li>provenance claim rewrite;</li><li>sensitive-record disposition;</li><li>bulk import activation from an AI-generated mapping.</li></ul>\n<h2 id=\"eval-gate-pattern\">Eval Gate Pattern</h2>\n<p>The valuable pattern is not the old test files themselves; it is the discipline:</p>\n<ol><li>Write an eval case before adding a new agent capability.</li></ol>\n<ol><li>Keep structural gates runnable without live LLM calls.</li></ol>\n<ol><li>Add behavioral gates for grounded outputs, refusals, and policy violations.</li></ol>\n<ol><li>Require all prior gates to remain green.</li></ol>\n<ol><li>Treat eval regressions as blockers, not warnings.</li></ol>\n<p>Meta Museum already has the right local equivalent:</p>\n<ul><li>`tests/api/agents-run.test.ts`</li><li>AI eval gate scripts and `/api/ai-evals/summary`</li><li>closeout guard and README/roadmap currency rule</li><li>citation/refusal tests for AI chat/query/content flows</li></ul>\n<p>Next AG2 bridge acceptance:</p>\n<ul><li>AG2 bridge can be disabled with no API contract change.</li><li>Same request returns the same `AgentTask` envelope shape in local and AG2 modes.</li><li>Every AG2 output validates against existing Zod contracts.</li><li>Timeout, refusal, tool error, and partial-output cases are tested.</li></ul>\n<h2 id=\"observability-pattern\">Observability Pattern</h2>\n<p>Keep the fault-tolerant tracing shape:</p>\n<ul><li>tracing setup must never crash request paths;</li><li>no endpoint means no-op tracing;</li><li>missing SDK means no-op tracing;</li><li>trace capture of LLM messages must be explicitly enabled;</li><li>do not export sensitive traces to third-party backends without approval.</li></ul>\n<p>Useful span categories:</p>\n<ul><li>agent run;</li><li>tool call;</li><li>approval request;</li><li>LLM call;</li><li>citation retrieval;</li><li>rights/provenance scan;</li><li>refusal;</li><li>human approval outcome.</li></ul>\n<h2 id=\"ag-ui-a2a-notes\">AG-UI / A2A Notes</h2>\n<p>Do not make AG-UI or A2A the first bridge. Start with an internal server-side AG2</p>\n<p>adapter called by the existing Next route. Consider AG-UI or A2A only after:</p>\n<ul><li>persistent AgentTask history exists;</li><li>approval queue UI exists;</li><li>auth/rate limits are tested;</li><li>trace IDs connect the Next request and AG2 run;</li><li>shutdown/timeout behavior is proven.</li></ul>\n<p>If A2A is later exposed:</p>\n<ul><li>default to loopback only;</li><li>publish minimal agent cards;</li><li>require auth and rate limiting;</li><li>treat endpoint exposure as a human approval gate.</li></ul>\n<h2 id=\"what-was-not-ported\">What Was Not Ported</h2>\n<p>The temporary repo included useful but non-essential systems that should not be</p>\n<p>copied into Meta Museum now:</p>\n<ul><li>its Next.js command-hub UI;</li><li>its ticket system;</li><li>Slack/Discord/Telegram agents;</li><li>mutation/self-improvement engine;</li><li>Chroma-specific project memory;</li><li>Google Drive toolkit;</li><li>book quote library;</li><li>its roadmap and council personas.</li></ul>\n<p>Those are reference ideas only. Meta Museum&#39;s domain agents are Clio, Mercator,</p>\n<p>Janus, Themis, and Calliope, and their job is museum-data review with evidence</p>\n<p>and human oversight.</p>\n<h2 id=\"recommended-bridge-sequence\">Recommended Bridge Sequence</h2>\n<ol><li>Done: persist AgentTask run history in managed `agent-tasks.json` storage and</li></ol>\n<p>   expose the editor-gated `/api/agents/tasks` review queue.</p>\n<ol><li>Done: add an internal, disabled-by-default AG2 bridge boundary behind</li></ol>\n<p>   `src/services/agents.ts` for Mercator and Janus, with timeout, contract</p>\n<p>   validation, and local fallback behavior.</p>\n<ol><li>Done: implement the local Python AG2 worker endpoint in</li></ol>\n<p>   `services/ag2-worker`, exposing `POST /ag2/agents/run` and `GET /health`.</p>\n<ol><li>Done: validate AG2 output through the existing TypeScript bridge contract</li></ol>\n<p>   plus Python service-level tests for Mercator, Janus, approval refusal, and</p>\n<p>   scope limits.</p>\n<ol><li>Done: add route-to-bridge trace propagation plus timeout/refusal coverage</li></ol>\n<p>   around the worker boundary.</p>\n<ol><li>Done: add deployment/runtime docs for enabling</li></ol>\n<p>   `METAMUSEUM_AG2_BRIDGE_ENABLED` in `docs/ops/ag2-worker.md` and</p>\n<p>   `docs/env.md`.</p>\n<ol><li>Done: add live-worker eval artifacts and production sign-off policy through</li></ol>\n<p>   `pnpm ag2:worker:eval` / `pnpm ag2:worker:eval:check`.</p>\n<ol><li>Only then consider A2A/AG-UI or multi-agent group-chat execution.</li></ol>","updatedAt":"2018-10-20T01:46:40.000Z","checksum":"e8d0072ebec1bfeda11c08aaedcf7c0f164d0e0b41100f10c5167ba0407f55b4","checksumPrefix":"e8d0072ebec1","anchorCount":11,"lineCount":224,"rawUrl":"/api/docs/content?path=agents%2Fag2-extraction-notes.md","htmlUrl":"/docs?doc=agents%2Fag2-extraction-notes.md","apiUrl":"/api/docs/content?path=agents%2Fag2-extraction-notes.md"}