Syntor
A high-performance multi-agent AI orchestration system built in Go with secure tool execution and enterprise-grade observability
Problem Statement
Modern AI coding assistants like Claude Code and GitHub Copilot demonstrate the power of tool-enabled AI—where models can read files, execute commands, and make real changes. However, these tools are locked into vendor ecosystems, require cloud connectivity, and offer limited customization for specialized workflows.
Organizations and developers face several challenges: proprietary AI tools lack transparency in how they operate, cloud-dependent solutions fail in air-gapped or privacy-sensitive environments, and monolithic AI assistants struggle with complex tasks that require specialized domain knowledge. Furthermore, there’s no standard way to coordinate multiple AI agents with different capabilities to tackle multi-faceted problems.
Syntor addresses these gaps by providing a local-first, multi-agent AI orchestration platform that brings Claude Code-like capabilities to any environment while enabling sophisticated agent coordination patterns.
Solution Architecture
Syntor is a multi-layered system that separates concerns between user interaction, AI inference, tool execution, and agent coordination. The architecture supports both standalone CLI usage and distributed deployment with full observability.
Architecture Diagram
graph TB
subgraph "User Interface Layer"
TUI[Bubbletea TUI]
CLI[Cobra CLI]
REPL[Simple REPL]
end
subgraph "Inference Abstraction Layer"
REG[Provider Registry]
OLL[Ollama Provider]
ANT[Anthropic Provider]
DSK[DeepSeek Provider]
end
subgraph "Agent Layer"
SNTR[SNTR Orchestrator]
DOC[Documentation Agent]
GIT[Git Agent]
CODE[Code Worker]
WORK[General Worker]
end
subgraph "Core Systems"
TOOLS[Tool Executor]
SEC[Security Manager]
MAN[Manifest Store]
PROMPT[Prompt Builder]
COORD[Coordination Engine]
end
subgraph "Tool Implementations"
RF[read_file]
WF[write_file]
EF[edit_file]
BASH[bash]
GLOB[glob]
GREP[grep]
LS[list_directory]
end
subgraph "Infrastructure Layer"
KAFKA[Kafka Message Bus]
REDIS[Redis Context Store]
PROM[Prometheus Metrics]
JAEG[Jaeger Tracing]
end
TUI --> REG
CLI --> REG
REPL --> REG
REG --> OLL
REG --> ANT
REG --> DSK
SNTR --> TOOLS
SNTR --> COORD
DOC --> TOOLS
GIT --> TOOLS
CODE --> TOOLS
WORK --> TOOLS
MAN --> SNTR
MAN --> DOC
MAN --> GIT
MAN --> CODE
MAN --> WORK
PROMPT --> SNTR
PROMPT --> DOC
TOOLS --> SEC
SEC --> RF
SEC --> WF
SEC --> EF
SEC --> BASH
SEC --> GLOB
SEC --> GREP
SEC --> LS
COORD --> KAFKA
SNTR --> REDIS
TOOLS --> PROM
COORD --> JAEG
Key Components
- SNTR Orchestrator: The primary coordinator agent that understands user intent, executes tools directly, and delegates specialized tasks to other agents via structured handoff protocols.
- Provider Registry: Unified abstraction layer supporting multiple AI backends (Ollama for local inference, Anthropic Claude for cloud, DeepSeek as a cost-effective alternative) with per-agent model assignment.
- Tool Executor: Secure execution engine for the 7 tool implementations, with an iteration cap (max 25 tool calls per request), path validation, and command allowlisting.
- Manifest Store: YAML-based agent configuration system with fsnotify hot-reload, enabling runtime updates without service restarts.
- Coordination Engine: Manages agent-to-agent handoffs, execution plans with approval workflows, and async task distribution via Kafka.
Technical Implementation
Multi-Agent Coordination Protocol
Agents communicate through a structured JSON protocol that supports both direct delegation and multi-step planning:
// Handoff Intent - for direct delegation
type HandoffIntent struct {
    Action   string         `json:"action"`   // "delegate"
    Target   string         `json:"target"`   // target agent
    Task     string         `json:"task"`     // task description
    Context  map[string]any `json:"context"`  // shared context
    Priority string         `json:"priority"` // urgency level
}

// Execution Plan - for complex workflows
type ExecutionPlan struct {
    Action           string     `json:"action"`           // "plan"
    Summary          string     `json:"summary"`          // plan overview
    Steps            []PlanStep `json:"steps"`            // ordered steps
    RequiresApproval bool       `json:"requiresApproval"` // Plan Mode trigger
}
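As a concrete illustration, here is a minimal sketch of what a delegation looks like on the wire, built with the HandoffIntent type above; the target, task, and context values are hypothetical, not taken from real protocol traffic.

// Sketch: emitting a hypothetical delegation from the orchestrator to the git agent.
// Assumes the HandoffIntent type defined above; field values are illustrative only.
package main

import (
    "encoding/json"
    "fmt"
)

func main() {
    intent := HandoffIntent{
        Action:   "delegate",
        Target:   "git",
        Task:     "create a feature branch and commit the staged changes",
        Context:  map[string]any{"workdir": "."},
        Priority: "normal",
    }
    payload, err := json.MarshalIndent(intent, "", "  ")
    if err != nil {
        panic(err)
    }
    fmt.Println(string(payload)) // the JSON shape the orchestrating model is expected to emit
}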
The manifest system defines each agent’s capabilities, allowed handoff targets, and tool bindings:
# configs/agents/sntr.yaml
apiVersion: syntor/v1
kind: AgentManifest
metadata:
  name: sntr
  description: Primary orchestrator with full tool access
spec:
  model: mistral:7b
  capabilities:
    - code-analysis
    - file-operations
    - task-coordination
  tools:
    - read_file
    - write_file
    - edit_file
    - bash
    - glob
    - grep
  handoffs:
    allowedTargets:
      - documentation
      - git
      - worker
    protocol: structured
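For illustration, a minimal sketch of Go types that mirror this layout and a loader using gopkg.in/yaml.v3; the actual structs and loader in the repository may be organized differently.

import (
    "os"

    "gopkg.in/yaml.v3"
)

// Hypothetical types mirroring the manifest layout above.
type AgentManifest struct {
    APIVersion string `yaml:"apiVersion"`
    Kind       string `yaml:"kind"`
    Metadata   struct {
        Name        string `yaml:"name"`
        Description string `yaml:"description"`
    } `yaml:"metadata"`
    Spec struct {
        Model        string   `yaml:"model"`
        Capabilities []string `yaml:"capabilities"`
        Tools        []string `yaml:"tools"`
        Handoffs     struct {
            AllowedTargets []string `yaml:"allowedTargets"`
            Protocol       string   `yaml:"protocol"`
        } `yaml:"handoffs"`
    } `yaml:"spec"`
}

// LoadManifest reads and parses a single agent manifest from disk.
func LoadManifest(path string) (*AgentManifest, error) {
    data, err := os.ReadFile(path)
    if err != nil {
        return nil, err
    }
    var m AgentManifest
    if err := yaml.Unmarshal(data, &m); err != nil {
        return nil, err
    }
    return &m, nil
}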
Secure Tool Execution
The tool system implements defense-in-depth security controls:
// Security Manager enforces policies
type SecurityManager struct {
    allowedPaths    []string
    blockedPaths    []string
    allowedCommands []string
    maxIterations   int // Default: 25
}

// Path validation prevents traversal attacks
func (s *SecurityManager) ValidatePath(path string) error {
    absPath, err := filepath.Abs(path)
    if err != nil {
        return ErrPathNotAllowed
    }
    for _, blocked := range s.blockedPaths {
        if strings.HasPrefix(absPath, blocked) {
            return ErrPathBlocked
        }
    }
    for _, allowed := range s.allowedPaths {
        if strings.HasPrefix(absPath, allowed) {
            return nil
        }
    }
    return ErrPathNotAllowed
}
Key security controls:
- Path validation: Prevents directory traversal and blocks sensitive paths (/etc, ~/.ssh)
- Command allowlisting: Bash tool only executes pre-approved commands
- Iteration limits: Maximum 25 tool calls per request prevents infinite loops
- Plan Mode: High-risk operations require explicit user approval via Ctrl+Y/N
- Risk classification: Tools categorized by risk level (read=low, write=medium, bash=high)
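The risk classification can be as simple as a lookup table. Below is a minimal sketch of the idea; the tier names and the approval rule are illustrative, not the project's exact API.

// RiskLevel tiers correspond to the categories above: read=low, write=medium, bash=high.
type RiskLevel int

const (
    RiskLow RiskLevel = iota
    RiskMedium
    RiskHigh
)

// toolRisk is a hypothetical lookup covering the seven built-in tools.
var toolRisk = map[string]RiskLevel{
    "read_file":      RiskLow,
    "glob":           RiskLow,
    "grep":           RiskLow,
    "list_directory": RiskLow,
    "write_file":     RiskMedium,
    "edit_file":      RiskMedium,
    "bash":           RiskHigh,
}

// RequiresApproval reports whether Plan Mode should ask the user before running the tool.
func RequiresApproval(tool string) bool {
    return toolRisk[tool] >= RiskHigh
}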
AI Provider Abstraction
The inference layer provides a unified interface across providers:
type Provider interface {
    Name() string
    Chat(ctx context.Context, req ChatRequest) (*ChatResponse, error)
    ListModels(ctx context.Context) ([]Model, error)
    IsAvailable(ctx context.Context) bool
}

// Registry manages multiple providers with model routing
type Registry struct {
    providers    map[string]Provider
    modelMapping map[string]string // agent -> model
    fallbacks    map[string]string // model -> fallback
}

func (r *Registry) GetProviderForModel(model string) (Provider, error) {
    // Routes to correct provider based on model prefix
    // e.g., "claude-3" -> Anthropic, "llama3.2" -> Ollama
}
This enables:
- Local-first operation: Works offline with Ollama models
- Graceful degradation: Falls back to alternative models/providers
- Per-agent optimization: Assign specialized models to specific agents
- Auto-pull: Automatically downloads missing Ollama models
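A sketch of how that prefix routing with fallback might be wired up behind the registry; the prefix table, provider keys, and method name are assumptions based on the examples above, not the repository's actual mapping.

// Hypothetical prefix table; the real registry may keep this in configuration.
var modelPrefixes = map[string]string{
    "claude":   "anthropic",
    "deepseek": "deepseek",
    "llama":    "ollama",
    "mistral":  "ollama",
}

func (r *Registry) resolveProvider(ctx context.Context, model string) (Provider, error) {
    for prefix, name := range modelPrefixes {
        if !strings.HasPrefix(model, prefix) {
            continue
        }
        if p, ok := r.providers[name]; ok && p.IsAvailable(ctx) {
            return p, nil
        }
        // Preferred provider is down: retry with the configured fallback model, if any.
        if fallback, ok := r.fallbacks[model]; ok {
            return r.resolveProvider(ctx, fallback)
        }
    }
    return nil, fmt.Errorf("no available provider for model %q", model)
}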
Enterprise Observability
Full observability stack integrated throughout:
// Prometheus metrics for tool execution
var (
    toolExecutions = prometheus.NewCounterVec(
        prometheus.CounterOpts{
            Name: "syntor_tool_executions_total",
            Help: "Total tool executions by tool and status",
        },
        []string{"tool", "status"},
    )

    toolLatency = prometheus.NewHistogramVec(
        prometheus.HistogramOpts{
            Name:    "syntor_tool_latency_seconds",
            Help:    "Tool execution latency",
            Buckets: prometheus.DefBuckets,
        },
        []string{"tool"},
    )
)

// Jaeger tracing for distributed operations
func (e *Executor) ExecuteTool(ctx context.Context, call ToolCall) (*ToolResult, error) {
    span, ctx := opentracing.StartSpanFromContext(ctx, "tool.execute")
    defer span.Finish()

    span.SetTag("tool.name", call.Name)
    span.SetTag("tool.id", call.ID)
    // ...
}
Infrastructure includes:
- Prometheus: Metrics on tool execution, inference latency, agent activity
- Grafana: Pre-built dashboards for system monitoring
- Jaeger: Distributed tracing across agent handoffs
- Structured logging: Uber Zap with correlation IDs
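As an illustration of the correlation-ID approach, a small sketch using Zap's child loggers; the helper and field names are hypothetical.

import (
    "time"

    "go.uber.org/zap"
)

// logToolRun stamps every entry for a request with its correlation ID,
// so tool executions can be joined with traces and metrics downstream.
func logToolRun(logger *zap.Logger, correlationID, tool string, elapsed time.Duration) {
    reqLog := logger.With(zap.String("correlation_id", correlationID))
    reqLog.Info("tool execution finished",
        zap.String("tool", tool),
        zap.Duration("latency", elapsed),
    )
}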
Resilience Patterns
Production-ready fault tolerance:
// Circuit breaker for provider failures
type CircuitBreaker struct {
    failures    int
    threshold   int
    state       State // Closed, Open, HalfOpen
    lastFailure time.Time
    timeout     time.Duration
}

// Retry with exponential backoff
type RetryConfig struct {
    MaxAttempts int
    InitialWait time.Duration
    MaxWait     time.Duration
    Multiplier  float64
}

// Rate limiter for API providers
type RateLimiter struct {
    limiter *rate.Limiter
    burst   int
}
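To make the circuit breaker concrete, a minimal sketch of its state transitions over the fields above; the method names are illustrative, and a production version would also guard the fields with a mutex.

// Allow reports whether a call may proceed in the breaker's current state.
func (cb *CircuitBreaker) Allow() bool {
    switch cb.state {
    case Closed:
        return true
    case Open:
        // After the cooldown window, let a single probe request through.
        if time.Since(cb.lastFailure) > cb.timeout {
            cb.state = HalfOpen
            return true
        }
        return false
    case HalfOpen:
        return true
    default:
        return false
    }
}

// Record updates the breaker after a call completes.
func (cb *CircuitBreaker) Record(err error) {
    if err == nil {
        cb.failures = 0
        cb.state = Closed
        return
    }
    cb.failures++
    cb.lastFailure = time.Now()
    if cb.failures >= cb.threshold {
        cb.state = Open
    }
}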
Challenges & Solutions
Challenge 1: Hot-Reload Without Race Conditions
Problem: Agent manifests needed runtime updates without service restarts, but concurrent access during reload could cause inconsistent reads or panics.
Solution: Implemented a copy-on-write pattern protected by an RWMutex. The ManifestStore hands readers an immutable snapshot under a brief read lock, while a reload builds a fresh map and swaps it in under the exclusive write lock:
func (s *ManifestStore) Reload() error {
    s.mu.Lock()
    defer s.mu.Unlock()

    newManifests := make(map[string]*AgentManifest)
    // Load all manifests into new map
    s.manifests = newManifests // Atomic swap
    return nil
}
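The reader side is correspondingly small; a sketch of what a lookup might look like under this scheme (the method name is assumed):

// Get returns the manifest for an agent from the current snapshot.
func (s *ManifestStore) Get(name string) (*AgentManifest, bool) {
    s.mu.RLock()
    defer s.mu.RUnlock()
    m, ok := s.manifests[name]
    return m, ok
}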
Lesson Learned: Copy-on-write patterns trade memory for simplicity and are often the right choice for configuration systems where reads vastly outnumber writes.
Challenge 2: LLM Output Parsing Reliability
Problem: Language models produce inconsistent JSON—sometimes wrapped in markdown code blocks, sometimes with trailing commas, sometimes with explanatory text mixed in.
Solution: Built a robust multi-stage parser that handles common variations:
func ParseToolCalls(output string) ([]ToolCall, error) {
    // Stage 1: Extract JSON from markdown blocks
    output = extractFromCodeBlock(output)

    // Stage 2: Try strict JSON parse
    if calls, err := strictParse(output); err == nil {
        return calls, nil
    }

    // Stage 3: Lenient parse with fixes
    output = fixTrailingCommas(output)
    output = fixUnquotedKeys(output)
    return lenientParse(output)
}
Lesson Learned: When integrating with LLMs, assume the worst about output format consistency. Build parsers that degrade gracefully rather than fail on edge cases.
Challenge 3: TUI State Management Complexity
Problem: Bubbletea’s Elm-architecture requires careful state management. With streaming responses, mode toggles, and approval workflows, state transitions became error-prone.
Solution: Centralized state into a well-defined model with explicit state machine transitions:
type Model struct {
    state       AppState // Idle, Streaming, AwaitingApproval, Error
    mode        Mode     // Auto, Plan
    pendingPlan *ExecutionPlan
    // ...
}

func (m Model) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
    switch m.state {
    case StateAwaitingApproval:
        return m.handleApprovalInput(msg)
    case StateStreaming:
        return m.handleStreamingUpdate(msg)
    // ...
    }
    return m, nil
}
Lesson Learned: Explicit state machines make complex UI logic tractable. Name your states and transitions clearly—it’s documentation that the compiler enforces.
Results & Metrics
Project Scale
- ~34,000 lines of Go code across well-organized packages
- 17 test files covering integration, property-based, and unit tests
- 5 specialized agents with distinct capabilities
- 7 secure tool implementations
- 3 AI provider integrations
- 8 completed development phases
Architecture Quality
- Clean separation between inference, tools, coordination, and UI layers
- Hot-reload capability enables zero-downtime configuration updates
- Full observability stack (metrics, tracing, logging) from day one
- Resilience patterns (circuit breaker, retry, rate limiting) built-in
Flexibility
- Works offline with local Ollama models
- Scales to cloud providers (Anthropic, DeepSeek) without code changes
- Per-agent model assignment optimizes cost/capability tradeoffs
- YAML manifests enable non-developer customization
Key Takeaways
- Abstraction layers pay dividends: The provider registry made adding new AI backends trivial. What started as Ollama-only became multi-provider with minimal changes because the abstraction was right.
- Security must be foundational: Bolt-on security never works. Building path validation, command allowlisting, and approval workflows from the start shaped the architecture positively.
- Observability is not optional: Having Prometheus metrics and Jaeger tracing from the beginning caught bugs that would have been invisible in production. The investment pays for itself quickly.
- Hot-reload changes development velocity: Being able to tweak agent prompts and capabilities without restart cycles accelerated iteration dramatically during prompt engineering.
- Elm architecture scales: Bubbletea’s update/view pattern handled complex TUI state better than expected. The discipline of explicit state machines prevented entire categories of bugs.
Future Enhancements
- Persistent conversation memory: Long-term context storage with semantic search
- Custom tool plugins: User-defined tools via WASM or gRPC
- Web UI option: Browser-based interface alongside TUI
- Agent marketplace: Share and discover community agent manifests
- Cloud deployment templates: ECS/EKS configurations for team deployments
- MCP integration: Model Context Protocol support for broader tool ecosystem
Resources
- Repository: github.com/hmbldv/syntor
- Tech Stack Docs:
  - Bubbletea - TUI framework
  - Ollama - Local LLM inference
  - Anthropic API - Claude integration
This project demonstrates proficiency in: Go systems programming, multi-agent AI architecture, secure tool execution design, distributed systems patterns, terminal UI development, and enterprise observability implementation.