Syntor

A high-performance multi-agent AI orchestration system built in Go with secure tool execution and enterprise-grade observability

AI & Automation
Status: in development
Started:

Tech Stack

Go · Bubbletea · Cobra · Kafka · Redis · Prometheus · Jaeger · Ollama · Anthropic · DeepSeek

    Problem Statement

    Modern AI coding assistants like Claude Code and GitHub Copilot demonstrate the power of tool-enabled AI—where models can read files, execute commands, and make real changes. However, these tools are locked into vendor ecosystems, require cloud connectivity, and offer limited customization for specialized workflows.

    Organizations and developers face several challenges: proprietary AI tools lack transparency in how they operate, cloud-dependent solutions fail in air-gapped or privacy-sensitive environments, and monolithic AI assistants struggle with complex tasks that require specialized domain knowledge. Furthermore, there’s no standard way to coordinate multiple AI agents with different capabilities to tackle multi-faceted problems.

    Syntor addresses these gaps by providing a local-first, multi-agent AI orchestration platform that brings Claude Code-like capabilities to any environment while enabling sophisticated agent coordination patterns.

    Solution Architecture

    Syntor is a multi-layered system that separates concerns between user interaction, AI inference, tool execution, and agent coordination. The architecture supports both standalone CLI usage and distributed deployment with full observability.

    Architecture Diagram

    graph TB
        subgraph "User Interface Layer"
            TUI[Bubbletea TUI]
            CLI[Cobra CLI]
            REPL[Simple REPL]
        end
    
        subgraph "Inference Abstraction Layer"
            REG[Provider Registry]
            OLL[Ollama Provider]
            ANT[Anthropic Provider]
            DSK[DeepSeek Provider]
        end
    
        subgraph "Agent Layer"
            SNTR[SNTR Orchestrator]
            DOC[Documentation Agent]
            GIT[Git Agent]
            CODE[Code Worker]
            WORK[General Worker]
        end
    
        subgraph "Core Systems"
            TOOLS[Tool Executor]
            SEC[Security Manager]
            MAN[Manifest Store]
            PROMPT[Prompt Builder]
            COORD[Coordination Engine]
        end
    
        subgraph "Tool Implementations"
            RF[read_file]
            WF[write_file]
            EF[edit_file]
            BASH[bash]
            GLOB[glob]
            GREP[grep]
            LS[list_directory]
        end
    
        subgraph "Infrastructure Layer"
            KAFKA[Kafka Message Bus]
            REDIS[Redis Context Store]
            PROM[Prometheus Metrics]
            JAEG[Jaeger Tracing]
        end
    
        TUI --> REG
        CLI --> REG
        REPL --> REG
    
        REG --> OLL
        REG --> ANT
        REG --> DSK
    
        SNTR --> TOOLS
        SNTR --> COORD
        DOC --> TOOLS
        GIT --> TOOLS
        CODE --> TOOLS
        WORK --> TOOLS
    
        MAN --> SNTR
        MAN --> DOC
        MAN --> GIT
        MAN --> CODE
        MAN --> WORK
    
        PROMPT --> SNTR
        PROMPT --> DOC
    
        TOOLS --> SEC
        SEC --> RF
        SEC --> WF
        SEC --> EF
        SEC --> BASH
        SEC --> GLOB
        SEC --> GREP
        SEC --> LS
    
        COORD --> KAFKA
        SNTR --> REDIS
        TOOLS --> PROM
        COORD --> JAEG

    Key Components

    1. SNTR Orchestrator: The primary coordinator agent that understands user intent, executes tools directly, and delegates specialized tasks to other agents via structured handoff protocols.

    2. Provider Registry: Unified abstraction layer supporting multiple AI backends (Ollama for local inference, Anthropic Claude for cloud, DeepSeek for cost-effective alternatives) with per-agent model assignment.

    3. Tool Executor: Secure execution engine for the 7 tool implementations, with a per-request iteration cap (25 tool calls), path validation, and command allowlisting.

    4. Manifest Store: YAML-based agent configuration system with fsnotify hot-reload, enabling runtime updates without service restarts (a watcher sketch follows this list).

    5. Coordination Engine: Manages agent-to-agent handoffs, execution plans with approval workflows, and async task distribution via Kafka.
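
    To illustrate the Manifest Store's hot-reload (item 4 above), a minimal fsnotify watcher could look like the sketch below. It assumes a ManifestStore type with a Reload() error method, as described later in this write-up; the directory handling and logging are illustrative, not the project's actual code.

    // Sketch: watch the agent manifest directory and hot-reload on change.
    // Assumes a ManifestStore with a Reload() error method; paths and
    // logging here are illustrative.
    import (
        "log"

        "github.com/fsnotify/fsnotify"
    )

    func watchManifests(dir string, store *ManifestStore) error {
        watcher, err := fsnotify.NewWatcher()
        if err != nil {
            return err
        }
        if err := watcher.Add(dir); err != nil {
            return err
        }

        go func() {
            defer watcher.Close()
            for {
                select {
                case event, ok := <-watcher.Events:
                    if !ok {
                        return
                    }
                    // Reload when a manifest file is written or created.
                    if event.Op&(fsnotify.Write|fsnotify.Create) != 0 {
                        if err := store.Reload(); err != nil {
                            log.Printf("manifest reload failed: %v", err)
                        }
                    }
                case err, ok := <-watcher.Errors:
                    if !ok {
                        return
                    }
                    log.Printf("manifest watcher error: %v", err)
                }
            }
        }()
        return nil
    }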

    Technical Implementation

    Multi-Agent Coordination Protocol

    Agents communicate through a structured JSON protocol that supports both direct delegation and multi-step planning:

    // Handoff Intent - for direct delegation
    type HandoffIntent struct {
        Action    string            `json:"action"`    // "delegate"
        Target    string            `json:"target"`    // target agent
        Task      string            `json:"task"`      // task description
        Context   map[string]any    `json:"context"`   // shared context
        Priority  string            `json:"priority"`  // urgency level
    }
    
    // Execution Plan - for complex workflows
    type ExecutionPlan struct {
        Action           string     `json:"action"`           // "plan"
        Summary          string     `json:"summary"`          // plan overview
        Steps            []PlanStep `json:"steps"`            // ordered steps
        RequiresApproval bool       `json:"requiresApproval"` // Plan Mode trigger
    }
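
    ExecutionPlan refers to a PlanStep type not shown above; a plausible shape, assumed here purely for illustration, is:

    // Assumed shape of a single plan step (not taken from the source).
    type PlanStep struct {
        ID          int            `json:"id"`          // step order
        Description string         `json:"description"` // what the step does
        Agent       string         `json:"agent"`       // agent assigned to the step
        Tool        string         `json:"tool,omitempty"`
        Args        map[string]any `json:"args,omitempty"`
        RiskLevel   string         `json:"riskLevel"`   // low / medium / high
    }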

    The manifest system defines each agent’s capabilities, allowed handoff targets, and tool bindings:

    # configs/agents/sntr.yaml
    apiVersion: syntor/v1
    kind: AgentManifest
    metadata:
      name: sntr
      description: Primary orchestrator with full tool access
    spec:
      model: mistral:7b
      capabilities:
        - code-analysis
        - file-operations
        - task-coordination
      tools:
        - read_file
        - write_file
        - edit_file
        - bash
        - glob
        - grep
      handoffs:
        allowedTargets:
          - documentation
          - git
          - worker
        protocol: structured
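
    For illustration, a manifest like this could be decoded into Go structs roughly as follows; the struct layout simply mirrors the YAML above and is an assumption, not the project's actual types.

    // Sketch: loading an agent manifest with gopkg.in/yaml.v3.
    // Field layout is inferred from the YAML example above.
    import (
        "os"

        "gopkg.in/yaml.v3"
    )

    type AgentManifest struct {
        APIVersion string `yaml:"apiVersion"`
        Kind       string `yaml:"kind"`
        Metadata   struct {
            Name        string `yaml:"name"`
            Description string `yaml:"description"`
        } `yaml:"metadata"`
        Spec struct {
            Model        string   `yaml:"model"`
            Capabilities []string `yaml:"capabilities"`
            Tools        []string `yaml:"tools"`
            Handoffs     struct {
                AllowedTargets []string `yaml:"allowedTargets"`
                Protocol       string   `yaml:"protocol"`
            } `yaml:"handoffs"`
        } `yaml:"spec"`
    }

    func LoadManifest(path string) (*AgentManifest, error) {
        data, err := os.ReadFile(path)
        if err != nil {
            return nil, err
        }
        var m AgentManifest
        if err := yaml.Unmarshal(data, &m); err != nil {
            return nil, err
        }
        return &m, nil
    }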

    Secure Tool Execution

    The tool system implements defense-in-depth security controls:

    // Security Manager enforces policies
    type SecurityManager struct {
        allowedPaths    []string
        blockedPaths    []string
        allowedCommands []string
        maxIterations   int // Default: 25
    }
    
    // Path validation prevents traversal attacks
    func (s *SecurityManager) ValidatePath(path string) error {
        absPath, err := filepath.Abs(path)
        if err != nil {
            return fmt.Errorf("resolve path: %w", err)
        }
    
        for _, blocked := range s.blockedPaths {
            if strings.HasPrefix(absPath, blocked) {
                return ErrPathBlocked
            }
        }
    
        for _, allowed := range s.allowedPaths {
            if strings.HasPrefix(absPath, allowed) {
                return nil
            }
        }
        return ErrPathNotAllowed
    }

    Key security controls:

    • Path validation: Prevents directory traversal, blocks sensitive paths (/etc, ~/.ssh)
    • Command allowlisting: Bash tool only executes pre-approved commands (sketched after this list)
    • Iteration limits: Maximum 25 tool calls per request prevents infinite loops
    • Plan Mode: High-risk operations require explicit user approval via Ctrl+Y/N
    • Risk classification: Tools categorized by risk level (read=low, write=medium, bash=high)
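
    Command allowlisting follows the same pattern as path validation; a minimal sketch of the check, reusing the SecurityManager fields shown above, could look like this (the ValidateCommand method and its error messages are assumptions):

    // Sketch: allowlist check for the bash tool. Only the first token of
    // the command line is matched against the pre-approved command list.
    func (s *SecurityManager) ValidateCommand(cmdline string) error {
        fields := strings.Fields(cmdline)
        if len(fields) == 0 {
            return errors.New("empty command")
        }
        for _, allowed := range s.allowedCommands {
            if fields[0] == allowed {
                return nil
            }
        }
        return fmt.Errorf("command %q is not on the allowlist", fields[0])
    }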

    AI Provider Abstraction

    The inference layer provides a unified interface across providers:

    type Provider interface {
        Name() string
        Chat(ctx context.Context, req ChatRequest) (*ChatResponse, error)
        ListModels(ctx context.Context) ([]Model, error)
        IsAvailable(ctx context.Context) bool
    }
    
    // Registry manages multiple providers with model routing
    type Registry struct {
        providers    map[string]Provider
        modelMapping map[string]string  // agent -> model
        fallbacks    map[string]string  // model -> fallback
    }
    
    func (r *Registry) GetProviderForModel(model string) (Provider, error) {
        // Routes to the correct provider based on model prefix,
        // e.g., "claude-3" -> Anthropic, "llama3.2" -> Ollama.
        // (Provider keys here are illustrative.)
        name := "ollama" // default to local inference
        switch {
        case strings.HasPrefix(model, "claude"):
            name = "anthropic"
        case strings.HasPrefix(model, "deepseek"):
            name = "deepseek"
        }
        if p, ok := r.providers[name]; ok {
            return p, nil
        }
        return nil, fmt.Errorf("no provider registered for model %q", model)
    }

    This enables:

    • Local-first operation: Works offline with Ollama models
    • Graceful degradation: Falls back to alternative models/providers (sketched below)
    • Per-agent optimization: Assign specialized models to specific agents
    • Auto-pull: Automatically downloads missing Ollama models
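
    For the graceful-degradation path, a chat call can consult the registry's fallbacks map when the primary provider is unavailable. The ChatWithFallback method below is a sketch of that flow under the types shown above, not the project's actual API:

    // Sketch: fall back to an alternative model when the primary provider
    // is unavailable. Uses the Registry fields shown above.
    func (r *Registry) ChatWithFallback(ctx context.Context, model string, req ChatRequest) (*ChatResponse, error) {
        primary, err := r.GetProviderForModel(model)
        if err == nil && primary.IsAvailable(ctx) {
            return primary.Chat(ctx, req)
        }

        fallbackModel, ok := r.fallbacks[model]
        if !ok {
            return nil, fmt.Errorf("no provider or fallback for model %q", model)
        }
        fallback, err := r.GetProviderForModel(fallbackModel)
        if err != nil {
            return nil, err
        }
        // In a full implementation the request's model field would be
        // swapped for fallbackModel before the call.
        return fallback.Chat(ctx, req)
    }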

    Enterprise Observability

    Full observability stack integrated throughout:

    // Prometheus metrics for tool execution
    var (
        toolExecutions = prometheus.NewCounterVec(
            prometheus.CounterOpts{
                Name: "syntor_tool_executions_total",
                Help: "Total tool executions by tool and status",
            },
            []string{"tool", "status"},
        )
    
        toolLatency = prometheus.NewHistogramVec(
            prometheus.HistogramOpts{
                Name:    "syntor_tool_latency_seconds",
                Help:    "Tool execution latency",
                Buckets: prometheus.DefBuckets,
            },
            []string{"tool"},
        )
    )
    
    // Jaeger tracing for distributed operations
    func (e *Executor) ExecuteTool(ctx context.Context, call ToolCall) (*ToolResult, error) {
        span, ctx := opentracing.StartSpanFromContext(ctx, "tool.execute")
        defer span.Finish()
    
        span.SetTag("tool.name", call.Name)
        span.SetTag("tool.id", call.ID)
        // ...
    }

    Infrastructure includes:

    • Prometheus: Metrics on tool execution, inference latency, agent activity
    • Grafana: Pre-built dashboards for system monitoring
    • Jaeger: Distributed tracing across agent handoffs
    • Structured logging: Uber Zap with correlation IDs
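
    As a small example of the logging piece, a request-scoped Zap logger carrying a correlation ID might be wired up as below; the correlation-ID plumbing and field names are assumptions about how Zap is used here, not the project's exact code.

    // Sketch: Uber Zap logger carrying a correlation ID through a request.
    import (
        "go.uber.org/zap"
    )

    func newRequestLogger(base *zap.Logger, correlationID string) *zap.Logger {
        // Every log line emitted from this child logger carries the ID,
        // so logs can be joined with traces and metrics on it.
        return base.With(zap.String("correlation_id", correlationID))
    }

    func example() {
        base, _ := zap.NewProduction()
        defer base.Sync()

        log := newRequestLogger(base, "req-1234") // hypothetical ID
        log.Info("tool execution started",
            zap.String("tool", "read_file"),
            zap.String("agent", "sntr"),
        )
    }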

    Resilience Patterns

    Production-ready fault tolerance:

    // Circuit breaker for provider failures
    type CircuitBreaker struct {
        failures    int
        threshold   int
        state       State  // Closed, Open, HalfOpen
        lastFailure time.Time
        timeout     time.Duration
    }
    
    // Retry with exponential backoff
    type RetryConfig struct {
        MaxAttempts int
        InitialWait time.Duration
        MaxWait     time.Duration
        Multiplier  float64
    }
    
    // Rate limiter for API providers
    type RateLimiter struct {
        limiter *rate.Limiter
        burst   int
    }
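
    The RetryConfig above can drive a simple exponential-backoff loop; the Retry helper below is a sketch of how those fields might be used, not code from the project:

    // Sketch: exponential backoff driven by the RetryConfig shown above.
    func Retry(ctx context.Context, cfg RetryConfig, op func() error) error {
        wait := cfg.InitialWait
        var lastErr error

        for attempt := 1; attempt <= cfg.MaxAttempts; attempt++ {
            if lastErr = op(); lastErr == nil {
                return nil
            }
            if attempt == cfg.MaxAttempts {
                break
            }
            select {
            case <-ctx.Done():
                return ctx.Err()
            case <-time.After(wait):
            }
            // Grow the wait, capped at MaxWait.
            wait = time.Duration(float64(wait) * cfg.Multiplier)
            if wait > cfg.MaxWait {
                wait = cfg.MaxWait
            }
        }
        return lastErr
    }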

    Challenges & Solutions

    Challenge 1: Hot-Reload Without Race Conditions

    Problem: Agent manifests needed runtime updates without service restarts, but concurrent access during reload could cause inconsistent reads or panics.

    Solution: Implemented a copy-on-write pattern protected by an RWMutex. Readers take a brief read lock on the current snapshot, which is never mutated in place; a reload builds a complete new map and swaps it in under the exclusive write lock:

    func (s *ManifestStore) Reload() error {
        s.mu.Lock()
        defer s.mu.Unlock()
    
        newManifests := make(map[string]*AgentManifest)
        // Load all manifests into new map
    
        s.manifests = newManifests  // Swap in the new snapshot under the write lock
        return nil
    }
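
    The matching read path is then a short read lock over the current snapshot (a sketch, assuming the store's mutex is a sync.RWMutex named mu):

    // Sketch: lookup against the current snapshot under a read lock.
    func (s *ManifestStore) Get(name string) (*AgentManifest, bool) {
        s.mu.RLock()
        defer s.mu.RUnlock()

        m, ok := s.manifests[name]
        return m, ok
    }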

    Lesson Learned: Copy-on-write patterns trade memory for simplicity and are often the right choice for configuration systems where reads vastly outnumber writes.

    Challenge 2: LLM Output Parsing Reliability

    Problem: Language models produce inconsistent JSON—sometimes wrapped in markdown code blocks, sometimes with trailing commas, sometimes with explanatory text mixed in.

    Solution: Built a robust multi-stage parser that handles common variations:

    func ParseToolCalls(output string) ([]ToolCall, error) {
        // Stage 1: Extract JSON from markdown blocks
        output = extractFromCodeBlock(output)
    
        // Stage 2: Try strict JSON parse
        if calls, err := strictParse(output); err == nil {
            return calls, nil
        }
    
        // Stage 3: Lenient parse with fixes
        output = fixTrailingCommas(output)
        output = fixUnquotedKeys(output)
    
        return lenientParse(output)
    }
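
    The helper functions above are placeholders in the snippet; the markdown-extraction stage, for example, could be a small regexp-based function along these lines (illustrative only):

    // Sketch: pull the JSON payload out of a ```json ... ``` block, if present.
    var codeBlockRe = regexp.MustCompile("(?s)```(?:json)?\\s*(.+?)\\s*```")

    func extractFromCodeBlock(output string) string {
        if m := codeBlockRe.FindStringSubmatch(output); m != nil {
            return m[1]
        }
        // No fenced block found; return the raw output unchanged.
        return strings.TrimSpace(output)
    }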

    Lesson Learned: When integrating with LLMs, assume the worst about output format consistency. Build parsers that degrade gracefully rather than fail on edge cases.

    Challenge 3: TUI State Management Complexity

    Problem: Bubbletea’s Elm-style architecture requires careful state management. With streaming responses, mode toggles, and approval workflows, state transitions became error-prone.

    Solution: Centralized state into a well-defined model with explicit state machine transitions:

    type Model struct {
        state       AppState  // Idle, Streaming, AwaitingApproval, Error
        mode        Mode      // Auto, Plan
        pendingPlan *ExecutionPlan
        // ...
    }
    
    func (m Model) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
        switch m.state {
        case StateAwaitingApproval:
            return m.handleApprovalInput(msg)
        case StateStreaming:
            return m.handleStreamingUpdate(msg)
        // ...
        }
    }

    Lesson Learned: Explicit state machines make complex UI logic tractable. Name your states and transitions clearly—it’s documentation that the compiler enforces.

    Results & Metrics

    Project Scale

    • ~34,000 lines of Go code across well-organized packages
    • 17 test files covering integration, property-based, and unit tests
    • 5 specialized agents with distinct capabilities
    • 7 secure tool implementations
    • 3 AI provider integrations
    • 8 completed development phases

    Architecture Quality

    • Clean separation between inference, tools, coordination, and UI layers
    • Hot-reload capability enables zero-downtime configuration updates
    • Full observability stack (metrics, tracing, logging) from day one
    • Resilience patterns (circuit breaker, retry, rate limiting) built-in

    Flexibility

    • Works offline with local Ollama models
    • Scales to cloud providers (Anthropic, DeepSeek) without code changes
    • Per-agent model assignment optimizes cost/capability tradeoffs
    • YAML manifests enable non-developer customization

    Key Takeaways

    1. Abstraction layers pay dividends: The provider registry made adding new AI backends trivial. What started as Ollama-only became multi-provider with minimal changes because the abstraction was right.

    2. Security must be foundational: Bolt-on security never works. Building path validation, command allowlisting, and approval workflows from the start shaped the architecture positively.

    3. Observability is not optional: Having Prometheus metrics and Jaeger tracing from the beginning caught bugs that would have been invisible in production. The investment pays for itself quickly.

    4. Hot-reload changes development velocity: Being able to tweak agent prompts and capabilities without restart cycles accelerated iteration dramatically during prompt engineering.

    5. Elm architecture scales: Bubbletea’s update/view pattern handled complex TUI state better than expected. The discipline of explicit state machines prevented entire categories of bugs.

    Future Enhancements

    • Persistent conversation memory: Long-term context storage with semantic search
    • Custom tool plugins: User-defined tools via WASM or gRPC
    • Web UI option: Browser-based interface alongside TUI
    • Agent marketplace: Share and discover community agent manifests
    • Cloud deployment templates: ECS/EKS configurations for team deployments
    • MCP integration: Model Context Protocol support for broader tool ecosystem



    This project demonstrates proficiency in: Go systems programming, multi-agent AI architecture, secure tool execution design, distributed systems patterns, terminal UI development, and enterprise observability implementation.