Event-driven architecture and AI-enabled workflows for teams building production software.

Mark Holton

I'm Mark Holton, a software architect with 25+ years of experience designing and operating production systems.

I help teams design, build, and stabilize systems that must work reliably under real-world constraints.

I focus on distributed systems, event-driven platforms, and AI workflows moving from prototype to production.

Trusted by teams building production systems

"Mark quickly got up to speed on a complex system involving AI pipelines and data workflows, and provided clear, actionable guidance on architecture and reliability.

What stood out most was his ability to cut through ambiguity and identify the real risks early.

His input gave us clarity and confidence moving forward. I'd absolutely recommend him to any team building data platforms, AI-driven products, or distributed systems."

— Eric Camastro

Founder, Pharmacast-AI

I work with growing teams navigating distributed systems complexity, reliability issues, or architecture decisions.

Engagements can be advisory, hands-on, or a mix of both. Most begin with an architecture review or stabilization sprint.

Teams usually reach out when:

  • A system is being designed from scratch, and early architectural decisions will determine how it scales
  • AI systems work in demos but become unreliable or unpredictable in production
  • LLM workflows are slow, expensive, or difficult to control
  • Multi-step or agent-based systems have become difficult to reason about
  • Event-driven systems have grown complex and difficult to debug
  • Incidents are recurring and root causes are unclear
  • The team is preparing for scale, diligence, or external scrutiny
  • A promising prototype needs to become a production-grade system

System Architecture & Reliability

Designing systems that remain understandable as they scale.

  • Event-driven and asynchronous architectures (Kafka, Redis Streams)
  • Distributed systems boundaries and service design
  • Observability and operational visibility
  • Architecture reviews and system redesigns

Architecture Review & Stabilization Sprint

A focused engagement to diagnose issues, reduce complexity, and create a clear path to a stable, production-ready system.

Most teams don't need more code. They need clarity on where their systems are breaking down and how to fix them without making things worse.

  • Diagnose recurring incidents, bottlenecks, and failure patterns
  • Identify architectural complexity and unclear system boundaries
  • Trace critical workflows (including AI/LLM pipelines) end-to-end
  • Evaluate reliability, observability, and failure handling
  • Simplify service interactions and reduce unnecessary coupling
  • Provide a clear, actionable architecture roadmap

Typical engagement: 1–2 weeks, combining system review, working sessions, and targeted analysis.

AI Systems & Workflow Architecture

Moving AI systems from demos to reliable, production-grade workflows.

  • Multi-agent orchestration and workflow design
  • Tool-integrated LLM systems (APIs, structured tool use)
  • Evaluation loops and system quality measurement
  • Workflow orchestration across LLM and deterministic systems
  • Durable workflows using Temporal
  • Production-grade error handling and observability

Representative Systems & Engagements

A selection of systems and engagements spanning large-scale event-driven platforms, AI workflow design, and architecture review.

Event-Driven Platform for Conversational Systems (Salesforce)

Architecture for an event-driven platform processing billions of events per month, supporting operational workflows, large-scale analytics, and real-time system coordination.

Agentic Go-to-Market Workflow System (ShiftUp)

Architected and built a multi-stage AI-driven workflow system supporting stakeholder discovery and go-to-market execution.

  • Multi-step pipeline coordinating research, synthesis, and structured outputs
  • Tool-integrated LLM system design enabling repeatable, production-oriented workflows

Agentic AI System Architecture Review (PharmaCast AI)

Led an architectural review of a multi-step AI system, identifying risks in orchestration, reliability, and evaluation as the system moved toward production.

Background

I've spent more than 25 years building production software systems, including over a decade as a software architect at Salesforce working on event-driven systems at scale.

Today I run NoraFoundry, an independent architecture practice focused on helping teams design and evolve complex software systems.

If this sounds familiar, I'd be glad to help.

Most teams I work with start with a short conversation to walk through their system, identify where complexity or risk is building up, and determine whether a focused architecture review or stabilization sprint makes sense.

If you're dealing with reliability issues, growing system complexity, or trying to move an AI system from prototype to production, it may be worth a conversation.

You can reach me directly or send a brief note about your system. Happy to take a quick look.

[email protected]