Event-driven architecture and AI-enabled workflows for teams building production software.

I'm Mark Holton, a software architect with 25+ years of experience designing and operating production systems.

I help teams design, build, and stabilize systems that must work reliably under real-world constraints.

My focus is on distributed systems, event-driven platforms, and AI workflows moving from prototype to production.

Trusted by teams building production systems

"Mark quickly got up to speed on a complex system involving AI pipelines and data workflows, and provided clear, actionable guidance on architecture and reliability.

What stood out most was his ability to cut through ambiguity and identify the real risks early.

His input gave us clarity and confidence moving forward. I'd absolutely recommend him to any team building data platforms, AI-driven products, or distributed systems."

— Eric Camastro

Founder, PharmaCast AI

I work with growing teams navigating distributed systems complexity, reliability issues, or architecture decisions.

Engagements can be advisory, hands-on, or a mix of both. Most begin with an architecture review or stabilization sprint.

Experience building and architecting systems for large enterprises and startups.

Teams usually reach out when:

A system is being designed from scratch, and early architectural decisions will determine how it scales
AI systems work in demos but become unreliable or unpredictable in production
LLM workflows are slow, expensive, or difficult to control
Multi-step or agent-based systems have become difficult to reason about
Event-driven systems have grown complex and difficult to debug
Incidents are recurring and root causes are unclear
The team is preparing for scale, diligence, or external scrutiny
There's a need to turn a promising prototype into a production-grade system

System Architecture & Reliability

Designing systems that remain understandable as they scale.

Event-driven and asynchronous architectures (Kafka, Redis Streams)
Distributed systems boundaries and service design
Observability and operational visibility
Architecture reviews and system redesigns

Architecture Review & Stabilization Sprint

A focused engagement to diagnose issues, reduce complexity, and create a clear path to a stable, production-ready system.

Most teams don't need more code; they need clarity on where systems are breaking down and how to fix them without making things worse.

Diagnose recurring incidents, bottlenecks, and failure patterns
Identify architectural complexity and unclear system boundaries
Trace critical workflows (including AI/LLM pipelines) end-to-end
Evaluate reliability, observability, and failure handling
Simplify service interactions and reduce unnecessary coupling
Provide a clear, actionable architecture roadmap

Typical engagement: 1–2 weeks, combining system review, working sessions, and targeted analysis

AI Systems & Workflow Architecture

Moving AI systems from demos to reliable, production-grade workflows.

Multi-agent orchestration and workflow design
Tool-integrated LLM systems (APIs, structured tool use)
Evaluation loops and system quality measurement
Workflow orchestration across LLM and deterministic systems
Durable workflows using Temporal
Production-grade error handling and observability

Representative Systems & Engagements

A selection of systems and engagements spanning large-scale event-driven platforms, AI workflow design, and architecture review.

Event-Driven Platform for Conversational Systems (Salesforce)

Architecture for an event-driven platform processing billions of events per month, supporting operational workflows, large-scale analytics, and real-time system coordination.

Agentic Go-to-Market Workflow System (ShiftUp)

Architected and built a multi-stage AI-driven workflow system supporting stakeholder discovery and go-to-market execution.

Multi-step pipeline coordinating research, synthesis, and structured outputs
Tool-integrated LLM system design enabling repeatable, production-oriented workflows

Agentic AI System Architecture Review (PharmaCast AI)

Led an architectural review of a multi-step AI system, identifying risks in orchestration, reliability, and evaluation as the system moved toward production.

Background

I've spent more than 25 years building production software systems, including over a decade as a software architect at Salesforce working on event-driven systems at scale.

Today I run NoraFoundry, an independent architecture practice focused on helping teams design and evolve complex software systems.

If this sounds familiar, I'd be glad to help.

Most teams I work with start with a short conversation to walk through their system, identify where complexity or risk is building up, and determine whether a focused architecture review or stabilization sprint makes sense.

If you're dealing with reliability issues, growing system complexity, or trying to move an AI system from prototype to production, it may be worth a conversation.

You can reach me directly or send a brief note about your system. Happy to take a quick look.

[email protected]
