In this article, you will learn how to distinguish agentic workflows from autonomous agents by focusing on who owns control flow — a human writing code in advance, or a model reasoning at runtime.
Topics we will cover include:
Why the real axis separating these systems is predictability versus autonomy, not whether an LLM is involved.
How deterministic workflows, orchestrated workflows, reactive agents, and autonomous multi-agent systems differ, with runnable code that makes the control-flow distinction concrete.
Why workflows, not fully autonomous agents, dominate production today, and why hybrid architectures are the pattern that holds up.
Introduction
Deloitte projects that by 2027, up to 50% of companies using generative AI will have launched agentic AI pilots or proofs of concept. That’s a wave of adoption big enough that the word “agentic” has started covering almost anything with an LLM call in it, from a fixed five-step pipeline where step three happens to call GPT for a summary to a fully self-directing system that plans its own path with no script at all.
Those are not the same thing. Treating them as interchangeable leads to one of two mistakes: over-engineering a simple, well-understood task with unnecessary autonomy, or under-engineering a genuinely open-ended problem by forcing it into a rigid pipeline that breaks the moment reality deviates from the plan.
Anthropic draws the foundational line in its widely cited “Building Effective Agents” piece: workflows are systems where LLMs and tools are orchestrated through predefined code paths, while agents are systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks. Everything in this article builds on that one distinction.
This piece maps the full spectrum of deterministic workflows, orchestrated systems, reactive single agents, and autonomous multi-agent systems, with code at each stage that makes the control-flow difference concrete rather than abstract. The code here illustrates architecture, not a deployable system; the point of each snippet is to show who decides what happens next, not to ship a feature.
The Real Axis Isn’t “AI vs. No AI”: It’s Predictability vs. Autonomy
Before comparing architectures, it’s worth replacing the wrong question. The question isn’t “does this system use an LLM.” Almost everything does now. The two questions that actually matter, borrowing a framing that’s gained real traction in architecture circles, are: does this process need to be repeatable, auditable, and explainable step-by-step? And: is the correct path even known in advance, or does the system need to discover it at runtime?
A system can lean heavily on an LLM and still be fully deterministic in structure — a fixed pipeline where one step happens to call a model for text generation, but the next step is hardcoded regardless of what comes back. A system can also be “agentic” with very little real autonomy: a tightly scripted loop with only two allowed actions and a hard step limit. The presence of an LLM call is not the signal. Ownership of control flow is.
Google Cloud’s own design-pattern documentation draws this exact line operationally: deterministic workflows include tasks with a clearly defined path known in advance, where the steps don’t change much from one run to the next. Workflows that require dynamic orchestration involve problems where the agent must determine the best way to proceed, without a predefined script. That’s the spectrum this article walks through, one stage at a time.
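To make the point about "agentic" systems with very little real autonomy concrete, here is a minimal sketch of a tightly scripted loop. The names `ALLOWED_ACTIONS` and `bounded_loop` are hypothetical and chosen for illustration, and the `choose_action` callable is only a stand-in for a model call. Whatever the chooser asks for, a human-written menu and step limit bound what can actually happen.

```python
from typing import Callable

# Hypothetical action menu: the only two things this loop may ever do.
ALLOWED_ACTIONS = ("retry", "stop")


def bounded_loop(choose_action: Callable[[int], str], max_steps: int = 3) -> list[str]:
    """Run a chooser (stand-in for a model call) inside hard, human-set bounds."""
    trace: list[str] = []
    for step in range(max_steps):
        action = choose_action(step)
        if action not in ALLOWED_ACTIONS:
            action = "stop"  # anything off the menu is coerced to a safe default
        trace.append(action)
        if action == "stop":
            break
    return trace


# A chooser that always wants to continue is cut off by the step limit.
print(bounded_loop(lambda step: "retry"))       # ['retry', 'retry', 'retry']
# A chooser that asks for an unknown action is stopped immediately.
print(bounded_loop(lambda step: "delete_all"))  # ['stop']
```

The loop looks agent-like, but the code, not the chooser, owns the control flow.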
Deterministic Workflows
This is the baseline. A deterministic workflow has a known sequence of steps decided at design time, by a human, in code. An LLM can sit inside any step — generating text, classifying input, drafting a summary — but it does not choose what happens after its own step runs. The orchestrating code does that, regardless of what the model returns.
# deterministic_pipeline.py
# Prerequisites: none beyond Python's standard library
# Run: python deterministic_pipeline.py

def mock_llm_classify(text: str) -> str:
    """
    Mock LLM call: stands in for a real API call to keep this example
    runnable without an API key. The point is structural: whatever this
    returns, the NEXT function that runs is already decided below.
    """
    if "refund" in text.lower() or "charge" in text.lower():
        return "billing"
    return "general"

def extract(raw_input: str) -> str:
    """Step 1: always runs, always leads to step 2. No branching here."""
    return raw_input.strip()

def classify(cleaned_text: str) -> str:
    """
    Step 2: calls an LLM to produce a label, but the label has no effect
    on which function runs next. That's the deterministic part: the model
    fills in a piece of data, it doesn't influence the route.
    """
    label = mock_llm_classify(cleaned_text)
    print(f"  (classify) LLM returned label='{label}' (informational only)")
    return cleaned_text

def summarize(cleaned_text: str) -> str:
    """Step 3: always runs after step 2, regardless of the label from step 2."""
    return f"Summary: {cleaned_text[:40]}…"

def notify(summary: str) -> str:
    """Step 4: always runs last. The path is fixed at design time."""
    return f"Notification sent: {summary}"

def run_deterministic_pipeline(raw_input: str) -> str:
    """
    The control flow here is written entirely by a human, in advance.
    Every run takes the identical path: extract -> classify -> summarize -> notify.
    The LLM call inside classify() produces a label, but that label is never
    used to decide what function runs next; it's data flowing through a fixed pipe.
    """
    step1 = extract(raw_input)
    step2 = classify(step1)
    step3 = summarize(step2)
    step4 = notify(step3)
    return step4

if __name__ == "__main__":
    # Two inputs that the LLM would classify completely differently
    result_1 = run_deterministic_pipeline("I want a refund for my last charge")
    result_2 = run_deterministic_pipeline("What are your business hours?")
    print(f"\nResult 1: {result_1}")
    print(f"Result 2: {result_2}")
How to run: python deterministic_pipeline.py, no dependencies required.
Output:
(classify) LLM returned label='billing' (informational only)
(classify) LLM returned label='general' (informational only)
Result 1: Notification sent: Summary: I want a refund for my last charge…
Result 2: Notification sent: Summary: What are your business hours?…
Notice what happened: the mock LLM classified the two inputs completely differently, billing versus general, and it made zero difference to the path either input took. Both went through the exact same four functions in the same order. That’s the entire definition of deterministic: the route is fixed, even when an LLM is doing real work inside one of the steps.
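One way to check this claim rather than take it on faith is to record which steps ran. The sketch below is a simplified, self-contained mirror of the pipeline above, not the original code: `traced_pipeline` is a hypothetical helper, and the label logic is a reduced mock. It returns the label together with the list of steps executed.

```python
def traced_pipeline(raw_input: str) -> tuple[str, list[str]]:
    """Mirror of the four-step pipeline that also records which steps ran."""
    trace: list[str] = []

    cleaned = raw_input.strip()
    trace.append("extract")

    # Mock label, standing in for the LLM call inside classify().
    label = "billing" if "refund" in cleaned.lower() else "general"
    trace.append("classify")

    summary = f"Summary: {cleaned[:40]}"
    trace.append("summarize")

    _notification = f"Notification sent: {summary}"
    trace.append("notify")

    return label, trace


label_a, trace_a = traced_pipeline("I want a refund for my last charge")
label_b, trace_b = traced_pipeline("What are your business hours?")
print(label_a, label_b)    # billing general
print(trace_a == trace_b)  # True
```

The labels differ while the traces are identical, which is the property a deterministic workflow guarantees by construction.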
Orchestrated Workflows
This is the middle ground that gets mislabeled most often as “agentic,” and it’s worth slowing down here because it’s the line most people actually cross when they start using that word loosely.
An orchestrated workflow still has a graph of possible paths defined entirely in advance, but which path gets taken now depends on a runtime decision, frequently made by an LLM call. This is still a workflow. Every branch that could be taken was anticipated and written into code by a human before the system ever ran. The LLM picks a branch off a menu someone else wrote. It does not invent a new item on that menu.
In terms of the Google Cloud distinction above, this still sits on the predefined-path side of the line: the system routes at runtime, but inside a structure that a human fully designed.
# orchestrated_pipeline.py
# Prerequisites: none beyond Python's standard library
# Run: python orchestrated_pipeline.py

def mock_llm_classify(text: str) -> str:
    """Mock LLM classification call."""
    text_lower = text.lower()
    if "refund" in text_lower or "charge" in text_lower:
        return "billing"
    if "crash" in text_lower or "error" in text_lower or "bug" in text_lower:
        return "technical"
    return "general"

def extract(raw_input: str) -> str:
    return raw_input.strip()

# Three pre-defined downstream handlers. A human wrote all three of these
# in advance. The LLM does not invent a fourth path; it can only select
# among branches that already exist in this code.
def handle_billing(text: str) -> str:
    return f"(BILLING TEAM) Routed: {text[:50]}"

def handle_technical(text: str) -> str:
    return f"(TECH SUPPORT) Routed: {text[:50]}"

def handle_general(text: str) -> str:
    return f"(GENERAL QUEUE) Routed: {text[:50]}"

# The branch map IS the entire decision space. Every key here was written
# by a human ahead of time. The LLM's job is to pick a key, not define one.
ROUTE_MAP = {
    "billing": handle_billing,
    "technical": handle_technical,
    "general": handle_general,
}

def run_orchestrated_pipeline(raw_input: str) -> str:
    """
    Still a workflow, not an agent: every possible path was anticipated
    and coded by a human ahead of time, sitting in ROUTE_MAP. The LLM call
    decides WHICH pre-built branch executes for this specific input, but
    it cannot invent a branch that isn't already a key in ROUTE_MAP.
    """
    cleaned = extract(raw_input)
    label = mock_llm_classify(cleaned)
    print(f"  (route) LLM classified as '{label}' -> dispatching to handle_{label}()")
    # Unknown labels fall back to the general queue, a route a human chose.
    handler = ROUTE_MAP.get(label, handle_general)
    return handler(cleaned)

if __name__ == "__main__":
    test_inputs = [
        "I was charged twice for my refund request",
        "The app keeps crashing with an error on startup",
        "What are your business hours?",
    ]
    for inp in test_inputs:
        result = run_orchestrated_pipeline(inp)
        print(f"  Result: {result}\n")
How to run: python orchestrated_pipeline.py, no dependencies required.
Output:
(route) LLM classified as 'billing' -> dispatching to handle_billing()
Result: (BILLING TEAM) Routed: I was charged twice for my refund request
(route) LLM classified as 'technical' -> dispatching to handle_technical()
Result: (TECH SUPPORT) Routed: The app keeps crashing with an error on startup
(route) LLM classified as 'general' -> dispatching to handle_general()
Result: (GENERAL QUEUE) Routed: What are your business hours?
Three different inputs took three different paths this time — that’s new compared to the previous section. But look at ROUTE_MAP: every possible destination was already written into the code before any of these inputs arrived. The LLM exercised judgment about which key to use. It never had the option to create a key that wasn’t there. That distinction — a fixed set of possible paths versus a path that gets invented at runtime — is exactly where the next section picks up.
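A real model can return text that is not exactly one of the expected labels, so the "menu" only holds if the dispatching code enforces it. The following is a minimal sketch of that enforcement, assuming a reduced two-entry `ROUTE_MAP` with stand-in lambda handlers; `dispatch` and its `fallback` parameter are hypothetical names, not part of the example above.

```python
from typing import Callable

# Stand-in handlers; the article's example uses handle_billing() and friends.
ROUTE_MAP: dict[str, Callable[[str], str]] = {
    "billing": lambda text: f"(BILLING TEAM) Routed: {text[:50]}",
    "general": lambda text: f"(GENERAL QUEUE) Routed: {text[:50]}",
}


def dispatch(label: str, text: str, fallback: str = "general") -> tuple[str, str]:
    """Return (route actually taken, handler result).

    The raw label is normalized and checked against ROUTE_MAP. Anything
    that is not already a key falls back to a route a human chose.
    """
    normalized = label.strip().lower()
    route = normalized if normalized in ROUTE_MAP else fallback
    return route, ROUTE_MAP[route](text)


print(dispatch("Billing ", "I was charged twice"))
# ('billing', '(BILLING TEAM) Routed: I was charged twice')
print(dispatch("legal_threat", "I will sue"))
# ('general', '(GENERAL QUEUE) Routed: I will sue')
```

Even when the label is something nobody anticipated, the path taken is still one of the pre-written branches.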
Reactive Agents: The ReAct Loop and a Genuinely Open Path
This is where real autonomy starts. The ReAct pattern — Reasoning plus Acting, introduced by Yao et al. in 2022 — lets the model itself decide, at each step, what action to take next based on what it observed from the previous action. There is no pre-written branch covering every case. The agent operates in an iterative loop of thought, action, and observation until an exit condition is met, and the sequence itself — how many steps, in what order, and which tools — is not knowable in advance. Only the available actions are fixed; the path through them is not.
This is the architectural threshold the previous two sections were building toward. In the orchestrated workflow, a human wrote every possible branch into ROUTE_MAP before the system ran. Here, the model decides both the path and the sequence length at runtime, even though the toolset itself is still fixed.
# react_loop.py
# Prerequisites: none beyond Python's standard library
# Run: python react_loop.py

def search_knowledge_base(query: str) -> str:
    """A tool the agent can call. Whether and when it gets called is not
    decided here: it's decided by the model, at runtime."""
    mock_kb = {
        "refund policy": "Refunds are available within 30 days of purchase.",
        "shipping time": "Standard shipping takes 5-7 business days.",
    }
    for key, value in mock_kb.items():
        if key in query.lower():
            return value
    return "No matching information found in knowledge base."

def escalate_to_human(reason: str) -> str:
    """A second tool the agent can call. Again, the decision to call this
    instead of the search tool is made by the model, not by this code."""
    return f"Escalated to human agent. Reason: {reason}"

AVAILABLE_TOOLS = {
    "search_knowledge_base": search_knowledge_base,
    "escalate_to_human": escalate_to_human,
}

def mock_llm_decide_next_step(observations: list[str], user_query: str) -> dict:
    """
    Mock LLM call standing in for the REASONING step of ReAct.
    In a real system, this is an actual model call that reads the full
    Thought -> Action -> Observation history and decides what happens next.
    Critically: this function, not the calling loop below, decides which
    tool to call and when to stop. There is no "if query contains X, call Y"
    branch written anywhere in run_react_loop(). The decision is made fresh,
    from accumulated context, on every single iteration.
    """
    if not observations:
        return {
            "thought": "I need to look up the policy before I can answer.",
            "action": "search_knowledge_base",
            "action_input": user_query,
        }
    last_observation = observations[-1]
    if "No matching information" in last_observation:
        # In a real system no human writes this branch in advance: the model
        # decides, from what it just observed, that escalation is needed.
        # The mock hardcodes it only so the example runs without an API key.
        return {
            "thought": "The knowledge base has no answer. I should escalate this.",
            "action": "escalate_to_human",
            "action_input": "No KB match for: " + user_query,
        }
    return {
        "thought": "I found the answer. Task complete.",
        "action": "finish",
        "action_input": last_observation,
    }

def run_react_loop(user_query: str, max_steps: int = 5) -> str:
    """
    Thought -> Action -> Observation, repeated until the model itself decides
    to stop. Compare this directly against run_orchestrated_pipeline() in the
    previous section: there is no ROUTE_MAP here. There is no human-written
    branch saying "if X happened, call Y." Every decision about what happens
    next is made by the model, at runtime, based on what it has observed so far.
    """
    observations: list[str] = []
    for step in range(max_steps):
        decision = mock_llm_decide_next_step(observations, user_query)
        print(f"  Step {step + 1} - Thought: {decision['thought']}")
        if decision["action"] == "finish":
            return f"Final answer: {decision['action_input']}"
        tool_fn = AVAILABLE_TOOLS.get(decision["action"])
        if tool_fn is None:
            return f"Error: model requested unknown tool '{decision['action']}'"
        observation = tool_fn(decision["action_input"])
        print(f"  Step {step + 1} - Action: {decision['action']}({decision['action_input']!r})")
        print(f"  Step {step + 1} - Observation: {observation}\n")
        observations.append(observation)
    return "Max steps reached without resolution."

if __name__ == "__main__":
    print("=== Query A: answerable from the knowledge base ===")
    result_a = run_react_loop("What is the refund policy?")
    print(f"Result: {result_a}\n")
    print("=== Query B: not in the knowledge base, should trigger escalation ===")
    result_b = run_react_loop("Can you process my international tax refund in crypto?")
    print(f"Result: {result_b}")
How to run: python react_loop.py, no dependencies required.
Output:
=== Query A: answerable from the knowledge base ===
Step 1 - Thought: I need to look up the policy before I can answer.
Step 1 - Action: search_knowledge_base('What is the refund policy?')
Step 1 - Observation: Refunds are available within 30 days of purchase.
Step 2 - Thought: I found the answer. Task complete.
Result: Final answer: Refunds are available within 30 days of purchase.
=== Query B: not in the knowledge base, should trigger escalation ===
Step 1 - Thought: I need to look up the policy before I can answer.
Step 1 - Action: search_knowledge_base('Can you process my international tax refund in crypto?')
Step 1 - Observation: No matching information found in knowledge base.
Step 2 - Thought: The knowledge base has no answer. I should escalate this.
Step 2 - Action: escalate_to_human('No KB match for: Can you process my international tax refund in crypto?')
Step 2 - Observation: Escalated to human agent. Reason: No KB match for: Can you process my international tax refund in crypto?
Step 3 - Thought: I found the answer. Task complete.
Result: Final answer: Escalated to human agent. Reason: No KB match for: Can you process my international tax refund in crypto?
Look at what differs between the two runs: query A finished in two steps, query B took three, and query B took an action — escalation — that was never hardcoded as “what happens when refund queries mention crypto.” The same loop, the same code, produced two genuinely different step counts and sequences because the model decided the path at runtime based on what it observed. That’s the actual, concrete meaning of “no predefined code path” — not a slogan, but a measurable difference in how many steps were run and what they were.
Production implementations of this pattern typically wrap the accumulated thought/observation history in a “scratchpad” and summarize tool outputs before feeding them back into the loop, since dumping raw error logs or large API responses back into context tends to confuse the next reasoning step rather than help it.
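A rough sketch of that scratchpad idea follows. Everything here is an assumption made for illustration: the `Scratchpad` class, the `summarize_observation` helper, and the 80-character budget are hypothetical, and the "summarizer" is a plain truncation standing in for whatever a real system would use (often another model call).

```python
from dataclasses import dataclass, field

MAX_OBSERVATION_CHARS = 80  # hypothetical budget per observation


def summarize_observation(raw: str, limit: int = MAX_OBSERVATION_CHARS) -> str:
    """Placeholder summarizer: keep the first line, truncated to `limit` chars."""
    lines = raw.strip().splitlines()
    first_line = lines[0] if lines else ""
    if len(first_line) <= limit:
        return first_line
    return first_line[:limit] + " [truncated]"


@dataclass
class Scratchpad:
    """Accumulates (thought, action, summarized observation) triples."""
    entries: list[tuple[str, str, str]] = field(default_factory=list)

    def record(self, thought: str, action: str, raw_observation: str) -> None:
        self.entries.append((thought, action, summarize_observation(raw_observation)))

    def render(self) -> str:
        """Text that would be sent back to the model on the next iteration."""
        return "\n".join(
            f"Thought: {t}\nAction: {a}\nObservation: {o}" for t, a, o in self.entries
        )


pad = Scratchpad()
pad.record("Look up the policy.", "search_knowledge_base",
           "HTTP 500 from upstream\n" + "stack frame\n" * 200)
print(pad.render())
```

Only the first line of the noisy tool output reaches the next reasoning step; the rest stays out of context.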
Autonomous Multi-Agent Systems
The far end of the spectrum builds directly on the ReAct loop above, just nested. In a multi-agent setup, an orchestrator runs its own ReAct loop, where some of its available “actions” are calls to other agents, each of which runs its own complete ReAct loop inside. The orchestrator reasons about what to delegate, delegates it, observes the result, and continues — exactly like the single-agent loop in the previous section, except some of its “tools” are entire agents rather than simple functions.
Picture the AVAILABLE_TOOLS dictionary from the previous example, except instead of search_knowledge_base and escalate_to_human, the entries are research_agent, finance_agent, and coding_agent — and calling one of them doesn’t return a simple string; it kicks off that sub-agent’s own independent Thought-Action-Observation loop, which might run for several steps before returning anything to the orchestrator. Nobody wrote down in advance which sub-agent gets called, in what order, or how many times any of them run.
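The nesting described above can be sketched with mocks. This is an illustrative toy, not a real multi-agent framework: `make_sub_agent`, `AVAILABLE_AGENTS`, `mock_orchestrator_decide`, and `run_orchestrator` are hypothetical names, each sub-agent's inner loop is reduced to a counter, and the orchestrator's reasoning step is hardcoded where a real system would make a model call.

```python
from typing import Callable


def make_sub_agent(name: str, steps_needed: int) -> Callable[[str], str]:
    """Build a mock sub-agent whose inner loop runs `steps_needed` iterations.
    In a real system each iteration would be a model call plus a tool call."""
    def run(task: str) -> str:
        steps = 0
        while steps < steps_needed:  # the sub-agent's own loop
            steps += 1
        return f"{name} finished '{task}' after {steps} inner steps"
    return run


# Agent names taken from the paragraph above; the step counts are made up.
AVAILABLE_AGENTS = {
    "research_agent": make_sub_agent("research_agent", steps_needed=3),
    "finance_agent": make_sub_agent("finance_agent", steps_needed=2),
}


def mock_orchestrator_decide(observations: list[str]) -> dict:
    """Mock of the orchestrator's reasoning step (a model call in practice)."""
    if not observations:
        return {"action": "research_agent", "input": "gather market data"}
    if len(observations) == 1:
        return {"action": "finance_agent", "input": "estimate costs"}
    return {"action": "finish", "input": " | ".join(observations)}


def run_orchestrator(max_steps: int = 5) -> str:
    """Outer loop: some 'tools' are entire agents with loops of their own."""
    observations: list[str] = []
    for _ in range(max_steps):
        decision = mock_orchestrator_decide(observations)
        if decision["action"] == "finish":
            return decision["input"]
        agent = AVAILABLE_AGENTS.get(decision["action"])
        if agent is None:
            return f"Error: unknown agent {decision['action']!r}"
        observations.append(agent(decision["input"]))
    return "Max steps reached without resolution."


print(run_orchestrator())
```

The outer loop has the same shape as the single-agent loop; the difference is that each delegated call may itself run for an unknown number of steps before returning.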
Google Cloud’s documentation labels the most extreme version of this the “swarm” pattern — a collaborative team of agents with no central orchestrator at all, capable of producing exceptionally high-quality, creative solutions precisely because nothing is constraining how they interact. That same lack of structure is also the risk: without a human-designed bound on the interaction, a swarm can fall into unproductive loops or simply fail to converge, and the cost of running many agents through many turns compounds quickly.
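The kind of human-designed bound mentioned above can be as simple as a turn budget plus a repeated-message check. The sketch below is one possible shape, under loud assumptions: `run_bounded_swarm` is a hypothetical helper, agents are plain functions from message to message, and the round-robin scheduling is a simplification that is not drawn from Google Cloud's documentation.

```python
from typing import Callable

Agent = Callable[[str], str]


def run_bounded_swarm(agents: list[Agent], opening_message: str, max_turns: int = 6) -> str:
    """Pass a message between peer agents, stopping on a repeat or a spent budget."""
    seen = {opening_message}
    message = opening_message
    for turn in range(max_turns):
        agent = agents[turn % len(agents)]  # simple round-robin, no orchestrator
        message = agent(message)
        if message in seen:
            return f"stopped after {turn + 1} turns: repeated message {message!r}"
        seen.add(message)
    return f"stopped after {max_turns} turns: budget exhausted"


def echo_agent(message: str) -> str:
    """Mock agent that bounces between two messages forever."""
    return "ping" if message == "pong" else "pong"


def elaborating_agent(message: str) -> str:
    """Mock agent that never repeats itself and never converges."""
    return message + "!"


print(run_bounded_swarm([echo_agent, echo_agent], "ping"))
# stopped after 2 turns: repeated message 'ping'
print(run_bounded_swarm([elaborating_agent], "idea", max_turns=4))
# stopped after 4 turns: budget exhausted
```

Neither guard makes the swarm smarter; they only cap how much an unproductive interaction can cost.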
This is the point on the spectrum where the predictability axis from the first section swings hardest in the other direction. A deterministic pipeline gives you the same output structure every time, by construction. A swarm of autonomous agents gives you the flexibility to handle a problem nobody anticipated, at the cost of being able to predict, in advance, what it will do or how long it will take to do it.
Why This Distinction Actually Matters in Production
This isn’t an academic distinction. It has a direct, measurable effect on what teams actually ship. Despite the volume of hype around autonomous agents, AI workflows — not fully autonomous agents — won the production battle in 2025: workflows remain the dominant pattern behind successful generative AI deployments, while fully autonomous multi-agent systems are still largely exploratory outside of narrow domains.
The reason maps directly back to the predictability axis from the start of this article. Agentic systems are non-deterministic by nature; identical inputs can produce different outputs across separate runs, which is a serious liability in regulated, auditable, or otherwise high-stakes processes. If a process must be explainable step by step to a compliance team or a regulator, that’s not agent territory by default; it needs guardrails and human-in-the-loop checkpoints layered on top before it can be trusted with real consequences.
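A human-in-the-loop checkpoint can be sketched as a gate between the agent's chosen action and its execution. The names below (`REQUIRES_APPROVAL`, `execute_with_checkpoint`, the `approve` callback) are hypothetical, and the set of high-stakes actions is invented for illustration. In practice the approval step might be a ticket queue or review UI rather than a function call.

```python
from typing import Callable

# Hypothetical set of actions considered high-stakes.
REQUIRES_APPROVAL = {"issue_refund", "close_account"}


def execute_with_checkpoint(
    action: str,
    payload: str,
    approve: Callable[[str, str], bool],
    audit_log: list[str],
) -> str:
    """Run an agent-chosen action, pausing for a human on high-stakes ones.
    Every decision is appended to audit_log so the run can be explained later."""
    if action in REQUIRES_APPROVAL:
        if not approve(action, payload):
            audit_log.append(f"BLOCKED {action}: {payload}")
            return "blocked: awaiting human review"
        audit_log.append(f"APPROVED {action}: {payload}")
    else:
        audit_log.append(f"AUTO {action}: {payload}")
    return f"executed {action}"


log: list[str] = []
deny_all = lambda action, payload: False  # stand-in for a human reviewer
print(execute_with_checkpoint("search_knowledge_base", "refund policy", deny_all, log))
# executed search_knowledge_base
print(execute_with_checkpoint("issue_refund", "order 123", deny_all, log))
# blocked: awaiting human review
print(log)
```

The agent still chooses the action; the checkpoint decides whether that choice is allowed to have consequences, and the log provides the step-by-step trail.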
The pattern that’s actually emerging in mature systems is hybrid, not a pick-one decision. A higher-level agent sets goals and orchestrates the overall task, while critical, well-understood computations still run inside deterministic modules that a human has fully specified. A medical diagnostics system, for example, might use an agent to interpret ambiguous symptoms and decide which tests to order — genuine autonomy, because the right sequence of tests isn’t knowable in advance — while each test itself runs through a validated, deterministic pipeline, because that part of the problem has a known correct path and no reason to introduce variability into it.
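The diagnostics example can be reduced to a toy sketch of the hybrid shape: an agent-like chooser decides which test comes next, and each test runs through a fixed function. All names and numbers below are hypothetical. The thresholds are made-up placeholders with no clinical meaning, and `mock_agent_choose_next_test` hardcodes what a real system would delegate to a model.

```python
from typing import Optional

# Made-up thresholds for illustration only; not clinical guidance.
THRESHOLDS = {"glucose": 140.0, "crp": 10.0}


def run_validated_test(test_name: str, sample: dict[str, float]) -> str:
    """Deterministic module: the same fixed steps on every call."""
    if test_name not in THRESHOLDS:
        raise ValueError(f"unknown test: {test_name}")
    value = sample[test_name]
    status = "high" if value > THRESHOLDS[test_name] else "normal"
    return f"{test_name}: {status}"


def mock_agent_choose_next_test(results: list[str]) -> Optional[str]:
    """Mock of the agent's reasoning: pick the next test from results so far."""
    if not results:
        return "glucose"
    if results == ["glucose: high"]:
        return "crp"  # follow up only when the first result looks abnormal
    return None  # nothing further to order


def run_hybrid(sample: dict[str, float], max_steps: int = 5) -> list[str]:
    """Agent chooses the sequence; each chosen step runs deterministically."""
    results: list[str] = []
    for _ in range(max_steps):
        next_test = mock_agent_choose_next_test(results)
        if next_test is None:
            break
        results.append(run_validated_test(next_test, sample))
    return results


print(run_hybrid({"glucose": 180.0, "crp": 4.0}))  # ['glucose: high', 'crp: normal']
print(run_hybrid({"glucose": 100.0, "crp": 4.0}))  # ['glucose: normal']
```

The number of tests varies with the input, which is the autonomous part, while `run_validated_test` behaves identically for a given test and sample every time.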
Conclusion
“Agentic workflow” and “autonomous agent” describe two ends of one spectrum, not two competing technologies, and the four stages walked through here — deterministic, orchestrated, reactive, and autonomous multi-agent — aren’t a ranking from worse to better. They’re different answers to the same question: who decides what happens next, and was that decision made by a human writing code in advance, or by a model reasoning at runtime?
Deterministic workflows give you auditability and repeatability by construction; the same input takes the same path every time, full stop. Reactive and multi-agent systems give up that guarantee in exchange for the ability to handle problems whose shape genuinely cannot be anticipated ahead of time. Neither property is free, and neither architecture is correct by default.
The systems that hold up well in production don’t pick one extreme of this spectrum and apply it everywhere. They place each piece of the problem at the point on the spectrum that piece actually calls for — a fixed structure wherever a known correct path exists and repeatability matters, with real autonomy reserved for the parts of the problem that have no predefined correct path to follow in the first place.


